
AutoBench Run 4 - November 2025

Latest AutoBench run, with models including Gemini 3 Pro, GPT 5.1, Grok 4.1, and more.

Past
Date: November 28, 2025
Version: 2025-11-28
Models: 32
New Models: 18

Run data

| Model | Score | Avg Cost (US cents) | Avg Latency (sec) | P99 Latency (sec) | Iterations |
|---|---|---|---|---|---|
|  | 4.28 (#11) | 0.18 (#16) | 136s (#31) | 421s (#30) | 319 |
|  | 4.27 (#13) | 15.44 (#32) | 114s (#29) | 362s (#27) | 324 |
|  | 4.39 (#3) | 4.62 (#28) | 45s (#20) | 119s (#13) | 324 |
|  | 4.23 (#16) | 0.58 (#20) | 107s (#28) | 348s (#25) | 325 |
|  | 4.31 (#9) | 4.06 (#27) | 58s (#22) | 186s (#19) | 326 |
|  | 3.7 (#26) | 0.03 (#3) | 36s (#13) | 252s (#24) | 326 |
|  | 4.25 (#14) | 0.83 (#21) | 100s (#27) | 362s (#26) | 327 |
|  | 4.17 (#20) | 0.07 (#9) | 23s (#11) | 66s (#9) | 327 |
|  | 4.21 (#19) | 0.17 (#15) | 45s (#19) | 220s (#21) | 328 |
|  | 4.34 (#7) | 0.98 (#24) | 86s (#25) | 429s (#31) | 328 |
|  | 4.16 (#21) | 0.12 (#11) | 100s (#26) | 384s (#28) | 328 |
|  | 4.45 (#2) | 7.71 (#31) | 152s (#32) | 434s (#32) | 328 |
|  | 4.49 (#1) | 7.53 (#30) | 130s (#30) | 386s (#29) | 328 |
|  | 4.3 (#10) | 1.42 (#25) | 40s (#15) | 154s (#16) | 329 |
|  | 4.37 (#5) | 4.84 (#29) | 55s (#21) | 170s (#18) | 329 |
|  | 4.32 (#8) | 0.26 (#18) | 69s (#24) | 207s (#20) | 329 |
|  | 4.37 (#4) | 0.06 (#8) | 17s (#6) | 55s (#7) | 329 |
|  | 4.34 (#6) | 0.19 (#17) | 64s (#23) | 223s (#23) | 329 |
|  | 4.24 (#15) | 0.29 (#19) | 43s (#17) | 223s (#22) | 330 |
|  | 4.27 (#12) | 1.88 (#26) | 42s (#16) | 166s (#17) | 330 |
|  | 4.22 (#17) | 0.14 (#13) | 14s (#4) | 69s (#10) | 330 |
|  | 3.61 (#28) | 0.05 (#6) | 13s (#2) | 48s (#4) | 330 |
|  | 3.7 (#27) | 0.91 (#23) | 13s (#3) | 33s (#2) | 330 |
|  | 3.35 (#32) | 0.17 (#14) | 7s (#1) | 21s (#1) | 330 |
|  | 4.21 (#18) | 0.05 (#7) | 38s (#14) | 131s (#15) | 331 |
|  | 4.08 (#23) | 0.10 (#10) | 20s (#10) | 53s (#6) | 331 |
|  | 3.55 (#30) | 0.04 (#4) | 18s (#9) | 79s (#11) | 331 |
|  | 4.1 (#22) | 0.13 (#12) | 44s (#18) | 128s (#14) | 331 |
|  | 3.95 (#24) | 0.89 (#22) | 17s (#7) | 57s (#8) | 331 |
|  | 3.6 (#29) | 0.04 (#5) | 25s (#12) | 105s (#12) | 331 |
|  | 3.46 (#31) | 0.01 (#1) | 18s (#8) | 47s (#3) | 331 |
|  | 3.81 (#25) | 0.02 (#2) | 15s (#5) | 49s (#5) | 332 |