
AutoBench Run 5 - December 2025

Latest AutoBench run, covering GPT-5.2, Claude Opus 4.5, Gemini 3 Flash, and more.

Date: December 19, 2025
Version: 2025-12-19
Models: 38
New Models: 3

Run data

| Model | Score | Avg Cost ($ cents) | Avg Latency (sec) | P99 Latency (sec) | Iterations |
|---|---|---|---|---|---|
| | 3.877365 (#30) | 0.08 (#6) | 24s (#2) | 57s (#1) | 312 |
| | 3.94904 (#28) | 0.21 (#11) | 20s (#1) | 69s (#2) | 313 |
| | 4.028261 (#25) | 0.00 (#1) | 30s (#3) | 98s (#3) | 314 |
| | 4.059981 (#23) | 3.94 (#30) | 61s (#9) | 132s (#4) | 277 |
| | 3.473742 (#38) | 1.30 (#25) | 52s (#7) | 135s (#5) | 312 |
| | 4.303068 (#7) | 1.95 (#27) | 46s (#6) | 137s (#6) | 313 |
| | 3.811798 (#33) | 0.38 (#16) | 52s (#8) | 147s (#7) | 306 |
| | 3.570151 (#36) | 0.05 (#3) | 31s (#4) | 154s (#8) | 306 |
| | 4.171935 (#15) | 2.12 (#28) | 66s (#10) | 174s (#9) | 312 |
| | 3.779105 (#35) | 0.07 (#4) | 39s (#5) | 183s (#10) | 310 |
| | 4.405224 (#3) | 6.85 (#32) | 76s (#14) | 186s (#11) | 312 |
| | 3.935105 (#29) | 0.51 (#18) | 90s (#19) | 198s (#12) | 307 |
| | 4.206201 (#11) | 0.27 (#12) | 69s (#12) | 207s (#13) | 306 |
| | 3.500291 (#37) | 0.08 (#5) | 67s (#11) | 212s (#14) | 311 |
| | 4.294065 (#9) | 6.48 (#31) | 87s (#18) | 222s (#15) | 313 |
| | 4.031744 (#24) | 0.75 (#21) | 78s (#16) | 227s (#16) | 312 |
| | 3.783612 (#34) | 0.18 (#9) | 76s (#15) | 240s (#17) | 311 |
| | 4.287269 (#10) | 0.91 (#22) | 93s (#20) | 258s (#18) | 312 |
| | 4.060397 (#22) | 0.34 (#15) | 100s (#21) | 269s (#19) | 309 |
| | 3.85021 (#32) | 0.00 (#2) | 122s (#24) | 270s (#20) | 307 |
| | 4.181097 (#14) | 0.11 (#8) | 75s (#13) | 292s (#21) | 292 |
| | 4.170821 (#16) | 3.79 (#29) | 111s (#23) | 317s (#22) | 312 |
| | 4.107852 (#21) | 0.33 (#14) | 83s (#17) | 329s (#23) | 312 |
| | 3.98095 (#27) | 0.19 (#10) | 105s (#22) | 337s (#24) | 302 |
| | 4.39496 (#4) | 17.26 (#37) | 144s (#28) | 373s (#25) | 313 |
| | 4.109586 (#20) | 0.09 (#7) | 125s (#25) | 410s (#26) | 311 |
| | 3.864646 (#31) | 0.54 (#19) | 163s (#29) | 425s (#27) | 306 |
| | 4.430061 (#2) | 7.36 (#33) | 130s (#26) | 434s (#28) | 312 |
| | 3.990557 (#26) | 0.71 (#20) | 137s (#27) | 473s (#29) | 308 |
| | 4.118577 (#19) | 0.99 (#23) | 171s (#31) | 477s (#30) | 308 |
| | 4.30218 (#8) | 11.39 (#36) | 170s (#30) | 477s (#31) | 307 |
| | 4.197064 (#12) | 8.12 (#34) | 180s (#32) | 562s (#32) | 293 |
| | 4.38364 (#5) | 10.80 (#35) | 227s (#34) | 627s (#33) | 310 |
| | 4.132794 (#18) | 1.25 (#24) | 187s (#33) | 630s (#34) | 306 |
| | 4.315342 (#6) | 1.86 (#26) | 248s (#35) | 729s (#35) | 287 |
| | 4.476206 (#1) | 81.88 (#38) | 261s (#36) | 784s (#36) | 303 |
| | 4.196769 (#13) | 0.32 (#13) | 317s (#38) | 811s (#37) | 283 |
| | 4.141433 (#17) | 0.47 (#17) | 310s (#37) | 833s (#38) | 288 |