AutoBench Run 5 - December 2025
Latest AutoBench run with models GPT 5.2, Claude Opus 4.5, Gemini 3 Flash, and more
Date: December 19, 2025
Version: 2025-12-19
Models: 38
New Models: 3
Run data
| Model | Score | Avg Cost (USD cents) | Avg Latency (sec) | P99 Latency (sec) | Iterations |
|---|---|---|---|---|---|
| | 3.877365 (#30) | 0.08 (#6) | 24s (#2) | 57s (#1) | 312 |
| | 3.94904 (#28) | 0.21 (#11) | 20s (#1) | 69s (#2) | 313 |
| | 4.028261 (#25) | 0.00 (#1) | 30s (#3) | 98s (#3) | 314 |
| | 4.059981 (#23) | 3.94 (#30) | 61s (#9) | 132s (#4) | 277 |
| | 3.473742 (#38) | 1.30 (#25) | 52s (#7) | 135s (#5) | 312 |
| | 4.303068 (#7) | 1.95 (#27) | 46s (#6) | 137s (#6) | 313 |
| | 3.811798 (#33) | 0.38 (#16) | 52s (#8) | 147s (#7) | 306 |
| | 3.570151 (#36) | 0.05 (#3) | 31s (#4) | 154s (#8) | 306 |
| | 4.171935 (#15) | 2.12 (#28) | 66s (#10) | 174s (#9) | 312 |
| | 3.779105 (#35) | 0.07 (#4) | 39s (#5) | 183s (#10) | 310 |
| | 4.405224 (#3) | 6.85 (#32) | 76s (#14) | 186s (#11) | 312 |
| | 3.935105 (#29) | 0.51 (#18) | 90s (#19) | 198s (#12) | 307 |
| | 4.206201 (#11) | 0.27 (#12) | 69s (#12) | 207s (#13) | 306 |
| | 3.500291 (#37) | 0.08 (#5) | 67s (#11) | 212s (#14) | 311 |
| | 4.294065 (#9) | 6.48 (#31) | 87s (#18) | 222s (#15) | 313 |
| | 4.031744 (#24) | 0.75 (#21) | 78s (#16) | 227s (#16) | 312 |
| | 3.783612 (#34) | 0.18 (#9) | 76s (#15) | 240s (#17) | 311 |
| | 4.287269 (#10) | 0.91 (#22) | 93s (#20) | 258s (#18) | 312 |
| | 4.060397 (#22) | 0.34 (#15) | 100s (#21) | 269s (#19) | 309 |
| | 3.85021 (#32) | 0.00 (#2) | 122s (#24) | 270s (#20) | 307 |
| | 4.181097 (#14) | 0.11 (#8) | 75s (#13) | 292s (#21) | 292 |
| | 4.170821 (#16) | 3.79 (#29) | 111s (#23) | 317s (#22) | 312 |
| | 4.107852 (#21) | 0.33 (#14) | 83s (#17) | 329s (#23) | 312 |
| | 3.98095 (#27) | 0.19 (#10) | 105s (#22) | 337s (#24) | 302 |
| | 4.39496 (#4) | 17.26 (#37) | 144s (#28) | 373s (#25) | 313 |
| | 4.109586 (#20) | 0.09 (#7) | 125s (#25) | 410s (#26) | 311 |
| | 3.864646 (#31) | 0.54 (#19) | 163s (#29) | 425s (#27) | 306 |
| | 4.430061 (#2) | 7.36 (#33) | 130s (#26) | 434s (#28) | 312 |
| | 3.990557 (#26) | 0.71 (#20) | 137s (#27) | 473s (#29) | 308 |
| | 4.118577 (#19) | 0.99 (#23) | 171s (#31) | 477s (#30) | 308 |
| | 4.30218 (#8) | 11.39 (#36) | 170s (#30) | 477s (#31) | 307 |
| | 4.197064 (#12) | 8.12 (#34) | 180s (#32) | 562s (#32) | 293 |
| | 4.38364 (#5) | 10.80 (#35) | 227s (#34) | 627s (#33) | 310 |
| | 4.132794 (#18) | 1.25 (#24) | 187s (#33) | 630s (#34) | 306 |
| | 4.315342 (#6) | 1.86 (#26) | 248s (#35) | 729s (#35) | 287 |
| | 4.476206 (#1) | 81.88 (#38) | 261s (#36) | 784s (#36) | 303 |
| | 4.196769 (#13) | 0.32 (#13) | 317s (#38) | 811s (#37) | 283 |
| | 4.141433 (#17) | 0.47 (#17) | 310s (#37) | 833s (#38) | 288 |
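
Each metric cell in the table pairs a value with what appears to be its rank across the 38 models, written as `value (#rank)`. A minimal Python sketch for turning one row into numbers is shown here. The function name `parse_row` and the field names are hypothetical and not part of AutoBench, and the sketch assumes the Model cell is empty, as it is in this extract.

```python
import re

# Matches cells such as "3.877365 (#30)" or "24s (#2)": a number, an optional
# "s" unit suffix, and a rank written as "(#N)".
CELL = re.compile(r"^(\d+(?:\.\d+)?)s?\s*\(#(\d+)\)$")

# Hypothetical field names for the four ranked columns, in table order.
FIELDS = ("score", "avg_cost_cents", "avg_latency_s", "p99_latency_s")


def parse_row(line):
    """Parse one table row into {field: (value, rank)} plus an iteration count."""
    # Drop empty cells so a blank Model column does not shift the fields.
    cells = [c.strip() for c in line.split("|") if c.strip()]
    if len(cells) != 5:
        raise ValueError(f"expected 5 non-empty cells, got {len(cells)}")
    row = {}
    for name, cell in zip(FIELDS, cells[:4]):
        match = CELL.match(cell)
        if match is None:
            raise ValueError(f"unrecognised cell: {cell!r}")
        row[name] = (float(match.group(1)), int(match.group(2)))
    # The last column is a plain integer with no rank attached.
    row["iterations"] = int(cells[4])
    return row


row = parse_row("| 4.476206 (#1) | 81.88 (#38) | 261s (#36) | 784s (#36) | 303 |")
print(row["score"], row["avg_cost_cents"], row["iterations"])
```

Because empty cells are discarded before parsing, the sketch would need adjusting if the Model column were populated.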