
AutoBench Run 3 - August 2025

Latest AutoBench run with enhanced metrics including evaluation iterations and fail rates

Past
Date: August 14, 2025
Version: 2025-08-14
Models: 33
New Models: 26

Run data

| Model | Score | Avg Cost ($ cents) | Avg Latency (sec) | P99 Latency (sec) | Iterations |
| --- | --- | --- | --- | --- | --- |
|  | 3.54 (#32) | 0.02 (#1) | 5s (#1) | 10s (#1) | 393 |
|  | 3.66 (#28) | 0.02 (#2) | 8s (#3) | 19s (#2) | 392 |
|  | 3.88 (#24) | 0.03 (#3) | 30s (#12) | 135s (#13) | 393 |
|  | 3.61 (#30) | 0.04 (#4) | 11s (#5) | 40s (#5) | 393 |
|  | 3.88 (#25) | 0.05 (#6) | 33s (#13) | 151s (#14) | 392 |
|  | 3.64 (#29) | 0.05 (#5) | 11s (#4) | 71s (#6) | 388 |
|  | 3.98 (#20) | 0.08 (#7) | 61s (#21) | 239s (#23) | 392 |
|  | 3.95 (#22) | 0.08 (#8) | 73s (#29) | 243s (#25) | 390 |
|  | 4.06 (#17) | 0.09 (#9) | 26s (#10) | 116s (#9) | 391 |
|  | 4.02 (#18) | 0.11 (#10) | 19s (#8) | 127s (#12) | 389 |
|  | 3.95 (#23) | 0.12 (#11) | 40s (#17) | 200s (#19) | 392 |
|  | 4.48 (#3) | 0.14 (#12) | 27s (#11) | 119s (#10) | 388 |
|  | 3.49 (#33) | 0.18 (#13) | 8s (#2) | 20s (#3) | 389 |
|  | 3.71 (#26) | 0.20 (#14) | 18s (#7) | 90s (#7) | 390 |
|  | 4.18 (#12) | 0.24 (#15) | 65s (#24) | 391s (#33) | 325 |
|  | 4.33 (#7) | 0.24 (#16) | 67s (#27) | 232s (#22) | 390 |
|  | 4.02 (#19) | 0.35 (#17) | 62s (#22) | 202s (#20) | 391 |
|  | 3.98 (#21) | 0.36 (#18) | 68s (#28) | 241s (#24) | 392 |
|  | 4.39 (#6) | 0.42 (#19) | 79s (#30) | 284s (#32) | 331 |
|  | 4.32 (#8) | 0.45 (#20) | 49s (#19) | 244s (#26) | 387 |
|  | 3.71 (#27) | 0.61 (#21) | 24s (#9) | 97s (#8) | 392 |
|  | 4.18 (#13) | 0.63 (#23) | 81s (#31) | 246s (#27) | 389 |
|  | 4.49 (#2) | 0.63 (#22) | 66s (#26) | 231s (#21) | 392 |
|  | 4.18 (#14) | 0.64 (#24) | 119s (#33) | 266s (#29) | 385 |
|  | 3.59 (#31) | 0.83 (#25) | 12s (#6) | 25s (#4) | 393 |
|  | 4.27 (#10) | 0.87 (#26) | 39s (#16) | 186s (#17) | 393 |
|  | 4.17 (#15) | 0.91 (#27) | 33s (#14) | 181s (#16) | 392 |
|  | 4.42 (#4) | 1.59 (#28) | 65s (#25) | 199s (#18) | 388 |
|  | 4.17 (#16) | 1.71 (#29) | 34s (#15) | 120s (#11) | 393 |
|  | 4.41 (#5) | 1.85 (#30) | 64s (#23) | 277s (#30) | 391 |
|  | 4.31 (#9) | 2.92 (#31) | 61s (#20) | 263s (#28) | 360 |
|  | 4.51 (#1) | 4.37 (#32) | 90s (#32) | 278s (#31) | 385 |
|  | 4.24 (#11) | 9.13 (#33) | 49s (#18) | 155s (#15) | 387 |
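
The run data is listed in order of increasing average cost, so one way to read it is as a score-versus-cost trade-off. The sketch below is an illustration only, not part of AutoBench itself: it copies the (score, average cost) pair of each of the 33 rows above and keeps the rows that no cheaper-or-equal row matches or beats on score. The names `Row`, `ROWS`, and `pareto_front` are hypothetical, and the "cost-efficient frontier" framing is an assumption about how a reader might want to use the table.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    score: float       # "Score" column
    cost_cents: float  # "Avg Cost ($ cents)" column


# (score, avg cost) pairs copied from the run data, in the order listed.
ROWS = [Row(s, c) for s, c in [
    (3.54, 0.02), (3.66, 0.02), (3.88, 0.03), (3.61, 0.04), (3.88, 0.05),
    (3.64, 0.05), (3.98, 0.08), (3.95, 0.08), (4.06, 0.09), (4.02, 0.11),
    (3.95, 0.12), (4.48, 0.14), (3.49, 0.18), (3.71, 0.20), (4.18, 0.24),
    (4.33, 0.24), (4.02, 0.35), (3.98, 0.36), (4.39, 0.42), (4.32, 0.45),
    (3.71, 0.61), (4.18, 0.63), (4.49, 0.63), (4.18, 0.64), (3.59, 0.83),
    (4.27, 0.87), (4.17, 0.91), (4.42, 1.59), (4.17, 1.71), (4.41, 1.85),
    (4.31, 2.92), (4.51, 4.37), (4.24, 9.13),
]]


def pareto_front(rows: List[Row]) -> List[Row]:
    """Keep rows whose score strictly exceeds every cheaper-or-equal row."""
    front: List[Row] = []
    best_score = float("-inf")
    # Cheapest first; among equal costs, the higher score comes first.
    for row in sorted(rows, key=lambda r: (r.cost_cents, -r.score)):
        if row.score > best_score:
            front.append(row)
            best_score = row.score
    return front


if __name__ == "__main__":
    for row in pareto_front(ROWS):
        print(f"score {row.score:.2f} at {row.cost_cents:.2f} cents")
```

This view ignores latency and iteration counts, so it is only one slice of the comparison; the same function could be reused with latency in place of cost.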