AutoBench Run 2 - April 2025

Second major AutoBench run with o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, etc.

Date: April 25, 2025
Version: 2025-04-25
Models: 24
New Models: 24

Run data

| Model | Score (rank) | Avg Cost, $ cents (rank) | Avg Latency, sec (rank) | P99 Latency, sec (rank) | Iterations |
|---|---|---|---|---|---|
| | 4.57 (#1) | 0.79 (#20) | 19.10s (#13) | 52.30s (#14) | - |
| | 4.46 (#2) | 1.23 (#23) | 36.57s (#21) | 64.18s (#15) | - |
| | 4.39 (#3) | - (#25) | 45.80s (#24) | 82.60s (#20) | - |
| | 4.34 (#4) | 0.14 (#14) | 15.38s (#11) | 29.19s (#10) | - |
| | 4.34 (#5) | 1.70 (#24) | 33.94s (#18) | 69.79s (#17) | - |
| | 4.26 (#7) | 0.52 (#17) | 84.77s (#25) | 223.47s (#25) | - |
| | 4.26 (#6) | 0.32 (#16) | 43.84s (#23) | 94.45s (#21) | - |
| | 4.26 (#8) | 0.61 (#19) | 10.69s (#6) | 23.67s (#9) | - |
| | 4.2 (#10) | 1.13 (#22) | 15.53s (#12) | 32.86s (#12) | - |
| | 4.2 (#9) | 0.03 (#3) | 30.03s (#16) | 79.12s (#19) | - |
| | 4.18 (#11) | 0.04 (#7) | 25.04s (#14) | 48.74s (#13) | - |
| | 4.17 (#12) | 0.10 (#11) | 34.73s (#20) | 66.70s (#16) | - |
| | 4.16 (#14) | 0.10 (#12) | 42.28s (#22) | 140.54s (#24) | - |
| | 4.16 (#13) | 0.04 (#4) | 5.76s (#3) | 8.82s (#1) | - |
| | 4.1 (#15) | 0.85 (#21) | 11.74s (#8) | 23.32s (#8) | - |
| | 4.09 (#16) | 0.09 (#10) | 34.57s (#19) | 106.53s (#23) | - |
| | 4.05 (#17) | 0.53 (#18) | 29.18s (#15) | 96.77s (#22) | - |
| | 4.02 (#18) | 0.04 (#5) | 31.03s (#17) | 73.70s (#18) | - |
| | 4 (#19) | 0.04 (#6) | 12.17s (#9) | 21.75s (#6) | - |
| | 4 (#21) | 0.07 (#9) | 9.76s (#5) | 23.11s (#7) | - |
| | 4 (#20) | 0.05 (#8) | 8.49s (#4) | 13.82s (#4) | - |
| | 3.99 (#22) | 0.18 (#15) | 10.80s (#7) | 17.98s (#5) | - |
| | 3.89 (#23) | 0.02 (#2) | 5.22s (#1) | 12.47s (#3) | - |
| | 3.88 (#24) | 0.01 (#1) | 13.99s (#10) | 29.62s (#11) | - |
| | 3.83 (#25) | 0.14 (#13) | 5.65s (#2) | 9.93s (#2) | - |