
AutoBench Run 2 - April 2025

The second major AutoBench run, covering o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, and other models.

Past
Date: April 25, 2025
Version: 2025-04-25
Models: 24
New Models: 24

Run data

| Model | Score | Avg Cost (US cents) | Avg Latency (sec) | P99 Latency (sec) | Iterations |
|---|---|---|---|---|---|
| | 3.99 (#22) | 0.18 (#15) | 11s (#7) | 18s (#5) | - |
| | 4.39 (#3) | - (#25) | 46s (#24) | 83s (#20) | - |
| | 4.2 (#10) | 1.13 (#22) | 16s (#12) | 33s (#12) | - |
| | 4.26 (#7) | 0.52 (#17) | 85s (#25) | 223s (#25) | - |
| | 4.09 (#16) | 0.09 (#10) | 35s (#19) | 107s (#23) | - |
| | 4.16 (#14) | 0.10 (#12) | 42s (#22) | 141s (#24) | - |
| | 4.16 (#13) | 0.04 (#4) | 6s (#3) | 9s (#1) | - |
| | 4.46 (#2) | 1.23 (#23) | 37s (#21) | 64s (#15) | - |
| | 4.2 (#9) | 0.03 (#3) | 30s (#16) | 79s (#19) | - |
| | 4.34 (#4) | 0.14 (#14) | 15s (#11) | 29s (#10) | - |
| | 4 (#19) | 0.04 (#6) | 12s (#9) | 22s (#6) | - |
| | 4.1 (#15) | 0.85 (#21) | 12s (#8) | 23s (#8) | - |
| | 4.34 (#5) | 1.70 (#24) | 34s (#18) | 70s (#17) | - |
| | 4.18 (#11) | 0.04 (#7) | 25s (#14) | 49s (#13) | - |
| | 4.02 (#18) | 0.04 (#5) | 31s (#17) | 74s (#18) | - |
| | 4.26 (#6) | 0.32 (#16) | 44s (#23) | 94s (#21) | - |
| | 4 (#21) | 0.07 (#9) | 10s (#5) | 23s (#7) | - |
| | 4 (#20) | 0.05 (#8) | 8s (#4) | 14s (#4) | - |
| | 4.05 (#17) | 0.53 (#18) | 29s (#15) | 97s (#22) | - |
| | 3.88 (#24) | 0.01 (#1) | 14s (#10) | 30s (#11) | - |
| | 3.89 (#23) | 0.02 (#2) | 5s (#1) | 12s (#3) | - |
| | 3.83 (#25) | 0.14 (#13) | 6s (#2) | 10s (#2) | - |
| | 4.26 (#8) | 0.61 (#19) | 11s (#6) | 24s (#9) | - |
| | 4.57 (#1) | 0.79 (#20) | 19s (#13) | 52s (#14) | - |
| | 4.17 (#12) | 0.10 (#11) | 35s (#20) | 67s (#16) | - |
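
Each ranked cell in the run data pairs a value with its rank in parentheses, as in 3.99 (#22) or 11s (#7), and a cell with no reported value appears as - (#25). The following is a minimal Python sketch for pulling those pairs out of one row. It assumes the ranked cells appear in the order Score, Avg Cost, Avg Latency, P99 Latency; the names `parse_row`, `Cell`, and `COLUMNS` are hypothetical helpers for illustration and are not part of AutoBench.

```python
import re
from typing import NamedTuple, Optional

# Matches one "value (#rank)" cell, e.g. "3.99 (#22)", "11s (#7)", or "- (#25)".
# The pattern does not depend on how cells are separated in the row.
CELL = re.compile(r"(-|\d+(?:\.\d+)?)s?\s*\(#(\d+)\)")

# Assumed order of the ranked columns, taken from the table header.
COLUMNS = ("score", "avg_cost_cents", "avg_latency_sec", "p99_latency_sec")


class Cell(NamedTuple):
    value: Optional[float]  # None when the cell shows "-" (no value reported)
    rank: int


def parse_row(row: str) -> dict:
    """Return {column: Cell} for the ranked cells found in one table row."""
    cells = [
        Cell(None if raw == "-" else float(raw), int(rank))
        for raw, rank in CELL.findall(row)
    ]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} ranked cells, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


row = parse_row("4.39 (#3)- (#25)46s (#24)83s (#20)-")
print(row["score"])           # Cell(value=4.39, rank=3)
print(row["avg_cost_cents"])  # Cell(value=None, rank=25)
```

Keeping the rank next to each value makes it straightforward to sort rows by any one metric without recomputing ranks. The trailing Iterations cell carries no rank, so this sketch ignores it.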