AutoBench Run 2 - April 2025
Second major AutoBench run with o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, etc.
Past
Date: April 25, 2025
Version: 2025-04-25
Models: 24
New Models: 24
Run data
| Model | AutoBench | LMArena | AAI Index | MMLU-Pro |
|---|---|---|---|---|
| | 3.83 (#25) | 1245 (#17) | 37080 (#18) | 0.691 (#16) |
| | 3.88 (#24) | 1217 (#19) | 35280 (#20) | 0.652 (#19) |
| | 3.89 (#23) | 1217 (#20) | 32530 (#22) | 0.59 (#22) |
| | 3.99 (#22) | 1237 (#18) | 34740 (#21) | 0.634 (#21) |
| | 4 (#19) | 1272 (#12) | 35680 (#19) | 0.648 (#20) |
| | 4 (#20) | 1271 (#13) | 50530 (#8) | 0.809 (#5) |
| | 4 (#21) | - | 42990 (#12) | 0.752 (#12) |
| | 4.02 (#18) | 1257 (#15) | 41110 (#13) | 0.713 (#13) |
| | 4.05 (#17) | 1249 (#16) | 38270 (#15) | 0.697 (#15) |
| | 4.09 (#16) | 1318 (#7) | 45580 (#11) | 0.752 (#11) |
| | 4.1 (#15) | 1288 (#11) | 39230 (#14) | 0.709 (#14) |
| | 4.16 (#13) | 1372 (#3) | 53240 (#5) | 0.819 (#4) |
| | 4.16 (#14) | 1356 (#5) | 48090 (#10) | 0.779 (#10) |
| | 4.17 (#12) | 1310 (#8) | - | - |
| | 4.18 (#11) | 1269 (#14) | 37280 (#17) | - |
| | 4.2 (#10) | 1293 (#10) | 48150 (#9) | 0.803 (#6) |
| | 4.2 (#9) | 1342 (#6) | 37620 (#16) | 0.669 (#18) |
| | 4.26 (#6) | 1358 (#4) | 60220 (#4) | 0.844 (#2) |
| | 4.26 (#8) | - | - | 0.69 (#17) |
| | 4.26 (#7) | 1305 (#9) | 62860 (#3) | 0.791 (#8) |
| | 4.34 (#5) | - | 52860 (#6) | 0.781 (#9) |
| | 4.34 (#4) | 1402 (#2) | 50630 (#7) | 0.799 (#7) |
| | 4.39 (#3) | 1293 (#10) | 48150 (#9) | 0.803 (#6) |
| | 4.46 (#2) | 1439 (#1) | 67840 (#2) | 0.858 (#1) |
| | 4.57 (#1) | - | 69830 (#1) | 0.832 (#3) |