AutoBench Run 2 - April 2025
The second major AutoBench run, featuring o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, and other models.
Date: April 25, 2025
Version: 2025-04-25
Models: 24
New Models: 24
Run data (each cell shows a value followed by the model's rank in that column; lower values rank higher)
| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 0.01 (#1) | 0.02 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.02 (#1) | 0.01 (#1) | 0.01 (#1) |
|  | 0.02 (#2) | 0.02 (#2) | 0.01 (#2) | 0.01 (#2) | 0.01 (#2) | 0.01 (#2) | 0.01 (#2) | 0.02 (#2) | 0.03 (#2) | 0.01 (#2) | 0.01 (#2) |
|  | 0.03 (#3) | 0.04 (#3) | 0.02 (#4) | 0.02 (#3) | 0.02 (#3) | 0.02 (#3) | 0.02 (#3) | 0.02 (#3) | 0.04 (#3) | 0.02 (#3) | 0.02 (#3) |
|  | 0.04 (#4) | 0.08 (#7) | 0.02 (#3) | 0.03 (#4) | 0.03 (#4) | 0.03 (#4) | 0.03 (#4) | 0.04 (#4) | 0.06 (#5) | 0.03 (#5) | 0.03 (#4) |
|  | 0.04 (#5) | 0.05 (#4) | 0.02 (#5) | 0.03 (#6) | 0.03 (#6) | 0.03 (#6) | 0.03 (#6) | 0.04 (#5) | 0.06 (#4) | 0.03 (#6) | 0.03 (#5) |
|  | 0.04 (#6) | 0.05 (#5) | 0.03 (#6) | 0.03 (#7) | 0.03 (#7) | 0.03 (#7) | 0.04 (#7) | 0.04 (#7) | 0.06 (#6) | 0.03 (#7) | 0.03 (#6) |
|  | 0.04 (#7) | 0.06 (#6) | 0.04 (#8) | 0.03 (#5) | 0.03 (#5) | 0.03 (#5) | 0.03 (#5) | 0.04 (#6) | 0.06 (#7) | 0.03 (#4) | 0.03 (#7) |
|  | 0.05 (#8) | 0.08 (#8) | 0.03 (#7) | 0.04 (#8) | 0.04 (#8) | 0.04 (#8) | 0.04 (#8) | 0.05 (#8) | 0.07 (#8) | 0.04 (#8) | 0.04 (#8) |
|  | 0.07 (#9) | 0.11 (#9) | 0.04 (#9) | 0.05 (#9) | 0.05 (#9) | 0.05 (#9) | 0.05 (#9) | 0.09 (#9) | 0.11 (#9) | 0.05 (#9) | 0.05 (#9) |
|  | 0.09 (#10) | 0.14 (#10) | 0.09 (#11) | 0.08 (#12) | 0.08 (#12) | 0.07 (#10) | 0.07 (#11) | 0.11 (#11) | 0.15 (#10) | 0.07 (#12) | 0.08 (#10) |
|  | 0.09 (#11) | 0.15 (#11) | 0.11 (#13) | 0.07 (#10) | 0.07 (#11) | 0.07 (#11) | 0.07 (#12) | 0.10 (#10) | 0.15 (#11) | 0.07 (#10) | 0.08 (#11) |
|  | 0.10 (#12) | 0.16 (#12) | 0.06 (#10) | 0.08 (#11) | 0.07 (#10) | 0.07 (#12) | 0.07 (#10) | 0.15 (#13) | 0.17 (#12) | 0.07 (#11) | 0.12 (#14) |
|  | 0.14 (#13) | 0.25 (#13) | 0.15 (#14) | 0.11 (#14) | 0.11 (#14) | 0.11 (#13) | 0.10 (#14) | 0.14 (#12) | 0.21 (#14) | 0.10 (#14) | 0.10 (#12) |
|  | 0.15 (#14) | 0.25 (#14) | 0.09 (#12) | 0.10 (#13) | 0.09 (#13) | 0.12 (#14) | 0.10 (#13) | 0.19 (#15) | 0.31 (#15) | 0.09 (#13) | 0.11 (#13) |
|  | 0.18 (#15) | 0.33 (#15) | 0.15 (#15) | 0.16 (#15) | 0.16 (#15) | 0.15 (#15) | 0.18 (#16) | 0.17 (#14) | 0.21 (#13) | 0.15 (#15) | 0.16 (#16) |
|  | 0.32 (#16) | 0.51 (#16) | 0.17 (#16) | 0.17 (#16) | 0.17 (#16) | 0.22 (#16) | 0.15 (#15) | 0.60 (#17) | 0.85 (#17) | 0.17 (#16) | 0.15 (#15) |
|  | 0.52 (#17) | 0.82 (#17) | 0.32 (#17) | 0.29 (#17) | 0.28 (#17) | 0.37 (#17) | 0.28 (#17) | 0.86 (#20) | 1.36 (#21) | 0.29 (#17) | 0.28 (#17) |
|  | 0.52 (#18) | 0.83 (#18) | 0.46 (#19) | 0.47 (#19) | 0.46 (#19) | 0.42 (#19) | 0.46 (#19) | 0.47 (#16) | 0.72 (#16) | 0.47 (#19) | 0.50 (#19) |
|  | 0.61 (#19) | 0.93 (#19) | 0.43 (#18) | 0.35 (#18) | 0.38 (#18) | 0.41 (#18) | 0.38 (#18) | 0.96 (#22) | 1.56 (#22) | 0.37 (#18) | 0.36 (#18) |
|  | 0.79 (#20) | 1.20 (#20) | 0.60 (#21) | 0.63 (#20) | 0.63 (#20) | 0.80 (#21) | 0.61 (#20) | 0.99 (#23) | 1.30 (#19) | 0.61 (#20) | 0.56 (#20) |
|  | 0.85 (#21) | 1.30 (#21) | 0.55 (#20) | 0.66 (#21) | 0.66 (#21) | 0.65 (#20) | 0.69 (#21) | 0.95 (#21) | 1.72 (#23) | 0.61 (#21) | 0.67 (#21) |
|  | 1.13 (#22) | 2.26 (#22) | 0.88 (#23) | 0.90 (#22) | 1.13 (#22) | 0.85 (#22) | 1.14 (#22) | 0.84 (#19) | 1.34 (#20) | 1.00 (#22) | 1.00 (#22) |
|  | 1.23 (#23) | 2.95 (#24) | 0.64 (#22) | 0.98 (#23) | 1.22 (#23) | 1.21 (#23) | 1.16 (#23) | 0.78 (#18) | 1.08 (#18) | 1.13 (#23) | 1.11 (#23) |
|  | 1.69 (#24) | 2.64 (#23) | 1.03 (#24) | 1.24 (#24) | 1.30 (#24) | 1.41 (#24) | 1.32 (#24) | 2.14 (#24) | 2.83 (#24) | 1.22 (#24) | 1.83 (#24) |
|  | 4.32 (#25) | 7.97 (#25) | 2.55 (#25) | 2.26 (#25) | 2.74 (#25) | 2.58 (#25) | 2.94 (#25) | 6.54 (#25) | 10.23 (#25) | 2.79 (#25) | 2.59 (#25) |
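The page does not define how the Average (All Topics) column is computed, nor the unit of the values. The rounded figures appear consistent with an unweighted mean of the ten topic columns: in the last row the topic values sum to 43.19, and 43.19 / 10 ≈ 4.32. Below is a minimal Python sketch for checking that reading row by row. It assumes that definition, and the helper names `parse_row` and `check_average` are hypothetical.

```python
import re
from statistics import mean

# Topic columns in table order. The first "value (#rank)" cell of each row is
# "Average (All Topics)"; the Model column carries no values and is skipped.
TOPICS = ["Coding", "Creative Writing", "Current News", "General Culture",
          "Grammar", "History", "Logics", "Math", "Science", "Technology"]

# Matches one cell such as "10.23 (#25)" and captures the value and the rank.
CELL = re.compile(r"(\d+(?:\.\d+)?)\s*\(#(\d+)\)")


def parse_row(line: str) -> list[tuple[float, int]]:
    """Return the (value, rank) pairs found in one markdown table row."""
    return [(float(value), int(rank)) for value, rank in CELL.findall(line)]


def check_average(line: str, tol: float = 0.01) -> tuple[float, float, bool]:
    """Compare the reported Average with the plain mean of the topic cells.

    ASSUMPTION: Average is the unweighted mean of the ten topic columns; the
    page does not state this. `tol` absorbs rounding to two decimals.
    """
    cells = parse_row(line)
    if len(cells) != 1 + len(TOPICS):
        raise ValueError(f"expected {1 + len(TOPICS)} cells, got {len(cells)}")
    reported = cells[0][0]
    computed = mean(value for value, _rank in cells[1:])
    return reported, computed, abs(reported - computed) <= tol


# Last row of the table (rank #25 in every column).
LAST_ROW = ("| 4.32 (#25) | 7.97 (#25) | 2.55 (#25) | 2.26 (#25) | 2.74 (#25) "
            "| 2.58 (#25) | 2.94 (#25) | 6.54 (#25) | 10.23 (#25) | 2.79 (#25) "
            "| 2.59 (#25) |")

reported, computed, consistent = check_average(LAST_ROW)
print(f"reported={reported}, computed={computed:.3f}, consistent={consistent}")
# prints: reported=4.32, computed=4.319, consistent=True
```

Because every cell is rounded to two decimals, small mismatches within the tolerance are expected. A row that fails the check would suggest a weighted or otherwise different aggregation.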