AutoBench Run 2 - April 2025
Second major AutoBench run, covering o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, and others.
Past
Date: April 25, 2025
Version: 2025-04-25
Models: 24
New Models: 24
Run data
| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 4.57 (#1) | 4.55 (#1) | 4.51 (#1) | 4.57 (#1) | 4.59 (#1) | 4.6 (#1) | 4.61 (#1) | 4.48 (#1) | 4.57 (#1) | 4.67 (#1) | 4.61 (#1) |
| | 4.46 (#2) | 4.5 (#2) | 4.42 (#5) | 4.48 (#2) | 4.59 (#2) | 4.53 (#3) | 4.6 (#2) | 4.17 (#5) | 4.17 (#4) | 4.56 (#2) | 4.59 (#2) |
| | 4.39 (#3) | 4.48 (#3) | 4.48 (#2) | 4.32 (#5) | 4.48 (#3) | 4.4 (#5) | 4.54 (#3) | 4.18 (#4) | 4.06 (#6) | 4.45 (#3) | 4.48 (#3) |
| | 4.34 (#4) | 4.42 (#5) | 4.41 (#6) | 4.22 (#8) | 4.3 (#10) | 4.44 (#4) | 4.32 (#9) | 4.3 (#3) | 4.34 (#3) | 4.3 (#8) | 4.4 (#4) |
| | 4.34 (#5) | 4.33 (#6) | 4.47 (#3) | 4.36 (#3) | 4.43 (#4) | 4.54 (#2) | 4.45 (#4) | 4.05 (#9) | 4.07 (#5) | 4.42 (#4) | 4.36 (#6) |
| | 4.26 (#8) | 4.05 (#14) | 4.46 (#4) | 4.29 (#7) | 4.35 (#6) | 4.32 (#8) | 4.39 (#5) | 3.97 (#12) | 3.95 (#9) | 4.35 (#5) | 4.39 (#5) |
| | 4.26 (#7) | 4.17 (#11) | 4.38 (#8) | 4.33 (#4) | 4.33 (#7) | 4.36 (#6) | 4.34 (#7) | 4.06 (#8) | 3.91 (#10) | 4.31 (#7) | 4.36 (#7) |
| | 4.26 (#6) | 4.44 (#4) | 4.35 (#10) | 4.09 (#13) | 4.2 (#12) | 4.23 (#11) | 4.21 (#12) | 4.32 (#2) | 4.41 (#2) | 4.21 (#13) | 4.25 (#12) |
| | 4.2 (#9) | 4.27 (#7) | 4.41 (#7) | 4.15 (#12) | 4.31 (#9) | 4.14 (#15) | 4.34 (#8) | 3.96 (#14) | 3.87 (#13) | 4.29 (#10) | 4.3 (#9) |
| | 4.2 (#10) | 3.98 (#17) | 4.35 (#9) | 4.29 (#6) | 4.36 (#5) | 4.33 (#7) | 4.38 (#6) | 3.9 (#17) | 3.7 (#17) | 4.33 (#6) | 4.34 (#8) |
| | 4.18 (#11) | 4.1 (#13) | 4.3 (#13) | 4.2 (#9) | 4.32 (#8) | 4.27 (#9) | 4.32 (#10) | 3.99 (#11) | 3.68 (#18) | 4.3 (#9) | 4.29 (#11) |
| | 4.17 (#12) | 4.23 (#9) | 4.3 (#14) | 4.06 (#18) | 4.17 (#14) | 4.19 (#13) | 4.21 (#13) | 4.1 (#6) | 4.03 (#7) | 4.22 (#12) | 4.24 (#13) |
| | 4.16 (#13) | 4.25 (#8) | 4.33 (#11) | 4.17 (#11) | 4.17 (#13) | 4.22 (#12) | 4.18 (#14) | 4.07 (#7) | 3.97 (#8) | 4.11 (#19) | 4.13 (#17) |
| | 4.16 (#14) | 4.18 (#10) | 3.99 (#23) | 4.18 (#10) | 4.28 (#11) | 4.24 (#10) | 4.3 (#11) | 3.97 (#13) | 3.85 (#15) | 4.25 (#11) | 4.29 (#10) |
| | 4.1 (#15) | 4.12 (#12) | 4.17 (#18) | 4.08 (#14) | 4.17 (#15) | 4.16 (#14) | 4.16 (#15) | 3.92 (#16) | 3.87 (#14) | 4.19 (#14) | 4.14 (#16) |
| | 4.09 (#16) | 4.01 (#15) | 4.32 (#12) | 4.08 (#15) | 4.14 (#17) | 4.11 (#16) | 4.06 (#21) | 4.04 (#10) | 3.91 (#11) | 4.13 (#18) | 4.12 (#18) |
| | 4.05 (#17) | 3.98 (#18) | 4.19 (#17) | 4.07 (#17) | 4.08 (#21) | 4.05 (#20) | 4.09 (#20) | 3.87 (#19) | 3.88 (#12) | 4.17 (#15) | 4.18 (#15) |
| | 4.02 (#18) | 3.83 (#23) | 4.02 (#22) | 4.07 (#16) | 4.17 (#16) | 4.1 (#17) | 4.13 (#17) | 3.93 (#15) | 3.52 (#24) | 4.15 (#16) | 4.21 (#14) |
| | 4 (#20) | 3.97 (#20) | 4.2 (#16) | 4 (#22) | 4.1 (#20) | 3.97 (#22) | 4.03 (#22) | 3.82 (#22) | 3.79 (#16) | 4.07 (#22) | 4.07 (#23) |
| | 4 (#19) | 3.98 (#19) | 4.04 (#21) | 3.99 (#23) | 4.05 (#22) | 4.1 (#18) | 4.1 (#19) | 3.86 (#20) | 3.64 (#19) | 4.1 (#20) | 4.1 (#19) |
| | 4 (#21) | 3.88 (#21) | 4.04 (#20) | 4.04 (#20) | 4.1 (#19) | 4.09 (#19) | 4.11 (#18) | 3.89 (#18) | 3.53 (#23) | 4.14 (#17) | 4.09 (#20) |
| | 3.99 (#22) | 4 (#16) | 4.2 (#15) | 4.04 (#19) | 4.11 (#18) | 3.98 (#21) | 4.15 (#16) | 3.85 (#21) | 3.44 (#25) | 4.05 (#23) | 4.07 (#22) |
| | 3.89 (#23) | 3.73 (#25) | 3.86 (#24) | 3.86 (#24) | 4.04 (#24) | 3.9 (#25) | 4.02 (#23) | 3.77 (#23) | 3.56 (#21) | 4.02 (#24) | 4.05 (#24) |
| | 3.88 (#24) | 3.86 (#22) | 3.42 (#25) | 4.01 (#21) | 4.05 (#23) | 3.94 (#23) | 4.02 (#24) | 3.66 (#25) | 3.59 (#20) | 4.09 (#21) | 4.08 (#21) |
| | 3.83 (#25) | 3.81 (#24) | 4.06 (#19) | 3.78 (#25) | 3.9 (#25) | 3.91 (#24) | 3.82 (#25) | 3.74 (#24) | 3.56 (#22) | 3.86 (#25) | 3.86 (#25) |
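
Each cell in the table above pairs a score with that model's rank for the topic, written as `score (#rank)`. For readers who want to work with the numbers programmatically, the following is a minimal sketch of how such cells might be parsed. The helper names `parse_cell` and `parse_row` are hypothetical and are not part of AutoBench; the sketch assumes only the cell format visible in the table.

```python
import re

# Matches one cell such as "4.57 (#1)" or "4 (#20)":
# a score (integer or decimal) followed by a rank in parentheses.
CELL_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\(#(\d+)\)\s*$")


def parse_cell(cell):
    """Return (score, rank) for a cell like '4.57 (#1)', or None if it does not match."""
    match = CELL_RE.match(cell)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def parse_row(line):
    """Split one pipe-delimited row and parse every score cell, skipping empty cells."""
    cells = [c for c in line.split("|") if c.strip()]
    return [parsed for parsed in map(parse_cell, cells) if parsed is not None]


# First three score cells of the top-ranked row from the table above.
row = "| 4.57 (#1) | 4.55 (#1) | 4.51 (#1) |"
print(parse_row(row))  # [(4.57, 1), (4.55, 1), (4.51, 1)]
```

Because empty cells are skipped, the same function works whether or not a row carries an empty model-name cell.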