AutoBench Run 5 - December 2025
Latest AutoBench run, featuring GPT-5.2, Claude Opus 4.5, Gemini 3 Flash, and more
Date: December 19, 2025
Version: 2025-12-19
Models: 38
New Models: 3
Run data
| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 4.41 (#3) | 4.23 (#5) | 4.35 (#7) | 4.51 (#4) | 4.7 (#1) | 4.2 (#15) | 4.56 (#3) | 4.12 (#5) | 4.3 (#1) | 4.51 (#3) | 4.48 (#11) |
| | 4.48 (#1) | 4.37 (#1) | 4.49 (#2) | 4.56 (#1) | 4.69 (#2) | 4.52 (#1) | 4.59 (#2) | 4.32 (#1) | 4.29 (#2) | 4.39 (#10) | 4.55 (#3) |
| | 4.38 (#5) | 4.19 (#6) | 4.34 (#8) | 4.46 (#6) | 4.68 (#3) | 4.5 (#2) | 4.44 (#12) | 4.08 (#6) | 4.13 (#6) | 4.49 (#4) | 4.49 (#10) |
| | 4.39 (#4) | 4.33 (#2) | 4.37 (#5) | 4.17 (#24) | 4.65 (#4) | 4.48 (#3) | 4.5 (#7) | 4.14 (#4) | 4.21 (#5) | 4.57 (#1) | 4.52 (#6) |
| | 4.11 (#21) | 3.61 (#28) | 4.26 (#9) | 4.45 (#9) | 4.65 (#5) | 4.07 (#23) | 4.51 (#6) | 3.4 (#27) | 3.4 (#30) | 4.28 (#18) | 4.44 (#14) |
| | 4.32 (#6) | 4.03 (#11) | 4.37 (#6) | 4.46 (#7) | 4.6 (#6) | 4.28 (#11) | 4.54 (#4) | 3.81 (#14) | 4.13 (#7) | 4.23 (#21) | 4.56 (#2) |
| | 4.3 (#7) | 3.91 (#18) | 4.25 (#10) | 4.42 (#11) | 4.58 (#7) | 4.34 (#7) | 4.54 (#5) | 4.02 (#8) | 4.06 (#8) | 4.44 (#9) | 4.45 (#13) |
| | 4.3 (#8) | 4.12 (#8) | 4.5 (#1) | 4.48 (#5) | 4.57 (#8) | 4.44 (#4) | 4.42 (#13) | 3.57 (#21) | 3.87 (#17) | 4.48 (#5) | 4.55 (#4) |
| | 4.2 (#13) | 3.7 (#26) | 4.22 (#11) | 4.19 (#22) | 4.57 (#9) | 4.24 (#12) | 4.44 (#11) | 3.88 (#10) | 3.78 (#22) | 4.26 (#19) | 4.46 (#12) |
| | 4.17 (#16) | 3.9 (#19) | 4.18 (#14) | 4.42 (#10) | 4.52 (#10) | 4.3 (#9) | 4.48 (#8) | 3.74 (#15) | 3.53 (#27) | 4.16 (#27) | 4.54 (#5) |
| | 3.88 (#30) | 3.27 (#34) | 4.14 (#17) | 4.15 (#25) | 4.49 (#11) | 4.04 (#24) | 4.35 (#20) | 3.17 (#33) | 2.84 (#37) | 4.16 (#26) | 4.22 (#32) |
| | 4.29 (#10) | 4.18 (#7) | 4.39 (#4) | 4.45 (#8) | 4.48 (#12) | 4.29 (#10) | 4.29 (#24) | 3.9 (#9) | 4.05 (#9) | 4.32 (#12) | 4.52 (#7) |
| | 4.13 (#18) | 3.95 (#15) | 4.15 (#15) | 4.11 (#30) | 4.47 (#13) | 4.11 (#19) | 4.29 (#25) | 3.45 (#25) | 4.01 (#11) | 4.29 (#15) | 4.33 (#22) |
| | 3.94 (#29) | 3.54 (#29) | 3.89 (#25) | 4.13 (#28) | 4.45 (#14) | 4.11 (#21) | 4.39 (#15) | 3.41 (#26) | 3.11 (#32) | 4.01 (#32) | 4.3 (#24) |
| | 4.11 (#20) | 3.74 (#22) | 4 (#22) | 4.51 (#3) | 4.44 (#15) | 4.14 (#17) | 4.32 (#23) | 3.48 (#24) | 3.72 (#23) | 4.2 (#23) | 4.39 (#17) |
| | 4.12 (#19) | 3.71 (#24) | 4.07 (#20) | 4.37 (#12) | 4.44 (#16) | 4.11 (#20) | 4.45 (#10) | 3.55 (#23) | 3.65 (#25) | 4.31 (#13) | 4.43 (#16) |
| | 4.29 (#9) | 4.02 (#12) | 4.15 (#16) | 4.37 (#13) | 4.43 (#17) | 4.37 (#5) | 4.4 (#14) | 4.02 (#7) | 4.24 (#4) | 4.45 (#8) | 4.38 (#19) |
| | 4.21 (#11) | 3.88 (#20) | 4.2 (#13) | 4.33 (#15) | 4.43 (#18) | 4.18 (#16) | 4.38 (#16) | 3.71 (#17) | 3.89 (#15) | 4.48 (#6) | 4.39 (#18) |
| | 3.86 (#31) | 3.4 (#31) | 3.63 (#35) | 4.04 (#35) | 4.4 (#19) | 3.61 (#33) | 4.16 (#33) | 3.69 (#18) | 3.47 (#29) | 3.9 (#34) | 4.25 (#29) |
| | 4.03 (#24) | 3.7 (#25) | 3.74 (#31) | 4.05 (#34) | 4.4 (#20) | 4.13 (#18) | 4.22 (#29) | 3.73 (#16) | 3.79 (#21) | 4.24 (#20) | 4.2 (#33) |
| | 4.43 (#2) | 4.3 (#3) | 4.44 (#3) | 4.54 (#2) | 4.4 (#21) | 4.37 (#6) | 4.6 (#1) | 4.18 (#2) | 4.26 (#3) | 4.55 (#2) | 4.59 (#1) |
| | 3.78 (#34) | 3.29 (#33) | 3.85 (#28) | 4.13 (#29) | 4.35 (#22) | 3.62 (#32) | 4.26 (#28) | 3.11 (#34) | 3.1 (#33) | 3.9 (#35) | 4.14 (#34) |
| | 3.98 (#27) | 3.65 (#27) | 3.84 (#29) | 4.05 (#33) | 4.32 (#23) | 3.96 (#28) | 4.26 (#27) | 3.27 (#29) | 3.7 (#24) | 4.29 (#16) | 4.27 (#27) |
| | 3.99 (#26) | 3.46 (#30) | 3.65 (#33) | 4.33 (#16) | 4.3 (#24) | 4.01 (#27) | 4.2 (#30) | 3.34 (#28) | 3.94 (#13) | 4.14 (#28) | 4.29 (#26) |
| | 4.03 (#25) | 3.88 (#21) | 3.63 (#34) | 4.14 (#27) | 4.29 (#25) | 3.8 (#30) | 4.18 (#31) | 3.68 (#19) | 3.81 (#19) | 4.19 (#24) | 4.49 (#9) |
| | 4.14 (#17) | 4.1 (#10) | 3.72 (#32) | 4.17 (#23) | 4.29 (#26) | 4.24 (#13) | 4.38 (#18) | 3.56 (#22) | 3.96 (#12) | 4.34 (#11) | 4.31 (#23) |
| | 4.06 (#22) | 3.98 (#14) | 3.85 (#27) | 4.28 (#17) | 4.29 (#27) | 3.91 (#29) | 4.18 (#32) | 3.61 (#20) | 3.87 (#16) | 4.23 (#22) | 4.23 (#31) |
| | 3.85 (#32) | 3.31 (#32) | 3.93 (#24) | 4.15 (#26) | 4.29 (#28) | 3.53 (#34) | 4.16 (#34) | 3.26 (#31) | 3.48 (#28) | 4.01 (#31) | 4.26 (#28) |
| | 4.2 (#12) | 4.11 (#9) | 4.2 (#12) | 4.23 (#19) | 4.26 (#29) | 4.2 (#14) | 4.34 (#22) | 4.18 (#3) | 3.81 (#20) | 4.3 (#14) | 4.33 (#21) |
| | 4.06 (#23) | 3.74 (#23) | 3.88 (#26) | 4.25 (#18) | 4.26 (#30) | 4.08 (#22) | 4.38 (#17) | 3.04 (#35) | 3.64 (#26) | 4.28 (#17) | 4.34 (#20) |
| | 4.17 (#15) | 4.01 (#13) | 4.05 (#21) | 4.37 (#14) | 4.21 (#31) | 4.31 (#8) | 4.29 (#26) | 3.83 (#12) | 4.02 (#10) | 4.12 (#29) | 4.44 (#15) |
| | 4.18 (#14) | 4.26 (#4) | 4.13 (#18) | 4.05 (#32) | 4.21 (#32) | 4.04 (#25) | 4.34 (#21) | 3.81 (#13) | 3.89 (#14) | 4.45 (#7) | 4.51 (#8) |
| | 3.57 (#36) | 2.95 (#37) | 3.57 (#36) | 4.1 (#31) | 4.16 (#33) | 3.24 (#37) | 3.97 (#36) | 2.94 (#36) | 2.78 (#38) | 3.78 (#38) | 4.1 (#35) |
| | 3.95 (#28) | 3.92 (#17) | 4.09 (#19) | 4.2 (#21) | 4.15 (#34) | 3.78 (#31) | 4.37 (#19) | 3.27 (#30) | 3.33 (#31) | 4.05 (#30) | 4.29 (#25) |
| | 3.81 (#33) | 3.02 (#36) | 3.94 (#23) | 4.21 (#20) | 4.07 (#35) | 4.03 (#26) | 4.47 (#9) | 3.2 (#32) | 2.89 (#35) | 3.98 (#33) | 4.25 (#30) |
| | 3.5 (#37) | 3.07 (#35) | 3 (#38) | 3.92 (#36) | 4.04 (#36) | 2.97 (#38) | 3.92 (#37) | 2.86 (#37) | 3.07 (#34) | 3.78 (#37) | 4.01 (#37) |
| | 3.47 (#38) | 2.84 (#38) | 3.55 (#37) | 3.73 (#37) | 3.96 (#37) | 3.34 (#36) | 3.98 (#35) | 2.55 (#38) | 2.85 (#36) | 3.82 (#36) | 3.98 (#38) |
| | 3.78 (#35) | 3.93 (#16) | 3.82 (#30) | 3.41 (#38) | 3.78 (#38) | 3.52 (#35) | 3.4 (#38) | 3.87 (#11) | 3.82 (#18) | 4.18 (#25) | 4.07 (#36) |
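
Each cell in the table pairs a score with a rank in the form `score (#rank)`. The sketch below shows one way to parse such a row and compare the reported average with the plain mean of the ten topic scores. The helper names `parse_cell` and `parse_row` are hypothetical, and treating the Average column as an unweighted mean is an assumption, not something the run data states. The sample row is the one ranked #1 overall in the table.

```python
import re
from statistics import mean

# Matches a cell such as "4.48 (#1)" or "4 (#22)": a score, then a rank.
CELL = re.compile(r"^(\d+(?:\.\d+)?)\s*\(#(\d+)\)$")


def parse_cell(cell: str) -> tuple[float, int]:
    """Split one 'score (#rank)' cell into (score, rank)."""
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return float(match.group(1)), int(match.group(2))


def parse_row(line: str) -> list[tuple[float, int]]:
    """Parse every non-empty cell of a pipe-delimited table row."""
    cells = (c.strip() for c in line.strip().strip("|").split("|"))
    return [parse_cell(c) for c in cells if c]


# The row ranked #1 overall, copied from the table (model name not available).
row = (
    "| 4.48 (#1) | 4.37 (#1) | 4.49 (#2) | 4.56 (#1) | 4.69 (#2) | 4.52 (#1) "
    "| 4.59 (#2) | 4.32 (#1) | 4.29 (#2) | 4.39 (#10) | 4.55 (#3) |"
)

# First cell is the reported average; the remaining ten are topic scores.
(overall, overall_rank), *topics = parse_row(row)

# Assumption: an unweighted mean over topics approximates the Average column.
topic_mean = mean(score for score, _ in topics)
print(f"reported average: {overall}, mean of topic scores: {topic_mean:.3f}")
```

For this row the two values agree to two decimal places. Other rows can differ by about 0.01, which would be consistent with the published average being computed from unrounded scores, though the run data does not say so.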