AutoBench Run 5 - December 2025
The latest AutoBench run, covering models including GPT-5.2, Claude Opus 4.5, Gemini 3 Flash, and more.
Date: December 19, 2025
Version: 2025-12-19
Models: 38
New models: 3
Run data
| Model | AutoBench | Chatbot Arena | AAI Index | MMLU Index |
|---|---|---|---|---|
| | 4.41 (#3) | 1492 (#1) | 73 (#1) | 0.9 (#2) |
| | 4.2 (#12) | 1478 (#2) | 65 (#7) | 0.87 (#7) |
| | 4.39 (#4) | 1470 (#3) | 70 (#4) | 0.9 (#1) |
| | 4.38 (#5) | 1457 (#4) | 70 (#5) | 0.87 (#5) |
| | 4.29 (#9) | 1451 (#5) | 60 (#13) | 0.86 (#9) |
| | 4.3 (#8) | 1450 (#6) | 63 (#10) | 0.88 (#4) |
| | 4.32 (#6) | 1429 (#7) | 67 (#6) | 0.85 (#12) |
| | 4.13 (#18) | 1425 (#8) | 56 (#16) | 0.83 (#17) |
| | 4.14 (#17) | 1418 (#9) | 59 (#14) | 0.86 (#8) |
| | 4.11 (#21) | 1416 (#10) | 50 (#25) | 0.82 (#20) |
| | 3.94 (#29) | 1415 (#11) | 38 (#31) | 0.81 (#26) |
| | 4.11 (#20) | 1414 (#12) | 52 (#20) | 0.84 (#13) |
| | 3.81 (#33) | 1411 (#13) | 35 (#34) | 0.68 (#35) |
| | 4.17 (#15) | 1408 (#14) | 51 (#23) | 0.84 (#14) |
| | 4.17 (#16) | 1402 (#15) | 55 (#17) | 0.76 (#30) |
| | 4.2 (#13) | 1397 (#16) | 57 (#15) | 0.84 (#16) |
| | 4.12 (#19) | 1395 (#17) | 52 (#19) | 0.85 (#10) |
| | 4.29 (#10) | 1392 (#18) | 64 (#9) | 0.84 (#15) |
| | 3.95 (#28) | 1378 (#19) | 40 (#30) | 0.81 (#23) |
| | 3.98 (#27) | 1374 (#20) | 45 (#28) | 0.83 (#18) |
| | 3.86 (#31) | 1370 (#21) | 49 (#26) | 0.82 (#19) |
| | 4.03 (#24) | 1367 (#22) | 54 (#18) | 0.82 (#22) |
| | 4.18 (#14) | 1352 (#23) | 61 (#12) | 0.81 (#24) |
| | 3.99 (#26) | 1345 (#24) | 61 (#11) | 0.82 (#21) |
| | 3.78 (#34) | 1340 (#25) | 45 (#29) | 0.81 (#25) |
| | 4.06 (#22) | 1339 (#26) | 51 (#24) | 0.77 (#29) |
| | 4.06 (#23) | 1334 (#27) | 47 (#27) | 0.81 (#27) |
| | 3.78 (#35) | 1318 (#28) | 52 (#22) | 0.75 (#31) |
| | 4.3 (#7) | - | 71 (#3) | 0.89 (#3) |
| | 4.43 (#2) | - | - | - |
| | 4.48 (#1) | - | 73 (#2) | 0.87 (#6) |
| | 3.57 (#36) | - | 28 (#36) | 0.64 (#36) |
| | 4.03 (#25) | - | 52 (#21) | 0.79 (#28) |
| | 3.85 (#32) | - | - | - |
| | 3.88 (#30) | - | 38 (#32) | 0.74 (#32) |
| | 4.21 (#11) | - | 64 (#8) | 0.85 (#11) |
| | 3.5 (#37) | - | 37 (#33) | 0.74 (#33) |
| | 3.47 (#38) | - | 32 (#35) | 0.73 (#34) |