AutoBench Run 4 - November 2025
Latest AutoBench run, covering Gemini 3 Pro, GPT-5.1, Grok 4.1, and more
Past
Date: November 28, 2025
Version: 2025-11-28
Models: 32
New Models: 18
Run data
| Model | AutoBench | Chatbot Arena | AAI Index | MMLU Index |
|---|---|---|---|---|
| | 4.39 (#3) | 1495 (#1) | 73 (#1) | 0.9 (#1) |
| | 4.34 (#6) | 1481 (#2) | 64 (#5) | 0.85 (#9) |
| | 4.17 (#20) | 1462 (#3) | - | - |
| | 4.49 (#1) | 1454 (#4) | 70 (#2) | 0.87 (#5) |
| | 4.37 (#4) | 1451 (#5) | 60 (#8) | 0.86 (#6) |
| | 4.27 (#12) | 1449 (#7) | 59 (#9) | 0.88 (#2) |
| | 4.31 (#9) | 1449 (#6) | 63 (#6) | 0.88 (#3) |
| | 4.45 (#2) | 1437 (#8) | 68 (#3) | 0.87 (#4) |
| | 4.34 (#7) | 1429 (#9) | 67 (#4) | 0.85 (#10) |
| | 4.25 (#14) | 1426 (#10) | 56 (#13) | 0.83 (#13) |
| | 4.16 (#21) | 1421 (#11) | 57 (#10) | 0.85 (#8) |
| | 4.21 (#18) | 1416 (#12) | 50 (#17) | 0.82 (#16) |
| | 4.08 (#23) | 1410 (#13) | 57 (#12) | 0.83 (#14) |
| | 4.3 (#10) | 1405 (#14) | 54 (#15) | 0.84 (#11) |
| | 4.27 (#13) | 1402 (#15) | 55 (#14) | 0.76 (#24) |
| | 4.28 (#11) | 1397 (#16) | 57 (#11) | 0.84 (#12) |
| | 4.23 (#16) | 1395 (#17) | 52 (#16) | 0.85 (#7) |
| | 4.21 (#19) | 1382 (#18) | 37 (#22) | 0.78 (#22) |
| | 4.22 (#17) | 1380 (#19) | 48 (#19) | 0.81 (#18) |
| | 4.24 (#15) | 1374 (#20) | 45 (#20) | 0.83 (#15) |
| | 3.7 (#26) | 1364 (#21) | 22 (#31) | 0.67 (#31) |
| | 3.81 (#25) | 1354 (#22) | 29 (#27) | 0.68 (#30) |
| | 4.37 (#5) | 1352 (#23) | 61 (#7) | 0.81 (#19) |
| | 4.1 (#22) | 1340 (#24) | 45 (#21) | 0.81 (#20) |
| | 4.32 (#8) | 1338 (#25) | 49 (#18) | 0.77 (#23) |
| | 3.61 (#28) | 1327 (#26) | 36 (#24) | 0.81 (#21) |
| | 3.55 (#30) | 1319 (#27) | 28 (#28) | 0.71 (#27) |
| | 3.95 (#24) | 1305 (#28) | 33 (#25) | 0.82 (#17) |
| | 3.35 (#32) | 1288 (#29) | 32 (#26) | 0.69 (#29) |
| | 3.46 (#31) | 1255 (#30) | 23 (#30) | 0.71 (#28) |
| | 3.6 (#29) | - | 37 (#23) | 0.74 (#25) |
| | 3.7 (#27) | - | 25 (#29) | 0.73 (#26) |
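
Each cell in the run data appears to pair a benchmark score with the model's rank on that benchmark, for example `4.39 (#3)`, and a lone `-` marks a score that is not available. A minimal Python sketch of reading such rows is shown here. The names `Cell`, `parse_cell`, and `parse_row` are hypothetical helpers for illustration, not part of AutoBench, and the score-plus-rank reading of the cell format is an assumption based on the values above.

```python
import re
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """One table cell: a benchmark score and its rank (hypothetical helper type)."""
    score: float
    rank: int


# Assumed cell format: "<score> (#<rank>)", e.g. "4.39 (#3)" or "1495 (#1)".
_CELL = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\(#(\d+)\)\s*$")


def parse_cell(text: str) -> Optional[Cell]:
    """Return a Cell for 'score (#rank)' text, or None for '-' and anything else."""
    match = _CELL.match(text)
    if match is None:
        return None
    return Cell(float(match.group(1)), int(match.group(2)))


def parse_row(line: str) -> list[Optional[Cell]]:
    """Split a pipe-delimited row and parse every non-empty cell in order."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    # Empty cells (such as a blank Model column) are skipped, not parsed.
    return [parse_cell(c) for c in cells if c]


if __name__ == "__main__":
    # Columns in order: AutoBench, Chatbot Arena, AAI Index, MMLU Index.
    for cell in parse_row("| 4.17 (#20) | 1462 (#3) | - | - |"):
        print(cell)
```

Returning `None` for missing scores keeps the four benchmark columns aligned, so a row with gaps can still be compared position by position with the others.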