AutoBench Run 4 - November 2025

Latest AutoBench run, featuring Gemini 3 Pro, GPT 5.1, Grok 4.1, and more

Past
Date: November 28, 2025
Version: 2025-11-28
Models: 32
New Models: 18

Run data

Model
AutoBench   Chatbot Arena  AAI Index  MMLU Index
4.17 (#20)  1462 (#3)      -          -
3.7 (#26)   1364 (#21)     22 (#31)   0.67 (#31)
3.46 (#31)  1255 (#30)     23 (#30)   0.71 (#28)
3.7 (#27)   -              25 (#29)   0.73 (#26)
3.55 (#30)  1319 (#27)     28 (#28)   0.71 (#27)
3.81 (#25)  1354 (#22)     29 (#27)   0.68 (#30)
3.35 (#32)  1288 (#29)     32 (#26)   0.69 (#29)
3.95 (#24)  1305 (#28)     33 (#25)   0.82 (#17)
3.61 (#28)  1327 (#26)     36 (#24)   0.81 (#21)
4.21 (#19)  1382 (#18)     37 (#22)   0.78 (#22)
3.6 (#29)   -              37 (#23)   0.74 (#25)
4.24 (#15)  1374 (#20)     45 (#20)   0.83 (#15)
4.1 (#22)   1340 (#24)     45 (#21)   0.81 (#20)
4.22 (#17)  1380 (#19)     48 (#19)   0.81 (#18)
4.32 (#8)   1338 (#25)     49 (#18)   0.77 (#23)
4.21 (#18)  1416 (#12)     50 (#17)   0.82 (#16)
4.23 (#16)  1395 (#17)     52 (#16)   0.85 (#7)
4.3 (#10)   1405 (#14)     54 (#15)   0.84 (#11)
4.27 (#13)  1402 (#15)     55 (#14)   0.76 (#24)
4.25 (#14)  1426 (#10)     56 (#13)   0.83 (#13)
4.28 (#11)  1397 (#16)     57 (#11)   0.84 (#12)
4.16 (#21)  1421 (#11)     57 (#10)   0.85 (#8)
4.08 (#23)  1410 (#13)     57 (#12)   0.83 (#14)
4.27 (#12)  1449 (#7)      59 (#9)    0.88 (#2)
4.37 (#4)   1451 (#5)      60 (#8)    0.86 (#6)
4.37 (#5)   1352 (#23)     61 (#7)    0.81 (#19)
4.31 (#9)   1449 (#6)      63 (#6)    0.88 (#3)
4.34 (#6)   1481 (#2)      64 (#5)    0.85 (#9)
4.34 (#7)   1429 (#9)      67 (#4)    0.85 (#10)
4.45 (#2)   1437 (#8)      68 (#3)    0.87 (#4)
4.49 (#1)   1454 (#4)      70 (#2)    0.87 (#5)
4.39 (#3)   1495 (#1)      73 (#1)    0.9 (#1)