
AutoBench Run 5 - December 2025

Latest AutoBench run, featuring GPT-5.2, Claude Opus 4.5, Gemini 3 Flash, and more.

Date: December 19, 2025
Version: 2025-12-19
Models: 38
New models: 3

Run data

Each cell gives a model's score followed by its rank in that column (#1 is best); "-" means the model has no score on that benchmark.

Model  AutoBench   Chatbot Arena   AAI Index   MMLU Index
       4.48 (#1)   -               73 (#2)     0.87 (#6)
       4.41 (#3)   1492 (#1)       73 (#1)     0.9 (#2)
       4.3 (#7)    -               71 (#3)     0.89 (#3)
       4.39 (#4)   1470 (#3)       70 (#4)     0.9 (#1)
       4.38 (#5)   1457 (#4)       70 (#5)     0.87 (#5)
       4.32 (#6)   1429 (#7)       67 (#6)     0.85 (#12)
       4.2 (#12)   1478 (#2)       65 (#7)     0.87 (#7)
       4.29 (#10)  1392 (#18)      64 (#9)     0.84 (#15)
       4.21 (#11)  -               64 (#8)     0.85 (#11)
       4.3 (#8)    1450 (#6)       63 (#10)    0.88 (#4)
       3.99 (#26)  1345 (#24)      61 (#11)    0.82 (#21)
       4.18 (#14)  1352 (#23)      61 (#12)    0.81 (#24)
       4.29 (#9)   1451 (#5)       60 (#13)    0.86 (#9)
       4.14 (#17)  1418 (#9)       59 (#14)    0.86 (#8)
       4.2 (#13)   1397 (#16)      57 (#15)    0.84 (#16)
       4.13 (#18)  1425 (#8)       56 (#16)    0.83 (#17)
       4.17 (#16)  1402 (#15)      55 (#17)    0.76 (#30)
       4.03 (#24)  1367 (#22)      54 (#18)    0.82 (#22)
       4.11 (#20)  1414 (#12)      52 (#20)    0.84 (#13)
       3.78 (#35)  1318 (#28)      52 (#22)    0.75 (#31)
       4.03 (#25)  -               52 (#21)    0.79 (#28)
       4.12 (#19)  1395 (#17)      52 (#19)    0.85 (#10)
       4.17 (#15)  1408 (#14)      51 (#23)    0.84 (#14)
       4.06 (#22)  1339 (#26)      51 (#24)    0.77 (#29)
       4.11 (#21)  1416 (#10)      50 (#25)    0.82 (#20)
       3.86 (#31)  1370 (#21)      49 (#26)    0.82 (#19)
       4.06 (#23)  1334 (#27)      47 (#27)    0.81 (#27)
       3.98 (#27)  1374 (#20)      45 (#28)    0.83 (#18)
       3.78 (#34)  1340 (#25)      45 (#29)    0.81 (#25)
       3.95 (#28)  1378 (#19)      40 (#30)    0.81 (#23)
       3.94 (#29)  1415 (#11)      38 (#31)    0.81 (#26)
       3.88 (#30)  -               38 (#32)    0.74 (#32)
       3.5 (#37)   -               37 (#33)    0.74 (#33)
       3.81 (#33)  1411 (#13)      35 (#34)    0.68 (#35)
       3.47 (#38)  -               32 (#35)    0.73 (#34)
       3.57 (#36)  -               28 (#36)    0.64 (#36)
       4.43 (#2)   -               -           -
       3.85 (#32)  -               -           -
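The run data sets AutoBench beside three external benchmarks, so one natural use of the table is checking how closely their orderings agree. The following Python sketch is an illustration under stated assumptions, not AutoBench's own methodology. It assumes every cell follows the "score (#rank)" format, with "-" for a missing score. The helper names (parse_cell, average_ranks, spearman, column_agreement) and the ROWS sample are hypothetical, and ROWS copies only the first five rows of the table. The sketch computes Spearman rank correlation over the models that have both scores.

```python
import re
from typing import Optional

# Assumed cell format: "4.48 (#1)" or "1492 (#1)", i.e. score, then rank.
# Any other cell (the table uses "-") is treated as a missing score.
CELL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\(#\d+\)\s*$")

# Hypothetical sample: the first five rows of the run data. Columns are
# AutoBench, Chatbot Arena, AAI Index, MMLU Index (no model names).
ROWS = [
    ["4.48 (#1)", "-", "73 (#2)", "0.87 (#6)"],
    ["4.41 (#3)", "1492 (#1)", "73 (#1)", "0.9 (#2)"],
    ["4.3 (#7)", "-", "71 (#3)", "0.89 (#3)"],
    ["4.39 (#4)", "1470 (#3)", "70 (#4)", "0.9 (#1)"],
    ["4.38 (#5)", "1457 (#4)", "70 (#5)", "0.87 (#5)"],
]


def parse_cell(cell: str) -> Optional[float]:
    """Return the score from a 'score (#rank)' cell, or None if it is missing."""
    match = CELL.match(cell)
    return float(match.group(1)) if match else None


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from highest (rank 1) to lowest, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the run [start, end] while the next value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def column_agreement(rows: list[list[str]], a: int, b: int) -> float:
    """Correlate columns a and b over rows where both scores are present."""
    pairs = [(parse_cell(row[a]), parse_cell(row[b])) for row in rows]
    kept = [(x, y) for x, y in pairs if x is not None and y is not None]
    if len(kept) < 2:
        raise ValueError("need at least two rows with both scores present")
    xs, ys = zip(*kept)
    return spearman(list(xs), list(ys))


if __name__ == "__main__":
    # AutoBench (column 0) vs AAI Index (column 2) on the sample rows only.
    print(f"{column_agreement(ROWS, 0, 2):.3f}")
```

The sketch recomputes ranks on the rows that have both scores instead of reusing the published "(#rank)" values. Those published ranks are taken over every model with a score in that column, so they would not be consistent on a subset. Covering all 38 rows only requires extending ROWS.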