
AutoBench Run 5 - December 2025

Latest AutoBench run, with models including GPT-5.2, Claude Opus 4.5, Gemini 3 Flash, and more.

Date: December 19, 2025
Version: 2025-12-19
Models: 38
New models: 3

Run data

| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 81.88 (#38) | 106.99 (#38) | 59.09 (#38) | 69.36 (#38) | 54.75 (#38) | 80.45 (#38) | 59.67 (#38) | 150.09 (#38) | 125.46 (#38) | 57.32 (#38) | 63.75 (#38) |
|  | 17.26 (#37) | 29.23 (#37) | 14.10 (#37) | 12.83 (#37) | 13.55 (#37) | 11.87 (#37) | 13.53 (#37) | 25.80 (#37) | 22.58 (#37) | 13.63 (#37) | 15.80 (#37) |
|  | 8.12 (#34) | 13.01 (#33) | 3.17 (#30) | 4.24 (#32) | 4.01 (#32) | 6.73 (#34) | 4.29 (#33) | 19.90 (#36) | 18.17 (#36) | 5.78 (#34) | 5.27 (#33) |
|  | 11.39 (#36) | 14.97 (#36) | 7.72 (#36) | 10.38 (#36) | 9.33 (#35) | 7.58 (#35) | 8.19 (#35) | 19.31 (#35) | 17.32 (#35) | 8.59 (#36) | 10.12 (#35) |
|  | 6.85 (#32) | 9.64 (#32) | 4.15 (#33) | 4.07 (#31) | 4.04 (#33) | 5.29 (#32) | 3.84 (#31) | 16.87 (#34) | 12.02 (#34) | 5.09 (#31) | 4.23 (#31) |
|  | 6.48 (#31) | 8.47 (#31) | 3.17 (#29) | 4.51 (#33) | 4.22 (#34) | 4.99 (#31) | 4.27 (#32) | 13.24 (#31) | 11.27 (#33) | 5.59 (#33) | 4.71 (#32) |
|  | 10.80 (#35) | 13.88 (#35) | 7.68 (#35) | 9.72 (#35) | 10.77 (#36) | 9.56 (#36) | 9.24 (#36) | 16.25 (#33) | 11.15 (#32) | 8.27 (#35) | 12.52 (#36) |
|  | 7.36 (#33) | 13.04 (#34) | 4.91 (#34) | 5.83 (#34) | 3.99 (#31) | 6.30 (#33) | 4.54 (#34) | 14.88 (#32) | 10.13 (#31) | 5.42 (#32) | 5.41 (#34) |
|  | 3.79 (#29) | 4.80 (#29) | 3.24 (#31) | 3.11 (#29) | 2.64 (#29) | 3.07 (#29) | 2.66 (#29) | 6.25 (#29) | 5.93 (#30) | 2.86 (#29) | 3.35 (#30) |
|  | 3.94 (#30) | 7.55 (#30) | 4.10 (#32) | 3.17 (#30) | 3.10 (#30) | 3.59 (#30) | 2.76 (#30) | 4.36 (#28) | 4.76 (#29) | 3.77 (#30) | 3.26 (#29) |
|  | 1.95 (#27) | 2.44 (#27) | 1.36 (#27) | 0.74 (#23) | 0.73 (#24) | 0.99 (#24) | 0.71 (#23) | 6.85 (#30) | 4.26 (#28) | 0.92 (#24) | 0.75 (#23) |
|  | 1.86 (#26) | 2.40 (#26) | 1.38 (#28) | 1.22 (#26) | 1.03 (#25) | 1.51 (#27) | 1.05 (#25) | 3.77 (#26) | 3.80 (#27) | 1.29 (#27) | 1.32 (#27) |
|  | 2.12 (#28) | 3.80 (#28) | 1.18 (#26) | 1.36 (#28) | 1.22 (#28) | 1.83 (#28) | 1.22 (#28) | 3.86 (#27) | 3.68 (#26) | 1.54 (#28) | 1.50 (#28) |
|  | 0.99 (#23) | 1.88 (#25) | 0.30 (#19) | 0.48 (#20) | 0.43 (#20) | 0.50 (#21) | 0.40 (#19) | 2.02 (#24) | 2.31 (#25) | 0.67 (#21) | 0.71 (#22) |
|  | 1.25 (#24) | 1.40 (#23) | 0.46 (#20) | 0.89 (#25) | 1.06 (#26) | 1.13 (#25) | 1.13 (#26) | 2.19 (#25) | 2.27 (#24) | 1.01 (#25) | 0.84 (#24) |
|  | 1.30 (#25) | 1.68 (#24) | 0.76 (#25) | 1.24 (#27) | 1.21 (#27) | 1.24 (#26) | 1.14 (#27) | 1.49 (#22) | 1.55 (#23) | 1.24 (#26) | 1.25 (#26) |
|  | 0.71 (#20) | 1.21 (#22) | 0.55 (#21) | 0.32 (#17) | 0.29 (#16) | 0.46 (#19) | 0.29 (#16) | 1.79 (#23) | 1.39 (#22) | 0.53 (#19) | 0.35 (#18) |
|  | 0.54 (#19) | 0.82 (#19) | 0.13 (#14) | 0.32 (#15) | 0.34 (#18) | 0.38 (#16) | 0.33 (#17) | 1.02 (#18) | 1.27 (#21) | 0.32 (#16) | 0.34 (#16) |
|  | 0.75 (#21) | 1.00 (#20) | 0.61 (#22) | 0.55 (#21) | 0.52 (#21) | 0.77 (#22) | 0.49 (#21) | 1.23 (#20) | 1.06 (#20) | 0.71 (#22) | 0.59 (#21) |
|  | 0.91 (#22) | 1.10 (#21) | 0.67 (#23) | 0.81 (#24) | 0.69 (#23) | 0.86 (#23) | 0.95 (#24) | 1.38 (#21) | 0.99 (#19) | 0.85 (#23) | 0.86 (#25) |
|  | 0.32 (#13) | 0.55 (#17) | 0.11 (#11) | 0.20 (#12) | 0.20 (#13) | 0.22 (#14) | 0.21 (#13) | 0.76 (#17) | 0.69 (#18) | 0.22 (#13) | 0.21 (#13) |
|  | 0.47 (#17) | 0.67 (#18) | 0.68 (#24) | 0.32 (#16) | 0.29 (#15) | 0.44 (#18) | 0.28 (#15) | 1.03 (#19) | 0.63 (#17) | 0.33 (#17) | 0.28 (#15) |
|  | 0.27 (#12) | 0.41 (#15) | 0.12 (#12) | 0.13 (#9) | 0.12 (#12) | 0.21 (#11) | 0.13 (#11) | 0.75 (#16) | 0.54 (#16) | 0.22 (#12) | 0.16 (#11) |
|  | 0.51 (#18) | 0.47 (#16) | 0.20 (#16) | 0.59 (#22) | 0.55 (#22) | 0.49 (#20) | 0.55 (#22) | 0.54 (#14) | 0.51 (#15) | 0.54 (#20) | 0.57 (#20) |
|  | 0.34 (#15) | 0.38 (#13) | 0.25 (#18) | 0.25 (#13) | 0.24 (#14) | 0.40 (#17) | 0.26 (#14) | 0.58 (#15) | 0.49 (#14) | 0.30 (#15) | 0.26 (#14) |
|  | 0.21 (#11) | 0.28 (#10) | 0.08 (#10) | 0.13 (#10) | 0.11 (#10) | 0.17 (#10) | 0.12 (#10) | 0.47 (#11) | 0.45 (#13) | 0.17 (#11) | 0.13 (#10) |
|  | 0.18 (#9) | 0.30 (#11) | 0.12 (#13) | 0.10 (#8) | 0.10 (#9) | 0.11 (#9) | 0.11 (#9) | 0.38 (#10) | 0.37 (#12) | 0.14 (#9) | 0.11 (#9) |
|  | 0.19 (#10) | 0.18 (#9) | 0.04 (#5) | 0.16 (#11) | 0.12 (#11) | 0.21 (#12) | 0.15 (#12) | 0.36 (#9) | 0.33 (#11) | 0.17 (#10) | 0.17 (#12) |
|  | 0.38 (#16) | 0.34 (#12) | 0.17 (#15) | 0.47 (#19) | 0.42 (#19) | 0.33 (#15) | 0.42 (#20) | 0.49 (#12) | 0.33 (#10) | 0.35 (#18) | 0.41 (#19) |
|  | 0.33 (#14) | 0.40 (#14) | 0.20 (#17) | 0.37 (#18) | 0.33 (#17) | 0.22 (#13) | 0.37 (#18) | 0.51 (#13) | 0.31 (#9) | 0.29 (#14) | 0.34 (#17) |
|  | 0.08 (#5) | 0.15 (#8) | 0.06 (#9) | 0.04 (#3) | 0.04 (#3) | 0.05 (#4) | 0.05 (#3) | 0.17 (#8) | 0.16 (#8) | 0.05 (#5) | 0.05 (#4) |
|  | 0.07 (#4) | 0.08 (#4) | 0.04 (#4) | 0.05 (#4) | 0.05 (#4) | 0.05 (#5) | 0.05 (#5) | 0.14 (#5) | 0.12 (#7) | 0.05 (#4) | 0.05 (#5) |
|  | 0.09 (#7) | 0.11 (#5) | 0.05 (#8) | 0.08 (#6) | 0.07 (#6) | 0.08 (#6) | 0.08 (#6) | 0.15 (#7) | 0.12 (#6) | 0.08 (#6) | 0.07 (#6) |
|  | 0.11 (#8) | 0.12 (#7) | 0.05 (#7) | 0.28 (#14) | 0.08 (#8) | 0.08 (#7) | 0.09 (#8) | 0.15 (#6) | 0.10 (#5) | 0.09 (#8) | 0.11 (#8) |
|  | 0.08 (#6) | 0.11 (#6) | 0.04 (#6) | 0.09 (#7) | 0.08 (#7) | 0.08 (#8) | 0.08 (#7) | 0.10 (#4) | 0.08 (#4) | 0.08 (#7) | 0.09 (#7) |
|  | 0.05 (#3) | 0.05 (#3) | 0.03 (#3) | 0.05 (#5) | 0.05 (#5) | 0.04 (#3) | 0.05 (#4) | 0.08 (#3) | 0.05 (#3) | 0.04 (#3) | 0.05 (#3) |
|  | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) | 0.00 (#1) |
|  | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) | 0.00 (#2) |
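Each cell pairs a score with its rank within that column. The page does not say what the scores measure, but rank #1 consistently goes to the smallest value, so lower appears to be better. The Python sketch below illustrates that reading under this assumption: it assigns ordinal ranks in ascending order and compares them with the published ranks of the Average (All Topics) column, using values copied from this page top to bottom. The helper name `ordinal_ranks` is hypothetical and not part of any AutoBench tooling.

```python
from typing import List, Sequence


def ordinal_ranks(values: Sequence[float]) -> List[int]:
    """Return 1-based ranks where the smallest value gets rank 1.

    Hypothetical helper for illustration only. Equal values receive
    consecutive ranks in input order, because sorted() is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# "Average (All Topics)" column, copied top to bottom from the page.
AVERAGE = [
    81.88, 17.26, 8.12, 11.39, 6.85, 6.48, 10.80, 7.36, 3.79, 3.94,
    1.95, 1.86, 2.12, 0.99, 1.25, 1.30, 0.71, 0.54, 0.75, 0.91,
    0.32, 0.47, 0.27, 0.51, 0.34, 0.21, 0.18, 0.19, 0.38, 0.33,
    0.08, 0.07, 0.09, 0.11, 0.08, 0.05, 0.00, 0.00,
]

# Ranks shown in parentheses next to each value above.
PUBLISHED = [
    38, 37, 34, 36, 32, 31, 35, 33, 29, 30,
    27, 26, 28, 23, 24, 25, 20, 19, 21, 22,
    13, 17, 12, 18, 15, 11, 9, 10, 16, 14,
    5, 4, 7, 8, 6, 3, 1, 2,
]

if __name__ == "__main__":
    # Compare the ascending ordinal ranks with the published ones.
    print(ordinal_ranks(AVERAGE) == PUBLISHED)  # True
```

For this column the two displayed ties (the 0.00 rows at #1 and #2, and the 0.08 rows at #5 and #6) happen to follow row order. That does not hold everywhere: in the Creative Writing column, two rows show 3.17 but carry #30 and #29 in the reverse of their row order, which suggests the page breaks ties on unrounded scores rather than on position.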