
AutoBench Run 4 - November 2025

The latest AutoBench run, with Gemini 3 Pro, GPT-5.1, Grok 4.1, and other models.

Status: Past
Date: November 28, 2025
Version: 2025-11-28
Models: 32
New models: 18

Run data

| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|  | 0.01 (#1) | 0.02 (#1) | 0.01 (#2) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) | 0.01 (#1) |
|  | 0.02 (#2) | 0.03 (#2) | 0.01 (#1) | 0.01 (#2) | 0.01 (#2) | 0.02 (#2) | 0.01 (#2) | 0.02 (#2) | 0.02 (#2) | 0.01 (#2) | 0.01 (#2) |
|  | 0.04 (#5) | 0.09 (#8) | 0.02 (#3) | 0.02 (#3) | 0.02 (#3) | 0.03 (#4) | 0.02 (#3) | 0.08 (#7) | 0.07 (#8) | 0.02 (#3) | 0.01 (#3) |
|  | 0.05 (#6) | 0.07 (#5) | 0.03 (#5) | 0.03 (#5) | 0.03 (#4) | 0.04 (#6) | 0.03 (#4) | 0.12 (#9) | 0.10 (#9) | 0.03 (#5) | 0.03 (#4) |
|  | 0.03 (#3) | 0.05 (#3) | 0.04 (#7) | 0.03 (#4) | 0.04 (#6) | 0.03 (#3) | 0.03 (#5) | 0.04 (#3) | 0.04 (#3) | 0.03 (#4) | 0.03 (#6) |
|  | 0.04 (#4) | 0.06 (#4) | 0.03 (#4) | 0.03 (#6) | 0.03 (#5) | 0.03 (#5) | 0.03 (#6) | 0.04 (#4) | 0.04 (#4) | 0.03 (#6) | 0.03 (#5) |
|  | 0.05 (#7) | 0.07 (#6) | 0.04 (#6) | 0.04 (#7) | 0.04 (#7) | 0.05 (#7) | 0.04 (#7) | 0.06 (#5) | 0.07 (#7) | 0.04 (#7) | 0.04 (#7) |
|  | 0.12 (#11) | 0.18 (#11) | 0.07 (#10) | 0.07 (#10) | 0.05 (#9) | 0.10 (#10) | 0.05 (#8) | 0.29 (#14) | 0.29 (#15) | 0.05 (#8) | 0.06 (#9) |
|  | 0.06 (#8) | 0.08 (#7) | 0.05 (#9) | 0.06 (#8) | 0.05 (#8) | 0.05 (#8) | 0.05 (#9) | 0.08 (#6) | 0.06 (#6) | 0.05 (#9) | 0.05 (#8) |
|  | 0.07 (#9) | 0.10 (#9) | 0.05 (#8) | 0.07 (#9) | 0.06 (#10) | 0.06 (#9) | 0.07 (#10) | 0.11 (#8) | 0.06 (#5) | 0.06 (#10) | 0.06 (#10) |
|  | 0.13 (#12) | 0.23 (#13) | 0.09 (#12) | 0.07 (#11) | 0.07 (#11) | 0.13 (#13) | 0.07 (#11) | 0.29 (#15) | 0.22 (#13) | 0.08 (#12) | 0.06 (#11) |
|  | 0.10 (#10) | 0.16 (#10) | 0.09 (#11) | 0.08 (#12) | 0.07 (#12) | 0.10 (#11) | 0.07 (#12) | 0.14 (#10) | 0.16 (#11) | 0.08 (#11) | 0.08 (#12) |
|  | 0.19 (#17) | 0.27 (#15) | 0.12 (#15) | 0.09 (#13) | 0.09 (#14) | 0.24 (#17) | 0.09 (#13) | 0.44 (#18) | 0.35 (#17) | 0.14 (#16) | 0.09 (#13) |
|  | 0.14 (#13) | 0.22 (#12) | 0.09 (#14) | 0.10 (#14) | 0.08 (#13) | 0.13 (#14) | 0.09 (#14) | 0.23 (#13) | 0.24 (#14) | 0.12 (#14) | 0.09 (#14) |
|  | 0.18 (#16) | 0.34 (#18) | 0.09 (#13) | 0.11 (#15) | 0.09 (#15) | 0.13 (#12) | 0.11 (#15) | 0.43 (#17) | 0.38 (#18) | 0.11 (#13) | 0.10 (#15) |
|  | 0.17 (#15) | 0.28 (#16) | 0.19 (#18) | 0.15 (#16) | 0.14 (#16) | 0.17 (#16) | 0.14 (#16) | 0.16 (#11) | 0.17 (#12) | 0.14 (#15) | 0.14 (#16) |
|  | 0.29 (#19) | 0.40 (#19) | 0.16 (#17) | 0.17 (#17) | 0.15 (#17) | 0.30 (#19) | 0.16 (#17) | 0.63 (#19) | 0.54 (#19) | 0.19 (#18) | 0.15 (#17) |
|  | 0.17 (#14) | 0.26 (#14) | 0.16 (#16) | 0.20 (#18) | 0.16 (#18) | 0.13 (#15) | 0.19 (#18) | 0.21 (#12) | 0.12 (#10) | 0.15 (#17) | 0.16 (#18) |
|  | 0.26 (#18) | 0.29 (#17) | 0.28 (#20) | 0.20 (#19) | 0.20 (#19) | 0.36 (#20) | 0.21 (#19) | 0.37 (#16) | 0.31 (#16) | 0.21 (#19) | 0.18 (#19) |
|  | 0.58 (#20) | 1.28 (#23) | 0.22 (#19) | 0.26 (#20) | 0.23 (#20) | 0.26 (#18) | 0.28 (#20) | 1.23 (#21) | 1.50 (#22) | 0.23 (#20) | 0.29 (#20) |
|  | 0.89 (#22) | 1.15 (#21) | 0.47 (#22) | 0.43 (#21) | 0.32 (#21) | 0.47 (#21) | 0.39 (#21) | 2.41 (#24) | 2.44 (#24) | 0.41 (#21) | 0.37 (#21) |
|  | 0.98 (#24) | 1.25 (#22) | 1.25 (#25) | 0.48 (#22) | 0.47 (#22) | 0.81 (#22) | 0.48 (#22) | 2.35 (#23) | 1.64 (#23) | 0.60 (#22) | 0.48 (#22) |
|  | 0.83 (#21) | 1.12 (#20) | 0.47 (#21) | 0.54 (#23) | 0.48 (#23) | 0.84 (#23) | 0.52 (#23) | 1.61 (#22) | 1.42 (#21) | 0.68 (#23) | 0.57 (#23) |
|  | 1.42 (#25) | 2.40 (#25) | 0.95 (#24) | 0.86 (#25) | 0.66 (#24) | 1.19 (#25) | 0.71 (#24) | 2.94 (#25) | 2.45 (#25) | 0.98 (#25) | 0.91 (#25) |
|  | 0.91 (#23) | 1.35 (#24) | 0.75 (#23) | 0.77 (#24) | 0.74 (#25) | 0.86 (#24) | 0.80 (#25) | 1.01 (#20) | 1.21 (#20) | 0.82 (#24) | 0.73 (#24) |
|  | 1.88 (#26) | 2.78 (#26) | 1.93 (#26) | 1.19 (#26) | 0.95 (#26) | 1.75 (#26) | 1.26 (#26) | 3.15 (#26) | 2.72 (#26) | 1.32 (#26) | 1.67 (#26) |
|  | 4.62 (#28) | 6.34 (#27) | 4.20 (#29) | 2.91 (#27) | 2.87 (#28) | 4.73 (#29) | 2.90 (#27) | 8.47 (#29) | 7.67 (#28) | 3.54 (#28) | 2.88 (#27) |
|  | 4.06 (#27) | 6.40 (#28) | 3.73 (#28) | 3.41 (#29) | 2.40 (#27) | 3.59 (#27) | 3.02 (#28) | 6.22 (#27) | 5.70 (#27) | 2.87 (#27) | 3.09 (#28) |
|  | 4.84 (#29) | 7.18 (#29) | 3.30 (#27) | 3.36 (#28) | 3.22 (#29) | 4.18 (#28) | 3.48 (#29) | 7.79 (#28) | 8.01 (#30) | 3.65 (#29) | 3.75 (#29) |
|  | 7.53 (#30) | 8.58 (#30) | 6.46 (#30) | 6.78 (#30) | 4.44 (#30) | 8.10 (#31) | 6.70 (#30) | 13.16 (#31) | 9.32 (#31) | 6.26 (#30) | 5.24 (#30) |
|  | 7.71 (#31) | 9.00 (#31) | 6.58 (#31) | 7.14 (#31) | 6.07 (#31) | 7.54 (#30) | 7.11 (#31) | 12.40 (#30) | 7.93 (#29) | 7.04 (#31) | 5.91 (#31) |
|  | 15.44 (#32) | 22.50 (#32) | 10.69 (#32) | 9.89 (#32) | 10.06 (#32) | 11.55 (#32) | 10.66 (#32) | 28.64 (#32) | 30.75 (#32) | 10.49 (#32) | 9.12 (#32) |
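Each cell in the run data pairs a displayed value with its rank in parentheses, and in every column rank #1 holds the lowest value. The minimal Python sketch below reads those cells back and checks that ordering. The helper names `parse_row` and `ranks_consistent` are hypothetical, the sketch assumes the `value (#rank)` cell format shown here, and it treats the values as plain numbers without assuming a unit. Equal displayed values, such as the two 0.01 entries ranked #1 and #2 under Creative Writing, are allowed in either order because the cells are rounded to two decimals.

```python
import re

# Matches one cell such as "0.01 (#1)": a decimal value followed by its rank.
# Works whether cells are fused together or separated by " | ".
CELL = re.compile(r"(\d+\.\d+) \(#(\d+)\)")


def parse_row(line):
    """Return the (value, rank) pairs found in one table row, left to right."""
    return [(float(value), int(rank)) for value, rank in CELL.findall(line)]


def ranks_consistent(column):
    """Check that displayed values never decrease as the rank number grows.

    Assumes lower values receive better (smaller) ranks, as in the table.
    Ties in the rounded values pass in any rank order.
    """
    ordered = sorted(column, key=lambda pair: pair[1])  # sort by rank
    values = [value for value, _ in ordered]
    return all(a <= b for a, b in zip(values, values[1:]))
```

To check a single topic, collect the cell at the same position from every parsed row (for example index 1 for Coding) and pass that list to `ranks_consistent`.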