
AutoBench Run 4 - November 2025

Latest AutoBench run, covering Gemini 3 Pro, GPT 5.1, Grok 4.1, and more

Past
Date: November 28, 2025
Version: 2025-11-28
Models: 32
New Models: 18

Run data

| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 3.55 (#30) | 3.05 (#32) | 3.5 (#30) | 3.77 (#29) | 4.02 (#27) | 3.62 (#28) | 3.9 (#29) | 2.96 (#30) | 3.03 (#30) | 3.84 (#28) | 3.89 (#29) |
|  | 3.35 (#32) | 3.09 (#31) | 3.48 (#31) | 3.55 (#32) | 3.75 (#32) | 3.25 (#32) | 3.62 (#32) | 2.89 (#32) | 2.56 (#32) | 3.58 (#32) | 3.71 (#31) |
|  | 3.46 (#31) | 3.09 (#30) | 3.31 (#32) | 3.72 (#31) | 3.86 (#31) | 3.29 (#31) | 3.83 (#31) | 2.91 (#31) | 3.12 (#29) | 3.78 (#29) | 3.7 (#32) |
|  | 3.7 (#27) | 3.24 (#29) | 3.75 (#26) | 4.14 (#26) | 4.06 (#26) | 3.68 (#27) | 4.13 (#25) | 3 (#29) | 2.85 (#31) | 4.08 (#25) | 4.11 (#25) |
|  | 3.61 (#28) | 3.24 (#28) | 3.55 (#29) | 3.76 (#30) | 3.93 (#30) | 3.6 (#29) | 3.87 (#30) | 3.2 (#27) | 3.36 (#25) | 3.77 (#30) | 3.87 (#30) |
|  | 3.6 (#29) | 3.27 (#27) | 3.6 (#28) | 3.8 (#28) | 3.96 (#29) | 3.34 (#30) | 3.92 (#28) | 3.29 (#26) | 3.25 (#27) | 3.76 (#31) | 3.89 (#28) |
|  | 3.7 (#26) | 3.37 (#26) | 3.72 (#27) | 3.85 (#27) | 3.96 (#28) | 3.75 (#25) | 4.06 (#27) | 3.18 (#28) | 3.3 (#26) | 3.94 (#27) | 3.93 (#27) |
|  | 3.81 (#25) | 3.47 (#25) | 3.89 (#25) | 4.14 (#25) | 4.15 (#25) | 3.73 (#26) | 4.1 (#26) | 3.37 (#25) | 3.23 (#28) | 4.02 (#26) | 4.08 (#26) |
|  | 3.95 (#24) | 3.78 (#24) | 4.02 (#24) | 4.19 (#23) | 4.19 (#24) | 3.92 (#24) | 4.2 (#24) | 3.48 (#24) | 3.42 (#23) | 4.14 (#24) | 4.19 (#24) |
|  | 4.08 (#23) | 3.88 (#23) | 4.12 (#23) | 4.18 (#24) | 4.25 (#23) | 4.15 (#20) | 4.26 (#23) | 3.8 (#20) | 3.72 (#21) | 4.21 (#23) | 4.22 (#23) |
|  | 4.17 (#20) | 3.88 (#22) | 4.34 (#8) | 4.4 (#8) | 4.41 (#15) | 4.28 (#9) | 4.43 (#12) | 3.67 (#22) | 3.37 (#24) | 4.43 (#7) | 4.42 (#11) |
|  | 4.1 (#22) | 3.92 (#21) | 4.14 (#22) | 4.22 (#21) | 4.29 (#22) | 4.06 (#23) | 4.32 (#22) | 3.8 (#21) | 3.77 (#20) | 4.27 (#21) | 4.26 (#22) |
|  | 4.28 (#11) | 4 (#20) | 4.3 (#9) | 4.48 (#5) | 4.53 (#4) | 4.27 (#12) | 4.46 (#7) | 3.95 (#12) | 3.91 (#18) | 4.35 (#17) | 4.46 (#5) |
|  | 4.21 (#18) | 4.06 (#19) | 4.24 (#16) | 4.33 (#20) | 4.41 (#14) | 4.1 (#21) | 4.42 (#14) | 3.9 (#15) | 3.99 (#16) | 4.34 (#18) | 4.35 (#15) |
|  | 4.23 (#16) | 4.08 (#18) | 4.24 (#17) | 4.35 (#16) | 4.4 (#16) | 4.21 (#14) | 4.42 (#13) | 3.93 (#14) | 4 (#15) | 4.33 (#19) | 4.38 (#14) |
|  | 4.27 (#13) | 4.09 (#17) | 4.37 (#5) | 4.4 (#9) | 4.48 (#8) | 4.22 (#13) | 4.44 (#9) | 3.88 (#17) | 4.09 (#11) | 4.36 (#16) | 4.34 (#17) |
|  | 4.16 (#21) | 4.11 (#16) | 4.23 (#18) | 4.2 (#22) | 4.31 (#21) | 4.08 (#22) | 4.34 (#21) | 3.86 (#18) | 4.01 (#14) | 4.25 (#22) | 4.27 (#21) |
|  | 4.25 (#14) | 4.12 (#15) | 4.23 (#19) | 4.37 (#11) | 4.5 (#6) | 4.16 (#19) | 4.46 (#8) | 3.82 (#19) | 4.06 (#12) | 4.47 (#4) | 4.33 (#18) |
|  | 4.21 (#19) | 4.13 (#14) | 4.18 (#21) | 4.55 (#2) | 4.55 (#3) | 4.17 (#18) | 4.48 (#5) | 3.51 (#23) | 3.61 (#22) | 4.42 (#9) | 4.5 (#3) |
|  | 4.31 (#9) | 4.14 (#13) | 4.41 (#4) | 4.35 (#14) | 4.43 (#12) | 4.32 (#7) | 4.5 (#3) | 4.05 (#10) | 4.14 (#10) | 4.38 (#14) | 4.42 (#10) |
|  | 4.22 (#17) | 4.16 (#12) | 4.27 (#12) | 4.33 (#19) | 4.38 (#19) | 4.2 (#16) | 4.4 (#17) | 3.9 (#16) | 3.97 (#17) | 4.27 (#20) | 4.35 (#16) |
|  | 4.24 (#15) | 4.17 (#11) | 4.26 (#13) | 4.36 (#13) | 4.34 (#20) | 4.17 (#17) | 4.37 (#19) | 4.08 (#7) | 3.9 (#19) | 4.39 (#13) | 4.29 (#20) |
|  | 4.27 (#12) | 4.19 (#10) | 4.28 (#11) | 4.35 (#15) | 4.39 (#18) | 4.27 (#11) | 4.48 (#4) | 3.94 (#13) | 4.03 (#13) | 4.38 (#15) | 4.44 (#7) |
|  | 4.34 (#7) | 4.27 (#9) | 4.36 (#7) | 4.36 (#12) | 4.46 (#10) | 4.35 (#4) | 4.4 (#16) | 4.15 (#5) | 4.14 (#9) | 4.45 (#5) | 4.45 (#6) |
|  | 4.32 (#8) | 4.3 (#8) | 4.3 (#10) | 4.37 (#10) | 4.46 (#11) | 4.2 (#15) | 4.42 (#15) | 4 (#11) | 4.21 (#6) | 4.44 (#6) | 4.46 (#4) |
|  | 4.39 (#3) | 4.33 (#7) | 4.46 (#1) | 4.42 (#7) | 4.49 (#7) | 4.32 (#6) | 4.44 (#10) | 4.27 (#3) | 4.3 (#2) | 4.42 (#11) | 4.41 (#12) |
|  | 4.34 (#6) | 4.35 (#6) | 4.26 (#14) | 4.55 (#1) | 4.46 (#9) | 4.27 (#10) | 4.43 (#11) | 4.12 (#6) | 4.17 (#8) | 4.4 (#12) | 4.4 (#13) |
|  | 4.37 (#4) | 4.36 (#5) | 4.37 (#6) | 4.34 (#18) | 4.52 (#5) | 4.37 (#3) | 4.37 (#20) | 4.21 (#4) | 4.31 (#1) | 4.42 (#8) | 4.43 (#9) |
|  | 4.3 (#10) | 4.38 (#4) | 4.19 (#20) | 4.35 (#17) | 4.39 (#17) | 4.28 (#8) | 4.38 (#18) | 4.06 (#9) | 4.21 (#7) | 4.42 (#10) | 4.33 (#19) |
|  | 4.45 (#2) | 4.39 (#3) | 4.42 (#3) | 4.54 (#3) | 4.63 (#1) | 4.4 (#2) | 4.58 (#2) | 4.28 (#2) | 4.23 (#5) | 4.48 (#3) | 4.58 (#1) |
|  | 4.37 (#5) | 4.42 (#2) | 4.26 (#15) | 4.43 (#6) | 4.42 (#13) | 4.34 (#5) | 4.47 (#6) | 4.06 (#8) | 4.25 (#4) | 4.56 (#2) | 4.43 (#8) |
|  | 4.49 (#1) | 4.43 (#1) | 4.45 (#2) | 4.54 (#4) | 4.57 (#2) | 4.51 (#1) | 4.59 (#1) | 4.37 (#1) | 4.29 (#3) | 4.59 (#1) | 4.53 (#2) |
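
Each cell in the run data pairs a score with a rank, as in 4.49 (#1). The following is a minimal sketch, assuming Python, of how one row could be split into per-topic (score, rank) pairs. The helper name parse_row and the TOPICS list are hypothetical and are not part of AutoBench. The pattern matches the cells themselves, so it should not depend on how the cells are separated.

```python
import re

# One cell looks like "<score> (#<rank>)", e.g. "3.55 (#30)" or "4 (#20)".
CELL_RE = re.compile(r"(\d+(?:\.\d+)?) \(#(\d+)\)")

# Column names in the order they appear in the run data header.
TOPICS = [
    "Average (All Topics)", "Coding", "Creative Writing", "Current News",
    "General Culture", "Grammar", "History", "Logics", "Math", "Science",
    "Technology",
]


def parse_row(row: str) -> dict[str, tuple[float, int]]:
    """Map each column name to (score, rank) for one row of run data.

    Hypothetical helper for illustration only.
    """
    cells = CELL_RE.findall(row)
    if len(cells) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} cells, found {len(cells)}")
    return {
        topic: (float(score), int(rank))
        for topic, (score, rank) in zip(TOPICS, cells)
    }


# Top-ranked row of the run data, with the values copied verbatim.
row = (
    "4.49 (#1) 4.43 (#1) 4.45 (#2) 4.54 (#4) 4.57 (#2) 4.51 (#1) "
    "4.59 (#1) 4.37 (#1) 4.29 (#3) 4.59 (#1) 4.53 (#2)"
)
parsed = parse_row(row)
print(parsed["Coding"])  # (4.43, 1)
print(parsed["Math"])    # (4.29, 3)
```

Raising on an unexpected cell count is a deliberate choice here: a row with a missing or malformed cell fails loudly instead of shifting every later score into the wrong topic.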