
AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, with models including Gemini 3 Pro, GPT 5.1, Grok 4.1, Opus 4.5, and more.

Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

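| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|  | 4.6 (#6) | - | - | - | - | - | - | - | - | - | - |
|  | 4.38 (#24) | - | - | - | - | - | - | - | - | - | - |
|  | 4.56 (#13) | - | - | - | - | - | - | - | - | - | - |
|  | 4.28 (#28) | - | - | - | - | - | - | - | - | - | - |
|  | 4.52 (#17) | - | - | - | - | - | - | - | - | - | - |
|  | 4.56 (#12) | - | - | - | - | - | - | - | - | - | - |
|  | 4.16 (#31) | - | - | - | - | - | - | - | - | - | - |
|  | 3.61 (#35) | - | - | - | - | - | - | - | - | - | - |
|  | 4.52 (#16) | - | - | - | - | - | - | - | - | - | - |
|  | 4.59 (#8) | - | - | - | - | - | - | - | - | - | - |
|  | 2.9 (#40) | - | - | - | - | - | - | - | - | - | - |
|  | 4.59 (#9) | - | - | - | - | - | - | - | - | - | - |
|  | 4.46 (#19) | - | - | - | - | - | - | - | - | - | - |
|  | 4.44 (#22) | - | - | - | - | - | - | - | - | - | - |
|  | 3.68 (#33) | - | - | - | - | - | - | - | - | - | - |
|  | 4.44 (#21) | - | - | - | - | - | - | - | - | - | - |
|  | 4.45 (#20) | - | - | - | - | - | - | - | - | - | - |
|  | 4.54 (#14) | - | - | - | - | - | - | - | - | - | - |
|  | 4.18 (#30) | - | - | - | - | - | - | - | - | - | - |
|  | 4.38 (#23) | - | - | - | - | - | - | - | - | - | - |
|  | 4.47 (#18) | - | - | - | - | - | - | - | - | - | - |
|  | 4.33 (#26) | - | - | - | - | - | - | - | - | - | - |
|  | 4.63 (#5) | - | - | - | - | - | - | - | - | - | - |
|  | 4.64 (#3) | - | - | - | - | - | - | - | - | - | - |
|  | 4.34 (#25) | - | - | - | - | - | - | - | - | - | - |
|  | 4.83 (#2) | - | - | - | - | - | - | - | - | - | - |
|  | 4.59 (#7) | - | - | - | - | - | - | - | - | - | - |
|  | 4.85 (#1) | - | - | - | - | - | - | - | - | - | - |
|  | 4.57 (#11) | - | - | - | - | - | - | - | - | - | - |
|  | 4.32 (#27) | - | - | - | - | - | - | - | - | - | - |
|  | 4.54 (#15) | - | - | - | - | - | - | - | - | - | - |
|  | 4.58 (#10) | - | - | - | - | - | - | - | - | - | - |
|  | 4.64 (#4) | - | - | - | - | - | - | - | - | - | - |
|  | 4.27 (#29) | - | - | - | - | - | - | - | - | - | - |
|  | 3.66 (#34) | - | - | - | - | - | - | - | - | - | - |
|  | 3.91 (#32) | - | - | - | - | - | - | - | - | - | - |
|  | 3.43 (#39) | - | - | - | - | - | - | - | - | - | - |
|  | 3.51 (#36) | - | - | - | - | - | - | - | - | - | - |
|  | 3.48 (#37) | - | - | - | - | - | - | - | - | - | - |
|  | 3.44 (#38) | - | - | - | - | - | - | - | - | - | - |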