AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, covering models such as Gemini 3 Pro, GPT-5.1, Grok 4.1, Opus 4.5, and more.

Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

Rank  Model  Average (All Topics)  Coding  Creative Writing  Current News  General Culture  Grammar  History  Logics  Math  Science  Technology
#1           0.01                  -       -                 -             -                -        -        -       -     -        -
#2           0.02                  -       -                 -             -                -        -        -       -     -        -
#3           0.02                  -       -                 -             -                -        -        -       -     -        -
#4           0.02                  -       -                 -             -                -        -        -       -     -        -
#5           0.03                  -       -                 -             -                -        -        -       -     -        -
#6           0.03                  -       -                 -             -                -        -        -       -     -        -
#7           0.03                  -       -                 -             -                -        -        -       -     -        -
#8           0.05                  -       -                 -             -                -        -        -       -     -        -
#9           0.07                  -       -                 -             -                -        -        -       -     -        -
#10          0.07                  -       -                 -             -                -        -        -       -     -        -
#11          0.07                  -       -                 -             -                -        -        -       -     -        -
#12          0.07                  -       -                 -             -                -        -        -       -     -        -
#13          0.08                  -       -                 -             -                -        -        -       -     -        -
#14          0.08                  -       -                 -             -                -        -        -       -     -        -
#15          0.10                  -       -                 -             -                -        -        -       -     -        -
#16          0.10                  -       -                 -             -                -        -        -       -     -        -
#17          0.10                  -       -                 -             -                -        -        -       -     -        -
#18          0.11                  -       -                 -             -                -        -        -       -     -        -
#19          0.13                  -       -                 -             -                -        -        -       -     -        -
#20          0.16                  -       -                 -             -                -        -        -       -     -        -
#21          0.16                  -       -                 -             -                -        -        -       -     -        -
#22          0.21                  -       -                 -             -                -        -        -       -     -        -
#23          0.21                  -       -                 -             -                -        -        -       -     -        -
#24          0.30                  -       -                 -             -                -        -        -       -     -        -
#25          0.33                  -       -                 -             -                -        -        -       -     -        -
#26          0.34                  -       -                 -             -                -        -        -       -     -        -
#27          0.36                  -       -                 -             -                -        -        -       -     -        -
#28          0.40                  -       -                 -             -                -        -        -       -     -        -
#29          0.43                  -       -                 -             -                -        -        -       -     -        -
#30          0.67                  -       -                 -             -                -        -        -       -     -        -
#31          0.80                  -       -                 -             -                -        -        -       -     -        -
#32          0.81                  -       -                 -             -                -        -        -       -     -        -
#33          1.95                  -       -                 -             -                -        -        -       -     -        -
#34          2.08                  -       -                 -             -                -        -        -       -     -        -
#35          3.41                  -       -                 -             -                -        -        -       -     -        -
#36          3.88                  -       -                 -             -                -        -        -       -     -        -
#37          3.95                  -       -                 -             -                -        -        -       -     -        -
#38          5.43                  -       -                 -             -                -        -        -       -     -        -
#39          7.31                  -       -                 -             -                -        -        -       -     -        -
#40          7.70                  -       -                 -             -                -        -        -       -     -        -