AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, with models including Gemini 3 Pro, GPT 5.1, Grok 4.1, Opus 4.5, and more.

Status: Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

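| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 140.66s (#40) | - | - | - | - | - | - | - | - | - | - |
| | 112.19s (#39) | - | - | - | - | - | - | - | - | - | - |
| | 74.34s (#38) | - | - | - | - | - | - | - | - | - | - |
| | 74.18s (#37) | - | - | - | - | - | - | - | - | - | - |
| | 71.34s (#36) | - | - | - | - | - | - | - | - | - | - |
| | 70.41s (#35) | - | - | - | - | - | - | - | - | - | - |
| | 68.36s (#34) | - | - | - | - | - | - | - | - | - | - |
| | 68.03s (#33) | - | - | - | - | - | - | - | - | - | - |
| | 66.00s (#32) | - | - | - | - | - | - | - | - | - | - |
| | 61.60s (#31) | - | - | - | - | - | - | - | - | - | - |
| | 53.70s (#30) | - | - | - | - | - | - | - | - | - | - |
| | 52.84s (#29) | - | - | - | - | - | - | - | - | - | - |
| | 50.84s (#28) | - | - | - | - | - | - | - | - | - | - |
| | 50.43s (#27) | - | - | - | - | - | - | - | - | - | - |
| | 46.15s (#26) | - | - | - | - | - | - | - | - | - | - |
| | 45.41s (#25) | - | - | - | - | - | - | - | - | - | - |
| | 42.23s (#24) | - | - | - | - | - | - | - | - | - | - |
| | 35.68s (#23) | - | - | - | - | - | - | - | - | - | - |
| | 35.56s (#22) | - | - | - | - | - | - | - | - | - | - |
| | 35.26s (#21) | - | - | - | - | - | - | - | - | - | - |
| | 34.63s (#20) | - | - | - | - | - | - | - | - | - | - |
| | 32.19s (#19) | - | - | - | - | - | - | - | - | - | - |
| | 30.64s (#18) | - | - | - | - | - | - | - | - | - | - |
| | 29.33s (#17) | - | - | - | - | - | - | - | - | - | - |
| | 26.09s (#16) | - | - | - | - | - | - | - | - | - | - |
| | 24.09s (#15) | - | - | - | - | - | - | - | - | - | - |
| | 23.30s (#14) | - | - | - | - | - | - | - | - | - | - |
| | 21.87s (#13) | - | - | - | - | - | - | - | - | - | - |
| | 21.11s (#12) | - | - | - | - | - | - | - | - | - | - |
| | 19.89s (#11) | - | - | - | - | - | - | - | - | - | - |
| | 17.50s (#10) | - | - | - | - | - | - | - | - | - | - |
| | 16.98s (#9) | - | - | - | - | - | - | - | - | - | - |
| | 15.16s (#8) | - | - | - | - | - | - | - | - | - | - |
| | 14.87s (#7) | - | - | - | - | - | - | - | - | - | - |
| | 12.37s (#6) | - | - | - | - | - | - | - | - | - | - |
| | 12.09s (#5) | - | - | - | - | - | - | - | - | - | - |
| | 10.98s (#4) | - | - | - | - | - | - | - | - | - | - |
| | 7.84s (#3) | - | - | - | - | - | - | - | - | - | - |
| | 7.51s (#2) | - | - | - | - | - | - | - | - | - | - |
| | 6.53s (#1) | - | - | - | - | - | - | - | - | - | - |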