
AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, with models including Gemini 3 Pro, GPT-5.1, Grok 4.1, Opus 4.5, and more.

Status: Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New models: 17

Run data

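| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
| | 66.00s (#32) | - | - | - | - | - | - | - | - | - | - |
| | 29.33s (#17) | - | - | - | - | - | - | - | - | - | - |
| | 50.84s (#28) | - | - | - | - | - | - | - | - | - | - |
| | 35.26s (#21) | - | - | - | - | - | - | - | - | - | - |
| | 21.11s (#12) | - | - | - | - | - | - | - | - | - | - |
| | 68.03s (#33) | - | - | - | - | - | - | - | - | - | - |
| | 35.68s (#23) | - | - | - | - | - | - | - | - | - | - |
| | 15.16s (#8) | - | - | - | - | - | - | - | - | - | - |
| | 68.36s (#34) | - | - | - | - | - | - | - | - | - | - |
| | 61.60s (#31) | - | - | - | - | - | - | - | - | - | - |
| | 19.89s (#11) | - | - | - | - | - | - | - | - | - | - |
| | 74.18s (#37) | - | - | - | - | - | - | - | - | - | - |
| | 21.87s (#13) | - | - | - | - | - | - | - | - | - | - |
| | 32.19s (#19) | - | - | - | - | - | - | - | - | - | - |
| | 12.37s (#6) | - | - | - | - | - | - | - | - | - | - |
| | 52.84s (#29) | - | - | - | - | - | - | - | - | - | - |
| | 42.23s (#24) | - | - | - | - | - | - | - | - | - | - |
| | 53.70s (#30) | - | - | - | - | - | - | - | - | - | - |
| | 26.09s (#16) | - | - | - | - | - | - | - | - | - | - |
| | 71.34s (#36) | - | - | - | - | - | - | - | - | - | - |
| | 16.98s (#9) | - | - | - | - | - | - | - | - | - | - |
| | 10.98s (#4) | - | - | - | - | - | - | - | - | - | - |
| | 50.43s (#27) | - | - | - | - | - | - | - | - | - | - |
| | 46.15s (#26) | - | - | - | - | - | - | - | - | - | - |
| | 30.64s (#18) | - | - | - | - | - | - | - | - | - | - |
| | 112.19s (#39) | - | - | - | - | - | - | - | - | - | - |
| | 74.34s (#38) | - | - | - | - | - | - | - | - | - | - |
| | 140.66s (#40) | - | - | - | - | - | - | - | - | - | - |
| | 34.63s (#20) | - | - | - | - | - | - | - | - | - | - |
| | 23.30s (#14) | - | - | - | - | - | - | - | - | - | - |
| | 70.41s (#35) | - | - | - | - | - | - | - | - | - | - |
| | 24.09s (#15) | - | - | - | - | - | - | - | - | - | - |
| | 45.41s (#25) | - | - | - | - | - | - | - | - | - | - |
| | 35.56s (#22) | - | - | - | - | - | - | - | - | - | - |
| | 12.09s (#5) | - | - | - | - | - | - | - | - | - | - |
| | 7.51s (#2) | - | - | - | - | - | - | - | - | - | - |
| | 17.50s (#10) | - | - | - | - | - | - | - | - | - | - |
| | 6.53s (#1) | - | - | - | - | - | - | - | - | - | - |
| | 7.84s (#3) | - | - | - | - | - | - | - | - | - | - |
| | 14.87s (#7) | - | - | - | - | - | - | - | - | - | - |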
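The rank in parentheses appears to follow ascending order of the Average (All Topics) value in seconds: the lowest value (6.53s) is #1 and the highest (140.66s) is #40. The Python sketch below checks that reading against the 40 listed cells. It assumes each cell has the form `<seconds>s (#<rank>)` and that no two values tie; the names `parse_cell` and `ranks_by_ascending_time` are hypothetical helpers written for illustration, not part of AutoBench.

```python
import re

# Average (All Topics) cells, copied in table order from the run data.
CELLS = [
    "66.00s (#32)", "29.33s (#17)", "50.84s (#28)", "35.26s (#21)",
    "21.11s (#12)", "68.03s (#33)", "35.68s (#23)", "15.16s (#8)",
    "68.36s (#34)", "61.60s (#31)", "19.89s (#11)", "74.18s (#37)",
    "21.87s (#13)", "32.19s (#19)", "12.37s (#6)", "52.84s (#29)",
    "42.23s (#24)", "53.70s (#30)", "26.09s (#16)", "71.34s (#36)",
    "16.98s (#9)", "10.98s (#4)", "50.43s (#27)", "46.15s (#26)",
    "30.64s (#18)", "112.19s (#39)", "74.34s (#38)", "140.66s (#40)",
    "34.63s (#20)", "23.30s (#14)", "70.41s (#35)", "24.09s (#15)",
    "45.41s (#25)", "35.56s (#22)", "12.09s (#5)", "7.51s (#2)",
    "17.50s (#10)", "6.53s (#1)", "7.84s (#3)", "14.87s (#7)",
]

# Assumed cell format: "<seconds>s (#<rank>)".
CELL_RE = re.compile(r"^(\d+(?:\.\d+)?)s \(#(\d+)\)$")


def parse_cell(cell: str) -> tuple[float, int]:
    """Split a cell such as '66.00s (#32)' into (66.0, 32)."""
    match = CELL_RE.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return float(match.group(1)), int(match.group(2))


def ranks_by_ascending_time(seconds: list[float]) -> list[int]:
    """Give rank 1 to the smallest value; assumes there are no ties."""
    order = sorted(range(len(seconds)), key=lambda i: seconds[i])
    ranks = [0] * len(seconds)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


parsed = [parse_cell(c) for c in CELLS]
seconds = [s for s, _ in parsed]
listed_ranks = [r for _, r in parsed]

# Prints True when every listed rank matches the ascending-order rank.
print(ranks_by_ascending_time(seconds) == listed_ranks)
```

Run as-is, the check prints `True`, which is consistent with a lower average earning a better rank in this run.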