
AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, covering Gemini 3 Pro, GPT 5.1, Grok 4.1, Opus 4.5, and more.

Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

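| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 6.53s (#1) | - | - | - | - | - | - | - | - | - | - |
| | 7.51s (#2) | - | - | - | - | - | - | - | - | - | - |
| | 7.84s (#3) | - | - | - | - | - | - | - | - | - | - |
| | 10.98s (#4) | - | - | - | - | - | - | - | - | - | - |
| | 12.09s (#5) | - | - | - | - | - | - | - | - | - | - |
| | 12.37s (#6) | - | - | - | - | - | - | - | - | - | - |
| | 14.87s (#7) | - | - | - | - | - | - | - | - | - | - |
| | 15.16s (#8) | - | - | - | - | - | - | - | - | - | - |
| | 16.98s (#9) | - | - | - | - | - | - | - | - | - | - |
| | 17.50s (#10) | - | - | - | - | - | - | - | - | - | - |
| | 19.89s (#11) | - | - | - | - | - | - | - | - | - | - |
| | 21.11s (#12) | - | - | - | - | - | - | - | - | - | - |
| | 21.87s (#13) | - | - | - | - | - | - | - | - | - | - |
| | 23.30s (#14) | - | - | - | - | - | - | - | - | - | - |
| | 24.09s (#15) | - | - | - | - | - | - | - | - | - | - |
| | 26.09s (#16) | - | - | - | - | - | - | - | - | - | - |
| | 29.33s (#17) | - | - | - | - | - | - | - | - | - | - |
| | 30.64s (#18) | - | - | - | - | - | - | - | - | - | - |
| | 32.19s (#19) | - | - | - | - | - | - | - | - | - | - |
| | 34.63s (#20) | - | - | - | - | - | - | - | - | - | - |
| | 35.26s (#21) | - | - | - | - | - | - | - | - | - | - |
| | 35.56s (#22) | - | - | - | - | - | - | - | - | - | - |
| | 35.68s (#23) | - | - | - | - | - | - | - | - | - | - |
| | 42.23s (#24) | - | - | - | - | - | - | - | - | - | - |
| | 45.41s (#25) | - | - | - | - | - | - | - | - | - | - |
| | 46.15s (#26) | - | - | - | - | - | - | - | - | - | - |
| | 50.43s (#27) | - | - | - | - | - | - | - | - | - | - |
| | 50.84s (#28) | - | - | - | - | - | - | - | - | - | - |
| | 52.84s (#29) | - | - | - | - | - | - | - | - | - | - |
| | 53.70s (#30) | - | - | - | - | - | - | - | - | - | - |
| | 61.60s (#31) | - | - | - | - | - | - | - | - | - | - |
| | 66.00s (#32) | - | - | - | - | - | - | - | - | - | - |
| | 68.03s (#33) | - | - | - | - | - | - | - | - | - | - |
| | 68.36s (#34) | - | - | - | - | - | - | - | - | - | - |
| | 70.41s (#35) | - | - | - | - | - | - | - | - | - | - |
| | 71.34s (#36) | - | - | - | - | - | - | - | - | - | - |
| | 74.18s (#37) | - | - | - | - | - | - | - | - | - | - |
| | 74.34s (#38) | - | - | - | - | - | - | - | - | - | - |
| | 112.19s (#39) | - | - | - | - | - | - | - | - | - | - |
| | 140.66s (#40) | - | - | - | - | - | - | - | - | - | - |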