
AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, with models including Gemini 3 Pro, GPT-5.1, Grok 4.1, Opus 4.5, and more.

Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

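| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|-------|----------------------|--------|------------------|--------------|-----------------|---------|---------|--------|------|---------|------------|
|  | 381.40s (#40) | - | - | - | - | - | - | - | - | - | - |
|  | 365.38s (#39) | - | - | - | - | - | - | - | - | - | - |
|  | 360.26s (#38) | - | - | - | - | - | - | - | - | - | - |
|  | 347.66s (#37) | - | - | - | - | - | - | - | - | - | - |
|  | 312.34s (#36) | - | - | - | - | - | - | - | - | - | - |
|  | 283.44s (#35) | - | - | - | - | - | - | - | - | - | - |
|  | 254.75s (#34) | - | - | - | - | - | - | - | - | - | - |
|  | 238.55s (#33) | - | - | - | - | - | - | - | - | - | - |
|  | 238.10s (#32) | - | - | - | - | - | - | - | - | - | - |
|  | 224.19s (#31) | - | - | - | - | - | - | - | - | - | - |
|  | 219.87s (#30) | - | - | - | - | - | - | - | - | - | - |
|  | 200.73s (#29) | - | - | - | - | - | - | - | - | - | - |
|  | 186.92s (#28) | - | - | - | - | - | - | - | - | - | - |
|  | 176.91s (#27) | - | - | - | - | - | - | - | - | - | - |
|  | 174.47s (#26) | - | - | - | - | - | - | - | - | - | - |
|  | 166.08s (#25) | - | - | - | - | - | - | - | - | - | - |
|  | 162.15s (#24) | - | - | - | - | - | - | - | - | - | - |
|  | 159.18s (#23) | - | - | - | - | - | - | - | - | - | - |
|  | 155.68s (#22) | - | - | - | - | - | - | - | - | - | - |
|  | 152.50s (#21) | - | - | - | - | - | - | - | - | - | - |
|  | 144.36s (#20) | - | - | - | - | - | - | - | - | - | - |
|  | 143.02s (#19) | - | - | - | - | - | - | - | - | - | - |
|  | 143.01s (#18) | - | - | - | - | - | - | - | - | - | - |
|  | 142.96s (#17) | - | - | - | - | - | - | - | - | - | - |
|  | 126.90s (#16) | - | - | - | - | - | - | - | - | - | - |
|  | 111.58s (#15) | - | - | - | - | - | - | - | - | - | - |
|  | 100.63s (#14) | - | - | - | - | - | - | - | - | - | - |
|  | 97.02s (#13) | - | - | - | - | - | - | - | - | - | - |
|  | 90.11s (#12) | - | - | - | - | - | - | - | - | - | - |
|  | 88.99s (#11) | - | - | - | - | - | - | - | - | - | - |
|  | 86.47s (#10) | - | - | - | - | - | - | - | - | - | - |
|  | 79.85s (#9) | - | - | - | - | - | - | - | - | - | - |
|  | 73.19s (#8) | - | - | - | - | - | - | - | - | - | - |
|  | 65.33s (#7) | - | - | - | - | - | - | - | - | - | - |
|  | 64.98s (#6) | - | - | - | - | - | - | - | - | - | - |
|  | 60.13s (#5) | - | - | - | - | - | - | - | - | - | - |
|  | 59.91s (#4) | - | - | - | - | - | - | - | - | - | - |
|  | 56.42s (#3) | - | - | - | - | - | - | - | - | - | - |
|  | 45.94s (#2) | - | - | - | - | - | - | - | - | - | - |
|  | 41.75s (#1) | - | - | - | - | - | - | - | - | - | - |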
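Only the Average (All Topics) column carries values in this run data, so its spread can be summarized with a few lines of Python. This is a minimal sketch, not part of AutoBench: it hard-codes the 40 averages from the table in rank order (#1 first), omits model names because the table does not include them, and uses a hypothetical `summarize` helper to compute the median and the ratio between the largest and smallest value.

```python
import statistics

# Average (All Topics) values in seconds, copied from the run data table
# and listed in rank order (#1 first). Model names are not included
# because the table does not carry them.
AVERAGE_SECONDS = [
    41.75, 45.94, 56.42, 59.91, 60.13, 64.98, 65.33, 73.19, 79.85, 86.47,
    88.99, 90.11, 97.02, 100.63, 111.58, 126.90, 142.96, 143.01, 143.02, 144.36,
    152.50, 155.68, 159.18, 162.15, 166.08, 174.47, 176.91, 186.92, 200.73, 219.87,
    224.19, 238.10, 238.55, 254.75, 283.44, 312.34, 347.66, 360.26, 365.38, 381.40,
]


def summarize(values):
    """Hypothetical helper: count, median, and largest/smallest ratio."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "median_s": statistics.median(ordered),
        "spread_ratio": ordered[-1] / ordered[0],
    }


if __name__ == "__main__":
    stats = summarize(AVERAGE_SECONDS)
    print(f"{stats['count']} models, median {stats['median_s']:.2f} s, "
          f"slowest/fastest = {stats['spread_ratio']:.1f}x")
```

Run as a script, it prints `40 models, median 148.43 s, slowest/fastest = 9.1x`. With an even number of entries, `statistics.median` averages the two middle values (#20 and #21).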