
AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, covering Gemini 3 Pro, GPT 5.1, Grok 4.1, Opus 4.5, and more.

Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

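| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
| | 238.10s (#32) | - | - | - | - | - | - | - | - | - | - |
| | 155.68s (#22) | - | - | - | - | - | - | - | - | - | - |
| | 200.73s (#29) | - | - | - | - | - | - | - | - | - | - |
| | 144.36s (#20) | - | - | - | - | - | - | - | - | - | - |
| | 86.47s (#10) | - | - | - | - | - | - | - | - | - | - |
| | 360.26s (#38) | - | - | - | - | - | - | - | - | - | - |
| | 162.15s (#24) | - | - | - | - | - | - | - | - | - | - |
| | 60.13s (#5) | - | - | - | - | - | - | - | - | - | - |
| | 238.55s (#33) | - | - | - | - | - | - | - | - | - | - |
| | 143.01s (#18) | - | - | - | - | - | - | - | - | - | - |
| | 142.96s (#17) | - | - | - | - | - | - | - | - | - | - |
| | 254.75s (#34) | - | - | - | - | - | - | - | - | - | - |
| | 174.47s (#26) | - | - | - | - | - | - | - | - | - | - |
| | 126.90s (#16) | - | - | - | - | - | - | - | - | - | - |
| | 73.19s (#8) | - | - | - | - | - | - | - | - | - | - |
| | 365.38s (#39) | - | - | - | - | - | - | - | - | - | - |
| | 283.44s (#35) | - | - | - | - | - | - | - | - | - | - |
| | 159.18s (#23) | - | - | - | - | - | - | - | - | - | - |
| | 100.63s (#14) | - | - | - | - | - | - | - | - | - | - |
| | 381.40s (#40) | - | - | - | - | - | - | - | - | - | - |
| | 90.11s (#12) | - | - | - | - | - | - | - | - | - | - |
| | 79.85s (#9) | - | - | - | - | - | - | - | - | - | - |
| | 186.92s (#28) | - | - | - | - | - | - | - | - | - | - |
| | 143.02s (#19) | - | - | - | - | - | - | - | - | - | - |
| | 111.58s (#15) | - | - | - | - | - | - | - | - | - | - |
| | 312.34s (#36) | - | - | - | - | - | - | - | - | - | - |
| | 224.19s (#31) | - | - | - | - | - | - | - | - | - | - |
| | 347.66s (#37) | - | - | - | - | - | - | - | - | - | - |
| | 152.50s (#21) | - | - | - | - | - | - | - | - | - | - |
| | 97.02s (#13) | - | - | - | - | - | - | - | - | - | - |
| | 219.87s (#30) | - | - | - | - | - | - | - | - | - | - |
| | 64.98s (#6) | - | - | - | - | - | - | - | - | - | - |
| | 176.91s (#27) | - | - | - | - | - | - | - | - | - | - |
| | 166.08s (#25) | - | - | - | - | - | - | - | - | - | - |
| | 65.33s (#7) | - | - | - | - | - | - | - | - | - | - |
| | 56.42s (#3) | - | - | - | - | - | - | - | - | - | - |
| | 88.99s (#11) | - | - | - | - | - | - | - | - | - | - |
| | 41.75s (#1) | - | - | - | - | - | - | - | - | - | - |
| | 45.94s (#2) | - | - | - | - | - | - | - | - | - | - |
| | 59.91s (#4) | - | - | - | - | - | - | - | - | - | - |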
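
Each "Average (All Topics)" cell above pairs a value in seconds with a rank, as in `41.75s (#1)`. The following is a minimal Python sketch for reading such cells and checking that the ranks follow ascending time. The function names are hypothetical, and the reading that rank #1 marks the smallest value is inferred from the listed rows rather than stated by the page.

```python
import re

# Matches cells such as "41.75s (#1)": a number of seconds, then a rank.
CELL = re.compile(r"^(\d+(?:\.\d+)?)s \(#(\d+)\)$")


def parse_cell(cell):
    """Return (seconds, rank) for a cell like '41.75s (#1)'."""
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return float(match.group(1)), int(match.group(2))


def ranks_follow_times(cells):
    """True if ordering the cells by rank also orders them by ascending time."""
    by_rank = sorted((parse_cell(c) for c in cells), key=lambda pair: pair[1])
    times = [seconds for seconds, _ in by_rank]
    return times == sorted(times)


# A few cells copied from the table above.
sample = ["238.10s (#32)", "60.13s (#5)", "41.75s (#1)", "45.94s (#2)", "381.40s (#40)"]
print(ranks_follow_times(sample))
```

A check of this kind is only a consistency test on the published numbers; it says nothing about how the averages themselves were computed.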