
AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, covering Gemini 3 Pro, GPT-5.1, Grok 4.1, Opus 4.5, and more.

Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

Model
Average (All Topics)
41.75s (#1)
45.94s (#2)
56.42s (#3)
59.91s (#4)
60.13s (#5)
64.98s (#6)
65.33s (#7)
73.19s (#8)
79.85s (#9)
86.47s (#10)
88.99s (#11)
90.11s (#12)
97.02s (#13)
100.63s (#14)
111.58s (#15)
126.90s (#16)
142.96s (#17)
143.01s (#18)
143.02s (#19)
144.36s (#20)
152.50s (#21)
155.68s (#22)
159.18s (#23)
162.15s (#24)
166.08s (#25)
174.47s (#26)
176.91s (#27)
186.92s (#28)
200.73s (#29)
219.87s (#30)
224.19s (#31)
238.10s (#32)
238.55s (#33)
254.75s (#34)
283.44s (#35)
312.34s (#36)
347.66s (#37)
360.26s (#38)
365.38s (#39)
381.40s (#40)