
AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, covering Gemini 3 Pro, GPT 5.1, Grok 4.1, Opus 4.5, and other models

Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

Model
Average (All Topics)
140.66s (#40)
112.19s (#39)
74.34s (#38)
74.18s (#37)
71.34s (#36)
70.41s (#35)
68.36s (#34)
68.03s (#33)
66.00s (#32)
61.60s (#31)
53.70s (#30)
52.84s (#29)
50.84s (#28)
50.43s (#27)
46.15s (#26)
45.41s (#25)
42.23s (#24)
35.68s (#23)
35.56s (#22)
35.26s (#21)
34.63s (#20)
32.19s (#19)
30.64s (#18)
29.33s (#17)
26.09s (#16)
24.09s (#15)
23.30s (#14)
21.87s (#13)
21.11s (#12)
19.89s (#11)
17.50s (#10)
16.98s (#9)
15.16s (#8)
14.87s (#7)
12.37s (#6)
12.09s (#5)
10.98s (#4)
7.84s (#3)
7.51s (#2)
6.53s (#1)