AutoBench Agronomy LLM Benchmark - December 2025
The first AutoBench run for the Agronomy domain, covering Gemini 3 Pro, GPT 5.1, Grok 4.1, Opus 4.5, and more.
Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17
Run data
| Model | Average (All Topics) |
|---|---|
| | 0.01 (#1) |
| | 0.02 (#2) |
| | 0.02 (#3) |
| | 0.02 (#4) |
| | 0.03 (#5) |
| | 0.03 (#6) |
| | 0.03 (#7) |
| | 0.05 (#8) |
| | 0.07 (#9) |
| | 0.07 (#10) |
| | 0.07 (#11) |
| | 0.07 (#12) |
| | 0.08 (#13) |
| | 0.08 (#14) |
| | 0.10 (#15) |
| | 0.10 (#16) |
| | 0.10 (#17) |
| | 0.11 (#18) |
| | 0.13 (#19) |
| | 0.16 (#20) |
| | 0.16 (#21) |
| | 0.21 (#22) |
| | 0.21 (#23) |
| | 0.30 (#24) |
| | 0.33 (#25) |
| | 0.34 (#26) |
| | 0.36 (#27) |
| | 0.40 (#28) |
| | 0.43 (#29) |
| | 0.67 (#30) |
| | 0.80 (#31) |
| | 0.81 (#32) |
| | 1.95 (#33) |
| | 2.08 (#34) |
| | 3.41 (#35) |
| | 3.88 (#36) |
| | 3.95 (#37) |
| | 5.43 (#38) |
| | 7.31 (#39) |
| | 7.70 (#40) |