
AutoBench Agronomy LLM Benchmark - December 2025

The first AutoBench run for the Agronomy domain, covering Gemini 3 Pro, GPT 5.1, Grok 4.1, Opus 4.5, and more.

Past
Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

Run data

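Model | AutoBench | Chatbot Ar. | AAI Index | MMLU Index
 | 2.9 (#40) | - | - | -
 | 3.43 (#39) | - | - | -
 | 3.44 (#38) | - | - | -
 | 3.48 (#37) | - | - | -
 | 3.51 (#36) | - | - | -
 | 3.61 (#35) | - | - | -
 | 3.66 (#34) | - | - | -
 | 3.68 (#33) | - | - | -
 | 3.91 (#32) | - | - | -
 | 4.16 (#31) | - | - | -
 | 4.18 (#30) | - | - | -
 | 4.27 (#29) | - | - | -
 | 4.28 (#28) | - | - | -
 | 4.32 (#27) | - | - | -
 | 4.33 (#26) | - | - | -
 | 4.34 (#25) | - | - | -
 | 4.38 (#24) | - | - | -
 | 4.38 (#23) | - | - | -
 | 4.44 (#22) | - | - | -
 | 4.45 (#21) | - | - | -
 | 4.45 (#20) | - | - | -
 | 4.46 (#19) | - | - | -
 | 4.47 (#18) | - | - | -
 | 4.52 (#17) | - | - | -
 | 4.52 (#16) | - | - | -
 | 4.54 (#15) | - | - | -
 | 4.54 (#14) | - | - | -
 | 4.56 (#13) | - | - | -
 | 4.56 (#12) | - | - | -
 | 4.57 (#11) | - | - | -
 | 4.58 (#10) | - | - | -
 | 4.58 (#9) | - | - | -
 | 4.59 (#8) | - | - | -
 | 4.59 (#7) | - | - | -
 | 4.6 (#6) | - | - | -
 | 4.63 (#5) | - | - | -
 | 4.64 (#4) | - | - | -
 | 4.64 (#3) | - | - | -
 | 4.83 (#2) | - | - | -
 | 4.85 (#1) | - | - | -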