AutoBench Run 5 - December 2025
Date: December 19, 2025
Version: 2025-12-19
Models: 38
New Models: 3
Latest AutoBench run with models GPT 5.2, Claude Opus 4.5, Gemini 3 Flash and more
Latest AutoBench run with models GPT 5.2, Claude Opus 4.5, DeepSeek 3.2 Speciale and more
The first AutoBench run for the Agronomy domain with models Gemini 3 Pro, GPT 5.1, Grok 4.1, Opus 4.5 and more
Latest AutoBench run with models Gemini 3 Pro, GPT 5.1, Grok 4.1 and more
Latest AutoBench run with enhanced metrics including evaluation iterations and fail rates
Second major AutoBench run with o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, etc.