AutoBench Run 3 - August 2025
Date: August 14, 2025
Version: 2025-08-14
Models: 33
New Models: 26
Latest AutoBench run, with enhanced metrics including evaluation iterations and fail rates.
Second major AutoBench run with o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, etc.