ARCHIVE

Browse the benchmark runs conducted by AutoBench, with results for every available model.

Most Recent Run

AutoBench Run 5 - December 2025

Date: December 19, 2025
Version: 2025-12-19
Models: 38
New Models: 3

Latest AutoBench run with models GPT-5.2, Claude Opus 4.5, Gemini 3 Flash, and more

View Results

Past Runs

AutoBench Run 5 - December 2025

Date: December 16, 2025
Version: 2025-12-16
Models: 35
New Models: 13

Latest AutoBench run with models GPT-5.2, Claude Opus 4.5, DeepSeek 3.2 Speciale, and more

View Results

AutoBench Agronomy LLM Benchmark - December 2025

Date: December 10, 2025
Version: 2025-12-10
Models: 40
New Models: 17

The first AutoBench run for the Agronomy domain, with models Gemini 3 Pro, GPT-5.1, Grok 4.1, Opus 4.5, and more

View Results

AutoBench Run 4 - November 2025

Date: November 28, 2025
Version: 2025-11-28
Models: 32
New Models: 18

Latest AutoBench run with models Gemini 3 Pro, GPT-5.1, Grok 4.1, and more

View Results

AutoBench Run 3 - August 2025

Date: August 14, 2025
Version: 2025-08-14
Models: 33
New Models: 26

Latest AutoBench run with enhanced metrics including evaluation iterations and fail rates

View Results

AutoBench Run 2 - April 2025

Date: April 25, 2025
Version: 2025-04-25
Models: 24
New Models: 24

Second major AutoBench run with o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, and more

View Results