
AutoBench Run 2 - April 2025

Second major AutoBench run with o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, etc.

Past
Date: April 25, 2025
Version: 2025-04-25
Models: 24
New Models: 24

Run data

Model
Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology
5.22s (#1) | 6.84s (#1) | 4.60s (#2) | 5.93s (#3) | 4.74s (#1) | 4.62s (#1) | 4.55s (#1) | 4.67s (#1) | 6.30s (#1) | 4.74s (#2) | 5.24s (#3)
5.65s (#2) | 9.29s (#2) | 6.08s (#5) | 4.63s (#1) | 5.19s (#3) | 5.01s (#2) | 4.59s (#2) | 5.11s (#2) | 7.41s (#2) | 4.64s (#1) | 4.54s (#1)
5.76s (#3) | 10.77s (#3) | 3.26s (#1) | 4.94s (#2) | 4.90s (#2) | 5.14s (#3) | 4.94s (#3) | 5.33s (#3) | 7.55s (#3) | 5.50s (#3) | 5.24s (#2)
8.49s (#4) | 15.31s (#5) | 5.74s (#4) | 7.21s (#4) | 7.80s (#4) | 6.92s (#4) | 7.76s (#6) | 8.66s (#5) | 11.21s (#4) | 7.25s (#4) | 7.04s (#6)
9.76s (#5) | 19.09s (#9) | 5.48s (#3) | 7.29s (#6) | 8.48s (#6) | 7.92s (#5) | 7.71s (#5) | 11.39s (#8) | 15.29s (#7) | 7.93s (#6) | 6.97s (#5)
10.69s (#6) | 15.17s (#4) | 7.85s (#6) | 7.25s (#5) | 8.29s (#5) | 8.80s (#6) | 7.57s (#4) | 15.26s (#11) | 22.95s (#11) | 7.32s (#5) | 6.45s (#4)
10.80s (#7) | 15.48s (#6) | 11.25s (#9) | 10.95s (#9) | 10.37s (#8) | 9.97s (#8) | 10.82s (#9) | 8.59s (#4) | 11.27s (#5) | 9.60s (#8) | 9.76s (#7)
11.74s (#8) | 16.88s (#8) | 8.21s (#7) | 9.83s (#7) | 10.24s (#7) | 9.54s (#7) | 10.44s (#8) | 12.20s (#10) | 20.29s (#9) | 9.47s (#7) | 10.32s (#8)
12.17s (#9) | 16.86s (#7) | 11.60s (#10) | 10.77s (#8) | 11.06s (#9) | 10.29s (#9) | 10.93s (#10) | 11.29s (#7) | 18.05s (#8) | 10.20s (#9) | 10.68s (#9)
13.99s (#10) | 20.56s (#10) | 13.11s (#11) | 13.60s (#11) | 11.50s (#10) | 11.19s (#10) | 10.28s (#7) | 11.95s (#9) | 22.39s (#10) | 12.28s (#11) | 13.10s (#11)
15.38s (#11) | 24.38s (#12) | 9.05s (#8) | 11.06s (#10) | 11.79s (#11) | 14.19s (#12) | 12.07s (#11) | 17.77s (#12) | 30.85s (#12) | 11.08s (#10) | 11.55s (#10)
15.53s (#12) | 23.57s (#11) | 16.24s (#13) | 14.73s (#12) | 16.55s (#13) | 13.08s (#11) | 17.49s (#13) | 10.55s (#6) | 14.46s (#6) | 13.77s (#12) | 14.83s (#13)
19.10s (#13) | 25.56s (#13) | 14.83s (#12) | 14.82s (#13) | 15.96s (#12) | 19.49s (#13) | 15.95s (#12) | 21.58s (#13) | 34.85s (#14) | 14.75s (#13) | 13.26s (#12)
25.04s (#14) | 35.44s (#14) | 17.30s (#14) | 21.43s (#14) | 23.43s (#15) | 23.41s (#15) | 23.64s (#15) | 24.97s (#15) | 37.67s (#15) | 21.89s (#15) | 21.21s (#14)
29.18s (#15) | 52.36s (#18) | 24.34s (#19) | 26.90s (#19) | 29.83s (#19) | 22.78s (#14) | 28.73s (#20) | 25.14s (#16) | 33.72s (#13) | 19.13s (#14) | 28.87s (#19)
30.03s (#16) | 57.30s (#20) | 18.12s (#15) | 26.05s (#17) | 21.70s (#14) | 24.51s (#16) | 25.17s (#16) | 23.42s (#14) | 40.69s (#16) | 34.57s (#24) | 28.76s (#18)
31.03s (#17) | 42.84s (#15) | 19.57s (#16) | 26.71s (#18) | 33.20s (#23) | 26.50s (#17) | 27.23s (#18) | 31.80s (#17) | 42.40s (#18) | 32.56s (#21) | 27.52s (#16)
33.94s (#18) | 44.10s (#16) | 28.57s (#21) | 28.82s (#22) | 30.47s (#20) | 35.20s (#24) | 30.32s (#23) | 37.70s (#20) | 42.02s (#17) | 26.85s (#19) | 35.39s (#23)
34.57s (#19) | 71.53s (#24) | 28.24s (#20) | 28.23s (#20) | 32.61s (#22) | 27.73s (#18) | 26.69s (#17) | 32.57s (#18) | 44.95s (#19) | 23.61s (#16) | 29.47s (#20)
34.73s (#20) | 52.60s (#19) | 40.48s (#24) | 28.34s (#21) | 28.70s (#17) | 29.42s (#19) | 28.43s (#19) | 33.31s (#19) | 49.76s (#20) | 26.57s (#18) | 29.67s (#21)
36.57s (#21) | 51.62s (#17) | 23.10s (#17) | 25.99s (#16) | 29.23s (#18) | 32.35s (#22) | 29.55s (#21) | 49.82s (#22) | 68.76s (#21) | 27.30s (#20) | 27.97s (#17)
42.28s (#22) | 60.10s (#21) | 29.61s (#22) | 38.87s (#24) | 31.23s (#21) | 31.30s (#20) | 30.26s (#22) | 49.36s (#21) | 70.08s (#22) | 34.02s (#23) | 47.99s (#25)
43.84s (#23) | 70.17s (#22) | 23.70s (#18) | 23.43s (#15) | 24.11s (#16) | 31.86s (#21) | 21.37s (#14) | 80.50s (#24) | 116.39s (#24) | 24.50s (#17) | 22.37s (#15)
45.80s (#24) | 71.38s (#23) | 35.11s (#23) | 30.36s (#23) | 34.95s (#24) | 34.29s (#23) | 39.00s (#24) | 58.43s (#23) | 85.34s (#23) | 34.01s (#22) | 35.17s (#22)
84.77s (#25) | 132.90s (#25) | 72.63s (#25) | 45.19s (#25) | 49.89s (#25) | 63.33s (#25) | 45.66s (#25) | 136.12s (#25) | 205.27s (#25) | 49.02s (#25) | 47.69s (#24)
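
For readers who want to work with the run data programmatically, the following is a minimal sketch of how one row could be parsed. It assumes each cell is a duration in seconds followed by that column's rank in parentheses, which is how the values read but is not stated on the page. The names `CELL`, `parse_row`, and `first_row` are hypothetical and chosen for illustration only.

```python
import re

# Assumed cell shape: "<seconds>s (#<rank>)", e.g. "5.22s (#1)".
CELL = re.compile(r"(\d+(?:\.\d+)?)s \(#(\d+)\)")


def parse_row(row: str) -> list[tuple[float, int]]:
    """Return (seconds, rank) pairs for every cell found in one table row."""
    return [(float(seconds), int(rank)) for seconds, rank in CELL.findall(row)]


# First data row of the table, copied from the run data above.
first_row = (
    "5.22s (#1) | 6.84s (#1) | 4.60s (#2) | 5.93s (#3) | 4.74s (#1) | "
    "4.62s (#1) | 4.55s (#1) | 4.67s (#1) | 6.30s (#1) | 4.74s (#2) | 5.24s (#3)"
)

cells = parse_row(first_row)
average, topics = cells[0], cells[1:]

# Mean of the ten per-topic values, for comparison with the listed average.
topic_mean = sum(seconds for seconds, _ in topics) / len(topics)
print(average, round(topic_mean, 2))
```

Because the pattern matches cells wherever they occur, the same function works whether or not the cells are separated by a delimiter. For this first row, the mean of the ten topic values rounds to the listed average of 5.22s.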