Llama 4 Maverick 17B Instruct
An FP8-quantized Llama 4 Maverick 17B model optimized for deployment efficiency and speed.
Parameters: 170 B
Context: 128,000 tokens
Released: Apr 1, 2025
Leaderboards
QUALITY
Average score combining domain-specific Autobench scores; higher is better
[Leaderboard chart: 26 models, scores ranging from 3.49 to 4.51]
PRICE
USD cents per average answer; lower is better
[Leaderboard chart: 26 models, prices ranging from 0.02 to 9.13 cents per answer]
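The price metric is quoted in USD cents per average answer, which takes a small conversion to turn into a budget figure. A minimal sketch of that conversion (the volume and price figures below are hypothetical examples, not measurements from this page):

```python
def monthly_cost_usd(cents_per_answer: float,
                     answers_per_day: int,
                     days: int = 30) -> float:
    """Convert a per-answer price in USD cents into an estimated
    monthly spend in USD for a given daily answer volume."""
    return cents_per_answer * answers_per_day * days / 100.0

# Hypothetical example: 0.14 cents/answer at 10,000 answers/day
print(round(monthly_cost_usd(0.14, 10_000), 2))  # 420.0
```

Because the metric is an average per answer, real spend will vary with prompt and answer length; treat the result as a rough planning number.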
LATENCY
Average latency in seconds; lower is better
[Leaderboard chart: 29 models, average latencies ranging from 5.29 s to 119.17 s]
Performance vs. Industry Average
Context Window
Llama 4 Maverick 17B Instruct has a context window of 128k tokens, smaller than the industry average of 246k tokens.
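A 128k-token window still needs room for the model's output, so a quick pre-flight check is useful before sending long prompts. A minimal sketch, assuming the common ~4-characters-per-token heuristic for English text (an approximation, not this model's actual tokenizer; use the real tokenizer for exact counts):

```python
# 128k-token context window, as listed for Llama 4 Maverick 17B Instruct.
CONTEXT_WINDOW = 128_000

def fits_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """Rough check that a prompt plus reserved output budget fits the
    context window, using a ~4 chars/token heuristic for English text."""
    est_tokens = len(text) / 4  # heuristic estimate, not a real token count
    return est_tokens + reserved_for_output <= CONTEXT_WINDOW

print(fits_context("hello world " * 1_000))   # short prompt fits: True
print(fits_context("x" * 1_000_000))          # ~250k tokens does not: False
```

The `reserved_for_output` budget is a hypothetical default; size it to the longest answer you expect, since prompt and completion share the same window.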