Llama 3.1 Nemotron 70B Instruct
NVIDIA-tuned 70B Llama 3.1 model with enhanced instruction following and helpfulness
Parameters
70B
Context
128,000 tokens
Released
Nov 1, 2024
Leaderboards
QUALITY
Average score combining domain-specific Autobench scores; higher is better.
Leaderboard scores, highest to lowest: 4.48, 4.43, 4.39, 4.38, 4.32, 4.29, 4.29, 4.20, 4.18, 4.17, 4.17, 4.13, 4.12, 4.11, 4.11, 4.06, 4.06, 3.99, 3.88, 3.86, 3.78, 3.47
PRICE
US cents per average answer; lower is better.
Leaderboard prices, lowest to highest: 0.07, 0.08, 0.09, 0.11, 0.33, 0.34, 0.54, 0.71, 0.91, 0.99, 1.25, 1.30, 1.86, 2.12, 3.79, 3.94, 6.48, 7.36, 8.12, 10.80, 11.39, 17.26, 81.88
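Because the price metric is expressed in US cents per average answer, estimating spend for a workload is a unit conversion. The sketch below is a minimal illustration under stated assumptions: `estimate_cost_usd` is a hypothetical helper, the 0.33-cent input is simply one of the leaderboard values used as an example, and real costs will vary with answer length, since the metric only describes an "average answer".

```python
def estimate_cost_usd(cents_per_answer: float, num_answers: int) -> float:
    """Hypothetical helper: convert a per-answer price in US cents into
    total US dollars for num_answers answers of roughly average length."""
    if cents_per_answer < 0 or num_answers < 0:
        raise ValueError("price and answer count must be non-negative")
    # 100 cents per US dollar.
    return cents_per_answer * num_answers / 100


if __name__ == "__main__":
    # Example input only: 0.33 cents per answer, 1,000 answers.
    print(f"${estimate_cost_usd(0.33, 1_000):.2f}")  # $3.30
```

Floating-point arithmetic is adequate for a rough estimate like this; a billing system would more likely use `decimal.Decimal` or integer sub-cent units.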
LATENCY
Average latency in seconds; lower is better.
Leaderboard latencies in seconds, lowest to highest: 20.42, 23.60, 30.08, 31.40, 38.77, 45.56, 51.84, 52.25, 61.46, 65.62, 66.78, 69.24, 75.48, 76.11, 82.80, 86.80, 89.96, 93.49, 99.62, 104.78, 110.95, 122.42, 124.57, 130.10, 136.96, 144.01, 163.15, 169.73, 171.50, 180.11, 187.43, 227.43, 247.97, 261.38, 310.39
Performance vs. Industry Average
Context Window
With a context window of 128k tokens, Llama 3.1 Nemotron 70B Instruct falls below the industry average of 351k tokens.
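In practice, the context window limits how much prompt text and generated output a single request can hold. The sketch below checks whether a request fits in the 128,000-token window and computes how that window compares with the 351k-token average quoted here. It assumes, as is typical for decoder-only models, that prompt and output tokens share one budget, and that token counts come from the model's own tokenizer, which is not included. `fits_in_context` and `share_of_average` are hypothetical helpers.

```python
# Figures taken from this page; treating the window as a shared
# prompt + output budget is an assumption, not a documented guarantee.
CONTEXT_WINDOW = 128_000
INDUSTRY_AVERAGE = 351_000


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the requested output tokens fit
    within the context window (hypothetical helper)."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


def share_of_average(context_window: int = CONTEXT_WINDOW,
                     average: int = INDUSTRY_AVERAGE) -> float:
    """Fraction of the average context window that this model offers."""
    return context_window / average


if __name__ == "__main__":
    print(fits_in_context(120_000, 4_000))  # True: 124,000 <= 128,000
    print(fits_in_context(126_000, 4_000))  # False: 130,000 > 128,000
    print(f"{share_of_average():.1%}")      # 36.5%
```

By this arithmetic, the 128k window is roughly a third of the quoted average.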