Nemotron 3 Super 120B A12B
Nemotron 3 Super is a 120B-parameter hybrid Mamba-Transformer model with 12B active parameters. It uses LatentMoE and Multi-Token Prediction (MTP) to improve compute efficiency on complex RAG and IT ticket automation workloads.
Leaderboards
QUALITY
Average score combining domain-specific Autobench scores; higher is better.
[Bar chart: 22 entries ranked by average score, from 3.17 (best) to 2.27 (worst); model labels not preserved.]
PRICE
US cents per average answer; lower is better.
[Bar chart: 22 entries ranked by cost, from 0.01 (cheapest) to 5.82 (most expensive); model labels not preserved.]
LATENCY
Average latency in seconds; lower is better.
[Bar chart: 29 entries ranked by latency, from 9.00s (fastest) to 129.00s (slowest); model labels not preserved.]
Performance vs. Industry Average
Intelligence
Nemotron 3 Super 120B A12B has an intelligence score of 2.7, slightly below the average of 2.8.
Price
Nemotron 3 Super 120B A12B costs $0.07 per 1M tokens, cheaper than the average of $0.67 per 1M tokens.
Latency
Nemotron 3 Super 120B A12B has an average latency of 71.87s, higher than the average of 45.95s.
P99 Latency
Nemotron 3 Super 120B A12B has a P99 time to first token (TTFT) of 245.44s, higher than the average of 131.50s.
Context Window
Nemotron 3 Super 120B A12B has a context window of 262k tokens, smaller than the average of 401k tokens.
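To put the comparisons above on a common scale, the gap on each metric can be expressed as a signed percentage relative to the average. The following is a minimal sketch, not part of the leaderboard itself: the figures are copied from the statements above, while the dictionary keys and the function names `relative_difference` and `summarize` are hypothetical names chosen for illustration.

```python
# Figures copied from the comparison text: (model value, industry average).
# The key names are illustrative labels, not official metric identifiers.
METRICS = {
    "intelligence": (2.7, 2.8),
    "price_usd_per_1m_tokens": (0.07, 0.67),
    "avg_latency_s": (71.87, 45.95),
    "p99_ttft_s": (245.44, 131.50),
    "context_window_k_tokens": (262, 401),
}


def relative_difference(value, average):
    """Return (value - average) / average as a signed fraction."""
    return (value - average) / average


def summarize(metrics):
    """Map each metric name to its percentage gap vs. the average, rounded to 0.1."""
    return {
        name: round(relative_difference(value, average) * 100, 1)
        for name, (value, average) in metrics.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(METRICS).items():
        print(f"{name}: {pct:+.1f}% vs. average")
```

A negative percentage is favorable for price and latency and unfavorable for intelligence and context window, so the sign has to be read against each metric's "higher is better" or "lower is better" direction.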