Llama 3.3 Nemotron Super 49B v1.5

Llama 3.3 Nemotron Super 49B is a reasoning model derived from Llama 3.3 70B. It is post-trained for agentic workflows, RAG, and tool calling.

Thinking Mode

Parameters

49B

Context

131,072 tokens

Leaderboards

Average score combining domain-specific Autobench scores; higher is better

Cost in US cents per average answer; lower is better

Average latency in seconds; lower is better

Llama 3.3 Nemotron Super 49B v1.5 scores below average on intelligence: 3.8, against an average of 4.1.

It is cheaper than average, at $0.18 per 1M tokens against an average of $4.58 per 1M tokens.

Its average latency is lower than average: 76.48s, against an average of 116.45s.

Its P99 time to first token (TTFT) is also lower than average: 240.32s, against an average of 339.37s.

Its context window is smaller than average: 131k tokens, against an average of 351k tokens.
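To put the comparisons above on a common scale, the following minimal Python sketch expresses each figure as a signed percentage difference from the leaderboard average. The numbers are copied from the sentences above; the dictionary keys and the helper names `relative_difference` and `summarize` are illustrative assumptions, not part of any leaderboard API.

```python
# Figures copied from the comparison sentences above, as
# (model value, leaderboard average). Key names are illustrative only.
METRICS = {
    "intelligence_score": (3.8, 4.1),
    "price_usd_per_1m_tokens": (0.18, 4.58),
    "avg_latency_s": (76.48, 116.45),
    "p99_ttft_s": (240.32, 339.37),
    "context_window_k_tokens": (131, 351),
}


def relative_difference(value: float, average: float) -> float:
    """Return (value - average) / average as a signed fraction."""
    return (value - average) / average


def summarize(metrics: dict) -> dict:
    """Map each metric name to its percentage difference from the average,
    rounded to one decimal place."""
    return {
        name: round(relative_difference(value, average) * 100, 1)
        for name, (value, average) in metrics.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(METRICS).items():
        print(f"{name}: {pct:+.1f}% vs. average")
```

A negative percentage is favorable for price and latency, where lower is better, and unfavorable for the intelligence score and the context window.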