DeepSeek V3

Efficient Mixture-of-Experts (MoE) model with 671B total parameters, trained with FP8 mixed precision and achieving strong benchmark results

Parameters

671B

Context

128,000 tokens

Released

Dec 26, 2024

Leaderboards

Average score combining domain-specific Autobench scores; higher is better

DeepSeek V3's context window of 128k tokens is smaller than the average of 406k tokens.
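
To put the context-window comparison in proportion, the short sketch below works out the ratio and the gap from the two figures quoted on this page. It assumes the rounded 128k and 406k values are exact token counts; the constant and function names are illustrative and not part of any leaderboard API.

```python
# Figures quoted on this page; treated as exact for illustration only.
MODEL_CONTEXT_TOKENS = 128_000    # DeepSeek V3 context window
AVERAGE_CONTEXT_TOKENS = 406_000  # average reported by the leaderboard (rounded)


def context_ratio(model_tokens: int, average_tokens: int) -> float:
    """Return the model's context window as a fraction of the average."""
    return model_tokens / average_tokens


ratio = context_ratio(MODEL_CONTEXT_TOKENS, AVERAGE_CONTEXT_TOKENS)
gap = AVERAGE_CONTEXT_TOKENS - MODEL_CONTEXT_TOKENS

# Prints: 31.5% of the average, 278,000 tokens fewer
print(f"{ratio:.1%} of the average, {gap:,} tokens fewer")
```

Under these assumptions, the model's window is a little under one third of the leaderboard average.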