deepseek-ai/DeepSeek-V3
Serve with deepseek-ai/DeepSeek-V3 models. The use of DeepSeek-V3 Base/Chat models is subject to the DeepSeek Model License.
On top of the efficient architecture of DeepSeek-V2, an auxiliary-loss-free strategy for load balancing is pioneered, which minimizes the performance degradation that arises from encouraging balanced expert load.
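The auxiliary-loss-free idea can be illustrated with a toy sketch: each expert carries a bias that is added to the routing scores only when selecting the top-k experts, and after each batch the bias is nudged against the observed load. The expert count, step size, and sign-based update below are illustrative assumptions, not the model's actual hyperparameters.

```python
import numpy as np

def route_topk(scores, bias, k=2):
    """Pick the top-k experts per token using bias-adjusted scores.
    The bias steers selection only; gating weights would still use the
    raw scores, so no auxiliary loss term perturbs training."""
    adjusted = scores + bias                      # (tokens, experts)
    return np.argsort(adjusted, axis=1)[:, -k:]

def update_bias(bias, chosen, n_experts, gamma=0.01):
    """Decrease bias for overloaded experts, increase it for underloaded ones."""
    load = np.bincount(chosen.ravel(), minlength=n_experts).astype(float)
    return bias - gamma * np.sign(load - load.mean())

# Toy simulation: tokens initially prefer experts 0 and 1.
rng = np.random.default_rng(0)
n_experts, n_tokens = 8, 64
bias = np.zeros(n_experts)
base = np.zeros(n_experts)
base[:2] = 1.0                                   # skewed token preference
for _ in range(500):
    scores = base + 0.1 * rng.standard_normal((n_tokens, n_experts))
    chosen = route_topk(scores, bias)
    bias = update_bias(bias, chosen, n_experts)
```

After the loop, the learned bias offsets the skewed preference, so expert load spreads out without any balancing loss added to the training objective.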
A Multi-Token Prediction (MTP) objective is investigated and proven beneficial to model performance. It can also be used for speculative decoding for inference acceleration.
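How an MTP-style draft can accelerate inference via speculative decoding can be sketched in the greedy case: a cheap draft proposes several tokens ahead, the target model verifies them, and the longest agreeing prefix is accepted in one step. Both "models" below are toy stand-in functions, not the actual DeepSeek-V3 heads.

```python
VOCAB = 16

def target_next(token: int) -> int:
    """Toy stand-in for the full model's greedy next-token choice."""
    return (token * 7 + 3) % VOCAB

def draft_propose(token: int, k: int) -> list[int]:
    """Toy stand-in for a draft (e.g. an MTP head): usually agrees with
    the target, but we corrupt the third token to force a mismatch."""
    out = []
    for _ in range(k):
        token = target_next(token)
        out.append(token)
    if k > 2:
        out[2] = (out[2] + 1) % VOCAB    # injected disagreement
    return out

def speculative_step(token: int, k: int = 4) -> list[int]:
    """One decoding step: verify the draft against the target, emit the
    target's token each time, and stop at the first disagreement."""
    accepted = []
    cur = token
    for d in draft_propose(token, k):
        t = target_next(cur)
        accepted.append(t)               # output always matches the target
        if t != d:
            break
        cur = t
    return accepted

print(speculative_step(1))               # several tokens from one verification
```

The output is identical to running the target model token by token; the draft only changes how many target-verified tokens land per step.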
An FP8 mixed-precision training framework is designed and, for the first time, the feasibility and effectiveness of FP8 training are validated on an extremely large-scale model. Through co-design of algorithms, frameworks, and hardware, the communication bottleneck in cross-node MoE training is overcome, achieving near-full computation-communication overlap.
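A rough sketch of the block-wise scaling idea behind low-precision training: each block of values gets its own scale so that its absolute maximum fits the FP8 E4M3 range before casting. The uniform rounding below is a simplified stand-in for a real E4M3 cast, and the block size is an illustrative assumption.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_blockwise(x, block=128):
    """Scale each block by its absmax so values fit the FP8 range,
    then round onto a coarse grid (stand-in for the E4M3 cast)."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales[scales == 0] = 1.0                  # avoid divide-by-zero
    q = np.round(blocks / scales * 8.0) / 8.0  # coarse rounding stand-in
    return q, scales, pad

def dequantize_blockwise(q, scales, pad):
    """Undo the per-block scaling and strip any padding."""
    x = (q * scales).ravel()
    return x[:x.size - pad] if pad else x
```

Keeping the scale per block, rather than per tensor, limits how far an outlier in one region can degrade the precision of values elsewhere.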
This significantly enhances the training efficiency and reduces the training costs, enabling the team to further scale up the model size without additional overhead. At an economical cost of only 2.664M H800 GPU hours, the pre-training of DeepSeek-V3 is completed on 14.8T tokens, producing the currently strongest open-source base model. The subsequent training stages after pre-training require only 0.1M GPU hours.
An innovative methodology is introduced to distill reasoning capabilities from a long-Chain-of-Thought (CoT) model, specifically one of the DeepSeek-R1 series, into standard LLMs, particularly DeepSeek-V3. The pipeline elegantly incorporates the verification and reflection patterns of R1 into DeepSeek-V3 and notably improves its reasoning performance, while control over the output style and length of DeepSeek-V3 is maintained.
Example inference (Python)
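A minimal sketch of building a request for a served DeepSeek-V3 endpoint, assuming an OpenAI-compatible chat-completions API; the route, default parameters, and helper name here are illustrative assumptions, not documented specifics of this service.

```python
import json

def build_chat_request(prompt: str,
                       model: str = "deepseek-ai/DeepSeek-V3",
                       max_tokens: int = 256,
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completion payload (assumed API shape)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Summarize the DeepSeek-V3 architecture.")
print(json.dumps(payload, indent=2))
# To send: POST this JSON to the serving endpoint's chat-completions
# route, e.g. with urllib.request or an OpenAI-compatible client
# pointed at your base URL.
```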
| Resource ID | Release Date | Release Stage | Description |
| --- | --- | --- | --- |
| deepseek-ai/DeepSeek-V3 | 2/12/2025 | Public Preview | Serving |
| deepseek-ai/DeepSeek-V3-base | 2/12/2025 | Public Preview | Serving |
deepseek-ai/DeepSeek-V3 on GitHub
deepseek-ai/DeepSeek-V3 on Hugging Face