DeepSeek Researchers Introduce DeepSeek-V3.2 and DeepSeek-V3.2-Special for Long-Context Reasoning and Agentic Workloads


How do you get GPT-5-level reasoning on real long-context, tool-use workloads without paying the quadratic attention and GPU costs that typically make such systems impractical? DeepSeek researchers introduce DeepSeek-V3.2 and DeepSeek-V3.2-Special, reasoning-first models built for agents that target high-quality reasoning, long context, and agentic workflows with open weights and production APIs. The models combine DeepSeek Sparse Attention (DSA), a scaled GRPO reinforcement learning stack, and an agent-native tool protocol, and report performance comparable to GPT-5, with DeepSeek-V3.2-Special reaching Gemini 3.0 Pro level reasoning on public benchmarks and competitions.

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

Sparse attention with near-linear long-context cost

Both DeepSeek-V3.2 and DeepSeek-V3.2-Special use the DeepSeek-V3 Mixture-of-Experts (MoE) transformer with approximately 671B total parameters and 37B active parameters per token, inherited from V3.1-Terminus. The only structural change is DeepSeek Sparse Attention, which is introduced through continued pre-training.
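
To make the "37B active out of 671B total" idea concrete, here is a minimal, purely illustrative sketch of top-k MoE routing in PyTorch. The layer sizes, expert count, and k below are toy values, not DeepSeek-V3.2's actual configuration; the point is only that each token runs through k of n experts, so the parameters touched per token are a small fraction of the total.

```python
# Toy sketch of top-k MoE routing (illustrative only; sizes are hypothetical,
# not DeepSeek-V3.2's real config).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                            # x: [tokens, d_model]
        gates = self.router(x)                       # [tokens, n_experts]
        weights, idx = gates.topk(self.k, dim=-1)    # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                   # only k of n_experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.randn(4, 64)
print(TinyMoE()(x).shape)   # torch.Size([4, 64])
```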

DeepSeek splits sparse attention into two components. A lightning indexer runs a small number of low-precision heads over all token pairs and produces relevance scores. A fine-grained selector keeps the top-k key-value entries per query, and the main attention path runs multi-query attention and multi-head latent attention (MLA) over this sparse set.
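
A rough sketch of these two components, assuming a single head and no causal masking, might look like the following. The function names (lightning_indexer_scores, sparse_attention) and dimensions are illustrative; the real indexer runs low-precision heads and feeds DeepSeek's MLA path, which this toy code does not model.

```python
# Illustrative sketch of DSA's two components: a cheap indexer scores all
# token pairs, a selector keeps the top-k keys per query, and the main
# attention runs only over that sparse set.
import torch
import torch.nn.functional as F

def lightning_indexer_scores(q_idx, k_idx):
    """Cheap relevance scores over all token pairs: [L, L]."""
    return q_idx @ k_idx.T                      # small-dim dot product per pair

def sparse_attention(q, k, v, q_idx, k_idx, top_k=4):
    L, d = q.shape
    scores = lightning_indexer_scores(q_idx, k_idx)        # [L, L], cheap
    sel = scores.topk(min(top_k, L), dim=-1).indices       # [L, top_k] selected keys
    k_sel, v_sel = k[sel], v[sel]                          # [L, top_k, d]
    att = F.softmax((q.unsqueeze(1) * k_sel).sum(-1) / d**0.5, dim=-1)  # [L, top_k]
    return (att.unsqueeze(-1) * v_sel).sum(1)              # [L, d]

L, d, d_idx = 16, 32, 8
q, k, v = (torch.randn(L, d) for _ in range(3))
q_idx, k_idx = torch.randn(L, d_idx), torch.randn(L, d_idx)
print(sparse_attention(q, k, v, q_idx, k_idx).shape)   # torch.Size([16, 32])
```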

This changes the core attention complexity from O(L²) to O(kL), where L is the sequence length and k is the number of selected tokens, with k much smaller than L. On benchmarks, DeepSeek-V3.2 matches the dense Terminus baseline on accuracy for long-context inference while reducing costs by approximately 50 percent, with high throughput and low memory usage on H800-class hardware and vLLM and SGLang backends.
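
A quick back-of-envelope check on those numbers: with the article's 128K context and 2048 selected key-value entries per query, the main attention path touches roughly 64x fewer token pairs.

```python
# Ratio of sparse to dense attention work in the main attention path,
# using the article's numbers: L = 128K context, k = 2048 selected entries.
L = 128 * 1024          # sequence length
k = 2048                # key-value entries kept per query
dense  = L * L          # O(L^2) pairwise interactions
sparse = k * L          # O(kL) after top-k selection
print(f"sparse/dense = {sparse / dense:.4f}  (~{dense // sparse}x fewer interactions)")
# sparse/dense = 0.0156  (~64x fewer interactions)
```

The realized saving lands closer to the reported 50 percent because the indexer still scans all pairs (cheaply), and the MoE feed-forward and other costs are unchanged.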

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

Continued pre-training for DeepSeek Sparse Attention

DeepSeek Sparse Attention (DSA) is introduced by continued pre-training on top of DeepSeek-V3.1-Terminus. In the dense warm-up phase, dense attention remains active, all backbone parameters are frozen, and only the lightning indexer is trained with a Kullback-Leibler (KL) loss to match the dense attention distribution over 128K-token sequences. This phase uses a small number of steps and about 2B tokens, which is enough for the indexer to learn a useful score.
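
A minimal sketch of that warm-up objective, assuming per-query attention distributions and illustrative names, could look like this: the dense attention weights act as a frozen target and only the indexer logits receive gradients.

```python
# Sketch of the dense warm-up step: dense attention stays active, the
# backbone is frozen, and only the indexer is trained with a KL loss to
# match the dense attention distribution. All names are illustrative.
import torch
import torch.nn.functional as F

def indexer_warmup_loss(dense_attn_probs, indexer_logits):
    """KL(dense attention || indexer distribution), averaged over queries.

    dense_attn_probs: [L, L] softmax-ed dense attention weights (frozen target)
    indexer_logits:   [L, L] raw relevance scores from the trainable indexer
    """
    log_p_indexer = F.log_softmax(indexer_logits, dim=-1)
    # target is detached: no gradient flows into the frozen dense backbone
    return F.kl_div(log_p_indexer, dense_attn_probs.detach(), reduction="batchmean")

L = 16
dense = torch.softmax(torch.randn(L, L), dim=-1)
logits = torch.randn(L, L, requires_grad=True)
loss = indexer_warmup_loss(dense, logits)
loss.backward()                 # gradients reach only the indexer logits
print(loss.item() >= 0)         # KL divergence is non-negative -> True
```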

In the sparse phase, the selector keeps 2048 key-value entries per query, the backbone is unfrozen, and the model continues training on approximately 944B tokens. The gradients for the indexer still come from the alignment loss against dense attention, restricted to the selected positions. This schedule lets DeepSeek Sparse Attention (DSA) behave as a drop-in replacement for dense attention with similar quality and lower long-context cost.
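
Condensed into a configuration-style view (token counts and top-k from the article, everything else illustrative structure), the two-phase schedule reads as follows.

```python
# Condensed view of the two-phase continued pre-training schedule above.
# Numeric values come from the article; the structure is illustrative.
DSA_SCHEDULE = [
    {   # Phase 1: dense warm-up
        "attention": "dense",
        "trainable": ["lightning_indexer"],      # backbone frozen
        "loss": "KL(dense_attention || indexer)",
        "tokens": 2e9,                           # ~2B tokens
    },
    {   # Phase 2: sparse training
        "attention": "sparse_top_k",
        "top_k_per_query": 2048,                 # key-value entries kept
        "trainable": ["backbone", "lightning_indexer"],
        "loss": "LM loss + indexer alignment on selected positions",
        "tokens": 944e9,                         # ~944B tokens
    },
]
```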

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

GRPO with RL compute above 10 percent of pre-training

On top of the sparse architecture, DeepSeek-V3.2 uses Group Relative Policy Optimization (GRPO) as its main reinforcement learning method. The research team reports that post-training RL compute amounts to more than 10 percent of pre-training compute.

RL is organized around expert domains. The research team trains dedicated runs for mathematics, competitive programming, general reasoning, browsing and agent tasks, and safety, then distills these experts into the shared 685B-parameter base behind DeepSeek-V3.2 and DeepSeek-V3.2-Special. GRPO is implemented with an unbiased KL estimator, off-policy sequence masking, and mechanisms that keep Mixture-of-Experts (MoE) routing and sampling masks consistent between training and inference.
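
For intuition, here is a minimal sketch of the group-relative advantage and clipped surrogate loss that define GRPO: rewards are normalized within a group of responses sampled for the same prompt, so no learned value network is needed. It omits the unbiased KL estimator, off-policy sequence masking, and MoE-routing consistency machinery mentioned above, and all names are illustrative.

```python
# Minimal GRPO sketch: group-relative advantages plus a PPO-style clipped
# surrogate, applied per sampled response for one prompt.
import torch

def grpo_advantages(rewards):
    """rewards: [group_size] scalar rewards for one prompt's sampled responses."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

def grpo_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    """Clipped surrogate over per-sequence log-probs: [group_size]."""
    ratio = torch.exp(logprobs_new - logprobs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])      # e.g. pass/fail from a verifier
adv = grpo_advantages(rewards)
loss = grpo_loss(torch.randn(4, requires_grad=True), torch.randn(4), adv)
loss.backward()
print(adv, loss.item())
```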

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

Agent Data, Thinking Modes, and Tool Protocols

The DeepSeek research team created a large synthetic agent dataset by generating more than 1,800 environments and more than 85,000 actions across code-agent, search-agent, general tool-use, and code-interpreter setups. The tasks are designed to be hard to solve and easy to verify, and are used as RL targets along with real coding and search traces.
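
The "hard to solve, easy to verify" pattern is what makes these environments usable as RL targets: a cheap programmatic check yields a clean binary reward. The dataclasses below are hypothetical, not DeepSeek's actual task schema, but show the shape of such a task.

```python
# Hypothetical shape of a "hard to solve, easy to verify" agent task.
# Finding the answer may need search or tools; checking it is one multiply.
from dataclasses import dataclass

@dataclass
class AgentTask:
    prompt: str                  # what the agent is asked to do
    environment: str             # e.g. "code_agent", "search_agent"

    def verify(self, trajectory_output: str) -> float:
        """Cheap programmatic check -> binary reward for GRPO."""
        raise NotImplementedError

@dataclass
class FactorTask(AgentTask):
    n: int = 0

    def verify(self, trajectory_output: str) -> float:
        try:
            p, q = (int(x) for x in trajectory_output.split(","))
        except ValueError:
            return 0.0
        return float(p * q == self.n and p > 1 and q > 1)

task = FactorTask(prompt="Factor 9991 into two primes, output 'p,q'.",
                  environment="code_agent", n=9991)
print(task.verify("97,103"))     # 1.0 -- 97 * 103 = 9991
```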

At inference time, DeepSeek-V3.2 introduces explicit thinking and non-thinking modes. The DeepSeek-Reasoner endpoint exposes thinking mode by default, where the model generates an internal chain of thought before a final answer. The thinking-with-tools guide explains how reasoning content is carried into tool calls and cleared when a new user message arrives, and how tool calls and tool results remain in context even when reasoning text is trimmed for budget.
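
A hedged sketch of calling the DeepSeek-Reasoner endpoint through its OpenAI-compatible API is shown below. The reasoning_content field follows DeepSeek's published reasoner interface; the exact V3.2 tool-call semantics may differ from this simplified flow.

```python
# Sketch of thinking mode via DeepSeek's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

messages = [{"role": "user", "content": "Is 9991 prime? Explain briefly."}]
resp = client.chat.completions.create(model="deepseek-reasoner", messages=messages)

msg = resp.choices[0].message
print(msg.reasoning_content)   # internal chain of thought (thinking mode)
print(msg.content)             # final answer

# Per the protocol above, reasoning persists across tool calls but is cleared
# on a new user turn, so only `content` is appended back into history:
messages.append({"role": "assistant", "content": msg.content})
messages.append({"role": "user", "content": "And is 9993 prime?"})
```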

The chat template is updated to reflect this behavior. The DeepSeek-V3.2-Special repository ships Python encoder and decoder helpers instead of a Jinja template. Messages can carry a content field as well as a reasoning_content field, controlled by a reasoning parameter. A developer role is reserved for search agents and is not accepted in the normal chat flow by the official API, which protects this channel from accidental misuse.
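
As a toy illustration of that message schema, the snippet below shows messages carrying both content and reasoning_content, a reasoning switch that trims the thinking text, and the reserved developer role being rejected. The encode_messages helper and <think> markup are hypothetical stand-ins, not the repository's real encoder API.

```python
# Hypothetical mini-encoder illustrating the V3.2 message schema described
# above; not the actual helpers shipped in the repository.
messages = [
    {"role": "user", "content": "Summarize the attached log."},
    {
        "role": "assistant",
        "content": "The log shows three failed retries.",
        "reasoning_content": "Scanning timestamps... retries at 10:01/10:02/10:04.",
    },
]

def encode_messages(msgs, reasoning=True):
    """Toy encoder: include reasoning_content only when reasoning mode is on."""
    parts = []
    for m in msgs:
        if m["role"] == "developer":
            raise ValueError("developer role is reserved for search agents")
        if reasoning and m.get("reasoning_content"):
            parts.append(f"<think>{m['reasoning_content']}</think>")
        parts.append(f"<{m['role']}>{m['content']}")
    return "".join(parts)

print(encode_messages(messages, reasoning=False))  # reasoning text trimmed
```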

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

Benchmarks, competitions, and open artifacts

On standard reasoning and coding benchmarks, DeepSeek-V3.2 and especially DeepSeek-V3.2-Special are reported to be comparable to GPT-5 and close to Gemini 3.0 Pro on suites such as AIME 2025, HMMT 2025, GPQA, and LiveCodeBench, with better cost efficiency on long-context workloads.

For formal competitions, the DeepSeek research team reports that DeepSeek-V3.2-Special achieved gold-medal level performance at the International Mathematical Olympiad 2025, the Chinese Mathematical Olympiad 2025, and the International Olympiad in Informatics 2025, and competitive gold-medal level performance at the ICPC World Finals 2025.

Key Takeaways

  1. DeepSeek-V3.2 adds DeepSeek Sparse Attention, which brings near-linear O(kL) attention cost and approximately 50 percent lower long-context API cost than previous DeepSeek models while keeping the same quality as DeepSeek-V3.1-Terminus.
  2. The model family retains the 671B-parameter MoE backbone with 37B active parameters per token and exposes a full 128K context window in the production API, making long documents, multi-step chains, and large tool traces practical rather than a lab convenience.
  3. Post-training uses Group Relative Policy Optimization (GRPO) with a compute budget of more than 10 percent of pre-training, focused on mathematics, code, general reasoning, browsing and agent workloads, and safety, along with competition-style experts whose cases are released for external validation.
  4. DeepSeek-V3.2 is the first model in the DeepSeek family to integrate thinking directly into tool use, supporting both thinking and non-thinking tool modes and a protocol where internal reasoning persists across tool calls and is only reset on new user messages.

Check out the Paper and Model Weights. Feel free to check out our GitHub Page for Tutorials, Codes, and Notebooks. Also, feel free to follow us on Twitter and don't forget to join our 100k+ ML SubReddit and subscribe to our Newsletter. Wait! Are you on Telegram? Now you can connect with us on Telegram as well.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, known for its in-depth coverage of Machine Learning and Deep Learning news that is technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
