Chinese AI company DeepSeek has released its latest experimental model, V3.2-Exp, marking a significant step toward next-generation artificial intelligence architecture. The Hangzhou-based firm announced on September 29, 2025, that this model introduces DeepSeek Sparse Attention (DSA) technology, which dramatically reduces computational costs while maintaining performance quality.
Revolutionary sparse attention mechanism changes the game
The core innovation of DeepSeek-V3.2-Exp lies in its sparse attention mechanism. Traditional AI models compute attention between every token and every other token in a text, so the cost grows quadratically with sequence length and becomes prohibitively expensive for long documents. DeepSeek's new system selects only the most relevant parts of the text for processing, cutting computational requirements by more than half.
According to DeepSeek’s technical documentation, DSA achieves “fine-grained sparse attention for the first time”. This means the system can focus on specific parts of long texts without losing important information. The technology works by creating a hierarchical structure that combines both local and global information processing.
“DSA achieves fine-grained sparse attention with minimal impact on output quality — boosting long-context performance & reducing compute cost,” DeepSeek announced on their official Twitter account.
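To see the idea in miniature, consider a toy sketch of top-k token selection. This is illustrative Python, not DeepSeek's actual DSA code; the dot-product scorer and all names here are assumptions. Each query scores the past tokens, keeps only the k highest-scoring positions, and runs softmax attention over that subset:

```python
import numpy as np

def sparse_attention(q, keys, values, k=64):
    """Toy top-k sparse attention for a single query vector.

    Scores every past token, keeps only the k best, and computes
    softmax attention over that subset instead of the full sequence.
    """
    scores = keys @ q                        # (seq_len,) relevance of each token
    topk = np.argsort(scores)[-k:]           # indices of the k most relevant tokens
    selected = scores[topk] / np.sqrt(q.shape[0])
    weights = np.exp(selected - selected.max())
    weights /= weights.sum()                 # softmax over the selected subset only
    return weights @ values[topk]            # weighted sum of the selected values

rng = np.random.default_rng(0)
seq_len, dim = 1024, 64
q = rng.standard_normal(dim)
K = rng.standard_normal((seq_len, dim))
V = rng.standard_normal((seq_len, dim))
out = sparse_attention(q, K, V, k=64)        # attends to 64 tokens instead of 1024
print(out.shape)                             # (64,)
```

Per query, the attention step here touches k tokens rather than the full sequence, which is the source of the compute savings; the real DSA adds a trained indexer and optimized kernels on top of this basic pattern.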
Massive cost reduction with maintained performance
The most striking benefit of V3.2-Exp is its dramatically reduced pricing. DeepSeek has cut API costs by over 50% compared to their previous model. The new pricing structure offers:
- Input tokens (cache hit): $0.028 per million tokens
- Input tokens (cache miss): $0.28 per million tokens
- Output tokens: $0.42 per million tokens
These prices represent a significant reduction from the previous V3.1-Terminus model. In DeepSeek's yuan-denominated pricing, input (cache hit) dropped from ¥0.5 to ¥0.2 per million tokens, and output fell from ¥12 to ¥3 per million tokens.
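At these rates, estimating a bill is simple arithmetic. The short Python sketch below plugs in the USD prices listed above for a hypothetical long-context request; the token counts are invented for illustration:

```python
# V3.2-Exp API prices in USD per million tokens (from the list above)
PRICE = {"input_hit": 0.028, "input_miss": 0.28, "output": 0.42}

def request_cost(hit_tokens, miss_tokens, output_tokens):
    """Estimated USD cost of one API call under V3.2-Exp pricing."""
    return (hit_tokens * PRICE["input_hit"]
            + miss_tokens * PRICE["input_miss"]
            + output_tokens * PRICE["output"]) / 1_000_000

# A long-context call: 100K cached input, 20K fresh input, 2K output
print(f"${request_cost(100_000, 20_000, 2_000):.4f}")  # ≈ $0.0092 for this call
```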
Despite these cost reductions, performance remains virtually identical to the previous model. DeepSeek deliberately aligned the training configurations of V3.2-Exp with V3.1-Terminus to ensure fair comparison. Benchmark results show the models perform nearly identically across multiple test categories.
Benchmark performance maintains high standards
DeepSeek-V3.2-Exp demonstrates performance on par with its predecessor across various benchmarks. In reasoning tasks without tool use, both models scored an identical 85.0 points on MMLU-Pro. On programming challenges, V3.2-Exp actually performed slightly better, achieving a Codeforces rating of 2121 compared to V3.1-Terminus's 2046.
Other notable benchmark scores include:
- AIME 2025: 89.3 points (vs 88.4 for V3.1-Terminus)
- GPQA-Diamond: 79.9 points (minimal decrease from 80.7)
- LiveCodeBench: 74.1 points (slight decrease from 74.9)
In agentic tool use categories, the new model showed improvements in some areas. BrowseComp-zh scored 47.9 points compared to 45.0 for the previous version, and SimpleQA achieved 97.1 points versus 96.8.
Technical architecture builds on proven foundation
V3.2-Exp builds upon DeepSeek's successful V3.1-Terminus architecture while introducing the sparse attention innovation. The model keeps the same mixture-of-experts structure that made previous DeepSeek models efficient: 671 billion total parameters, of which roughly 37 billion are activated per token.
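That gap between total and active parameters comes from the mixture-of-experts design: a router sends each token to a small subset of expert networks, so only a fraction of the weights participate in any one forward pass. The sketch below shows generic top-k expert routing in Python; it is illustrative only, and the router and expert shapes are assumptions rather than DeepSeek's actual implementation:

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Toy mixture-of-experts: route a token to its top_k experts.

    Only top_k of len(experts) weight matrices are used per token,
    which is how total parameters can far exceed active parameters.
    """
    logits = router_w @ x                          # one score per expert
    chosen = np.argsort(logits)[-top_k:]           # indices of the best experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                           # softmax over the chosen experts
    return sum(g * (experts[i] @ x) for g, i in zip(gates, chosen))

rng = np.random.default_rng(0)
dim, n_experts = 32, 8
experts = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, dim))
y = moe_layer(rng.standard_normal(dim), experts, router_w, top_k=2)
print(y.shape)  # (32,) — computed using only 2 of the 8 experts
```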
The company developed specialized CUDA kernels to support the sparse attention mechanism. These kernels are available through multiple platforms:
- TileLang for research purposes with better readability
- DeepGEMM for high-performance indexer logit kernels
- FlashMLA for sparse attention kernels
DeepSeek has made these tools open-source to help developers maximize their use of sparse attention technology. The model operates under an MIT license, allowing both commercial and academic use.
Intermediate step toward next-generation AI
DeepSeek describes V3.2-Exp as an “intermediate step toward our next-generation architecture”. This experimental release represents the company’s ongoing research into more efficient transformer architectures, particularly for processing extended text sequences.
“This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences,” the company stated in their official announcement.
The sparse attention technology could revolutionize how AI models handle long documents. Traditional attention mechanisms require quadratically more computing power as text length increases. DeepSeek's innovation maintains quality while dramatically reducing these computational requirements.
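A back-of-the-envelope comparison makes the gap concrete. Assuming full attention touches all L × L token pairs while a sparse scheme touches roughly L × k, with k the number of selected tokens per query (the k = 2,048 below is an illustrative figure, not a published DeepSeek parameter):

```python
def attention_pairs(seq_len, k=None):
    """Token pairs touched: L*L for full attention, roughly L*k for sparse."""
    return seq_len * (k if k else seq_len)

for L in (8_000, 32_000, 128_000):
    full, sparse = attention_pairs(L), attention_pairs(L, k=2_048)
    print(f"L={L:>7}: full={full:.2e}  sparse={sparse:.2e}  ratio={full/sparse:.0f}x")
```

Under these assumptions the savings widen as contexts grow: the sparse scheme does about 4x less pair-wise work at 8K tokens but over 60x less at 128K, which is consistent with the pattern of long-context cost reductions DeepSeek describes.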
Industry impact and competitive positioning
The release of V3.2-Exp reinforces DeepSeek’s position as a major disruptor in the AI industry. The company has consistently delivered high-performance models at significantly lower costs than competitors like OpenAI and Google.
DeepSeek's approach differs fundamentally from that of established AI companies. While OpenAI charges $15 per million input tokens and $60 per million output tokens for its o1 model, DeepSeek's pricing remains a fraction of that cost. This democratizes access to advanced AI capabilities, making them feasible for startups and academic institutions with limited budgets.
The sparse attention breakthrough could pressure other AI companies to develop similar efficiency improvements. Long-context processing has been a major computational bottleneck for the entire industry, and DeepSeek’s solution addresses this challenge directly.
Availability and deployment options
DeepSeek-V3.2-Exp is immediately available through multiple channels:
- DeepSeek’s web interface and mobile app
- API access with new reduced pricing (see the sketch after this list)
- HuggingFace platform for developers
- vLLM with day-0 support
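For API users, DeepSeek's endpoint is OpenAI-compatible, so existing client code only needs a different base URL and model name. A minimal sketch, assuming the standard openai Python package and that the deepseek-chat model identifier serves V3.2-Exp after the release (check DeepSeek's API documentation for current model names):

```python
from openai import OpenAI  # pip install openai

# DeepSeek's API is OpenAI-compatible; only the base URL and key differ.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-chat",  # assumed to point at V3.2-Exp after the release
    messages=[{"role": "user",
               "content": "Summarize sparse attention in one sentence."}],
)
print(resp.choices[0].message.content)
```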
For organizations wanting to run the model locally, DeepSeek has released complete inference code. The model runs on a range of hardware, from Nvidia's H200 to AMD accelerators. However, the HuggingFace model weights must first be converted for local inference, with settings adjusted to the target GPU count and expert configuration.
The company has temporarily retained API access for V3.1-Terminus through October 15, 2025, so developers can directly compare the two models before fully transitioning to the new version.
Future implications for AI development
The success of DeepSeek’s sparse attention mechanism could accelerate research into more efficient AI architectures. The technology addresses one of the industry’s most pressing challenges: processing long texts efficiently without sacrificing quality.
DeepSeek’s continued innovation demonstrates that significant performance improvements are possible without massive increases in computational resources. This approach contrasts with the industry trend of simply scaling up model size and training compute.
The open-source nature of DeepSeek’s kernels and documentation also supports broader research community adoption. By making their innovations accessible, DeepSeek enables other researchers and companies to build upon their work, potentially accelerating overall progress in AI efficiency.
DeepSeek-V3.2-Exp represents more than just another model release. It showcases a fundamentally different approach to AI development that prioritizes efficiency and accessibility alongside performance. As the technology matures, it could reshape expectations for what’s possible in artificial intelligence deployment and cost management.