DeepSeek Open-Sources DSpark, Claiming 60-85% Faster Generation

DeepSeek released DSpark, a 'semi-parallel' speculative-decoding module for its DeepSeek-V4 Flash and Pro models, and open-sourced DeepSpec, a full codebase for training and evaluating speculative-decoding draft models. The company claims generation runs 60-85% faster on Flash and 57-78% faster on Pro at the same throughput -- a fresh reminder that the open-weight challengers keep narrowing the efficiency gap.

Module: DSpark (speculative decoding)
Open-sourced: DeepSpec codebase (MIT)
Flash speedup: 60-85% faster generation
Pro speedup: 57-78% faster generation
Targets: DeepSeek-V4 Flash & Pro

Trace Cohen

Early-stage VC & angel · Founder, New York Venture Partners

June 26, 2026

DeepSeek has released DSpark, a 'semi-parallel' speculative-decoding module for its DeepSeek-V4 Flash and Pro checkpoints, and simultaneously open-sourced DeepSpec, a full-stack codebase for training and evaluating the draft models that power speculative decoding. The company says the technique speeds up generation by roughly 60-85% on Flash and 57-78% on Pro while holding throughput constant, with the enhanced checkpoints posted to Hugging Face under a permissive license.
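
What do the headline ranges mean per token? The sketch below assumes that "X% faster" means one request's tokens per second rises by a factor of 1 + X/100 while overall throughput stays the same. This is our own illustrative reading, because DeepSeek's exact measurement method is not described here. Under that assumption, time per token falls to 1/(1 + X/100). A 60% speedup therefore cuts it by 37.5%, and an 85% speedup cuts it by roughly 46%. The helper name `speedup_to_latency_cut` is hypothetical.

```python
# Assumption (ours, not DeepSeek's definition): "X% faster" means a request's
# tokens/s is multiplied by (1 + X/100) at the same overall throughput.

def speedup_to_latency_cut(percent_faster: float) -> float:
    """Fraction by which per-token generation time shrinks."""
    factor = 1.0 + percent_faster / 100.0   # e.g. 60% faster -> 1.6x tokens/s
    return 1.0 - 1.0 / factor               # time per token falls to 1/factor


# Claimed ranges as reported in the article.
CLAIMS = {"V4 Flash": (60, 85), "V4 Pro": (57, 78)}

for model, (low, high) in CLAIMS.items():
    print(f"{model}: {1 + low / 100:.2f}x-{1 + high / 100:.2f}x tokens/s, "
          f"{speedup_to_latency_cut(low):.0%}-{speedup_to_latency_cut(high):.0%} "
          "less time per token")
```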

Speculative decoding is one of the most important levers for cutting the cost of running large models. A small, fast 'draft' model proposes several tokens ahead. The larger model then verifies all of them in a single parallel pass and keeps the longest run it agrees with. With standard verification, the result matches what the large model would have generated on its own, but it arrives far faster than generating one token at a time. DSpark's twist is its 'semi-parallel' drafter, which combines a heavy parallel head with a small sequential head. DeepSeek reports that it beats established methods like Eagle3 and DFlash on acceptance length: the average number of drafted tokens that survive each verification step, and the key metric for how much work each call to the large model saves.
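
To make the draft-and-verify loop concrete, here is a minimal Python sketch of greedy speculative decoding. It is not DSpark or DeepSpec code. `toy_target` and `toy_draft` are hypothetical stand-ins for the large and small models, and the drafter is plainly sequential rather than DSpark's semi-parallel design. The target's check of the drafted tokens, which a real system performs in one batched forward pass, is written as a loop for readability.

```python
from typing import Callable, List, Tuple

# A "model" here is a toy greedy next-token function: context -> token id.
Model = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Hypothetical 'large' model: next token = sum of last two, mod 10."""
    return (ctx[-1] + ctx[-2]) % 10


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical 'small' model: copies the target's rule but guesses 0 after a 7."""
    return 0 if ctx[-1] == 7 else (ctx[-1] + ctx[-2]) % 10


def greedy(model: Model, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_generate(target: Model, draft: Model, prompt: List[int],
                         max_new: int, k: int = 4) -> Tuple[List[int], List[int]]:
    """Greedy speculative decoding; returns (tokens, drafted tokens accepted per step)."""
    out = list(prompt)
    accepted_per_step: List[int] = []
    while len(out) - len(prompt) < max_new:
        # 1. Draft k tokens sequentially with the cheap model.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Verify: compare each drafted token with the target's own choice.
        #    A real system scores all k positions in ONE forward pass.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        # 3. Keep the accepted prefix, then add one token from the target
        #    (its correction at the first mismatch, or a bonus token if all matched).
        out += proposal[:accepted]
        out.append(target(out))
        accepted_per_step.append(accepted)
    return out[:len(prompt) + max_new], accepted_per_step


tokens, acc = speculative_generate(toy_target, toy_draft, [1, 1], max_new=20, k=4)
assert tokens == greedy(toy_target, [1, 1], 20)  # same output as plain greedy decoding
print(f"{len(acc)} verification steps instead of 20; "
      f"mean acceptance length {sum(acc) / len(acc):.2f}")
```

A drafted token is kept only when it matches the target's own choice, so the output equals plain greedy decoding from the target. The draft model's quality changes only how many verification steps that takes.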

The strategic significance is in the openness. By open-sourcing not just the result but the tooling to build speculative-decoding systems, DeepSeek hands the entire ecosystem a way to make inference cheaper -- and applies pressure to closed providers whose pricing depends partly on proprietary efficiency. It is the same playbook that made DeepSeek a disruptive force: compete on cost and open weights rather than chasing the absolute capability frontier.

This week's other AI news sharpens the contrast. As Washington gates access to the most capable American models and the New York Times fights in court over how they were trained, a Chinese lab is giving away the means to run open models faster and cheaper. That dynamic, with frontier access tightening in the US while open-weight efficiency improves abroad, is exactly the opening that could push developers and enterprises toward models they can self-host. Cheaper self-hosted inference competes with the inference economics of OpenAI and Anthropic, and with the specialized serving stacks of Groq and Baseten.

The bear case is verification. Headline speedup claims need independent reproduction, and real-world gains vary by workload and hardware. If acceptance rates drop on harder prompts, the speedup shrinks, because the large model rejects more drafted tokens and each verification step yields fewer of them. Schemes that loosen verification to accept more tokens can also trade away output quality. What to watch: third-party benchmarks of DSpark across diverse tasks, whether the open-source DeepSpec tooling gets adopted by other model providers, and how Western labs respond to a steadily closing efficiency gap.
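
A standard back-of-envelope model from the speculative-decoding literature shows why falling acceptance rates erode the gain. It is not DSpark's analysis, and it rests on simplifying assumptions: each of the γ drafted tokens is accepted independently with the same probability α, and the draft model's own cost is ignored. Under those assumptions, the expected number of tokens produced per call to the large model is (1 - α^(γ+1)) / (1 - α). The function name below is hypothetical, and the α values are illustrative, not measured.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass: sum of alpha**i for i in 0..gamma.

    Simplified model: each drafted token is accepted independently with
    probability alpha; every step also yields one token from the target itself.
    """
    if alpha >= 1.0:
        return float(gamma + 1)  # every draft accepted, plus the target's token
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


# Illustrative acceptance rates only, not measured DSpark numbers.
for alpha in (0.9, 0.8, 0.6, 0.4):
    tokens = expected_tokens_per_step(alpha, gamma=5)
    print(f"alpha={alpha}: {tokens:.2f} tokens per target call")
```

Take five drafted tokens. If α drops from 0.9 to 0.6, the expected yield falls from about 4.7 to about 2.4 tokens per verification step. The same system therefore runs markedly slower on prompts where the draft model guesses poorly, even though the output itself is unchanged.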
