DeepSeek Open-Sources DSpark, Claiming 60-85% Faster Generation

DeepSeek released DSpark, a 'semi-parallel' speculative-decoding module for its DeepSeek-V4 Flash and Pro models, and open-sourced DeepSpec, a full codebase for training and evaluating speculative-decoding draft models. The company claims generation runs 60-85% faster on Flash and 57-78% faster on Pro at the same throughput -- a fresh reminder that the open-weight challengers keep narrowing the efficiency gap.

Module: DSpark (speculative decoding)
Open-Sourced: DeepSpec codebase (MIT)
Flash Speedup: 60-85% faster generation
Pro Speedup: 57-78% faster
Targets: DeepSeek-V4 Flash & Pro
Trace Cohen
Early-stage VC & angel · Founder, New York Venture Partners
June 26, 2026
2 min read
KEY TAKEAWAYS FOR VCs & FOUNDERS

1. Open-sourcing inference speedups hands the whole ecosystem cheaper generation, pressuring closed providers
2. Speculative decoding is becoming the default lever for cutting AI serving costs
3. It reinforces DeepSeek's strategy of competing on efficiency and openness, not just raw scale
4. Faster open-weight inference strengthens the case for running models on your own hardware

The VC Read · Trace's Take · Trace Cohen

While the US locks the frontier behind a government list and lawyers fight over training data, DeepSeek just gave the world a way to run open models 60-85% faster for free -- that asymmetry is the whole story. Efficiency, not raw capability, is where the open-weight camp keeps winning, and speculative decoding is the cheapest cost lever in AI right now. The catch is always the same with DeepSeek claims: reproduce the benchmarks before you believe them, because faster generation can quietly cost you quality on hard prompts. If DeepSpec gets adopted by other providers, the closed labs' pricing power erodes a little more.

DeepSeek has released DSpark, a 'semi-parallel' speculative-decoding module for its DeepSeek-V4 Flash and Pro checkpoints, and simultaneously open-sourced DeepSpec, a full-stack codebase for training and evaluating the draft models that power speculative decoding. The company says the technique speeds up generation by roughly 60-85% on Flash and 57-78% on Pro while holding throughput constant, with the enhanced checkpoints posted to Hugging Face under a permissive license.

Speculative decoding is one of the most important levers for cutting the cost of running large models. The idea is to use a small, fast 'draft' model to propose multiple tokens at once, then have the larger model verify them in parallel -- producing the same output far faster than generating one token at a time. DSpark's twist combines a heavy parallel head with a small sequential head, and DeepSeek reports it beats established methods like Eagle3 and DFlash on acceptance length, the key metric for how many proposed tokens survive verification.
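
The draft-then-verify loop described above can be sketched in a few lines. This is a minimal illustration of generic greedy speculative decoding, not DSpark's semi-parallel design or anything from the DeepSpec codebase. The two "models" (`target_next`, `draft_next`) are hypothetical toy functions invented for the example, and the verification loop stands in for what a real system would do in a single batched forward pass of the large model.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    """Hypothetical 'large' model: a deterministic toy rule, not a real LLM."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[int]) -> int:
    """Hypothetical 'draft' model: matches the target except when the
    context length is a multiple of 4, where it guesses wrong."""
    t = target_next(ctx)
    return t if len(ctx) % 4 else (t + 1) % 50


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, target verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n:
        # 1. The draft model proposes k tokens, one after another (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks every proposed position. A real system does
        #    this in one batched forward pass; here it is simulated in a loop.
        passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                accepted.append(expected)  # target's correction ends the round
                break
        else:
            # Every draft token survived, so the same pass yields a bonus token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_next, prompt, 20)
    tokens, passes = speculative_decode(target_next, draft_next, prompt, 20)
    print("identical output:", tokens == baseline)
    print("target passes:", passes, "vs", len(baseline), "for one-token-at-a-time")
```

Because a token is only kept when it matches what the target would have produced, the output is identical to plain greedy decoding. The saving comes from needing fewer passes of the large model, and the number of tokens kept per pass is the acceptance length the article refers to.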

The strategic significance is in the openness. By open-sourcing not just the result but the tooling to build speculative-decoding systems, DeepSeek hands the entire ecosystem a way to make inference cheaper -- and applies pressure to closed providers whose pricing depends partly on proprietary efficiency. It is the same playbook that made DeepSeek a disruptive force: compete on cost and open weights rather than chasing the absolute capability frontier.

The contrast with this week's other AI news is sharp. As Washington gates access to the most capable American models and the New York Times fights in court over how they were trained, a Chinese lab is giving away the means to run open models faster and cheaper. That dynamic -- frontier access tightening in the US while open-weight efficiency improves abroad -- is exactly the opening that could push developers and enterprises toward models they can self-host. Cheaper self-hosted inference competes directly with the inference economics of OpenAI and Anthropic and with the specialized serving stacks of Groq and Baseten.

The bear case is verification: headline speedup claims need independent reproduction, and real-world gains vary by workload and hardware. Speculative decoding with exact verification preserves the target model's output, so a drop in acceptance rates on harder prompts mainly erodes the speedup; quality is at risk only when an implementation relaxes verification to accept more draft tokens. What to watch: third-party benchmarks of DSpark across diverse tasks, whether the open-source DeepSpec tooling gets adopted by other model providers, and how Western labs respond to a steadily closing efficiency gap.
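
To see why gains depend so heavily on acceptance rates, here is a rough back-of-the-envelope sketch. It assumes each drafted token is accepted independently with the same probability `alpha`, a simplification common in the speculative-decoding literature and not a figure reported by DeepSeek. It also ignores the cost of running the draft model, so the numbers are an upper bound on tokens gained per large-model pass, not a predicted speedup. The function name and the sample values of `alpha` and `k` are illustrative assumptions.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model pass when k tokens are drafted.

    Assumes each draft token is accepted independently with probability alpha.
    The count includes the one token the target always contributes (a
    correction after a rejection, or a bonus token if all k are accepted):
        1 + alpha + alpha^2 + ... + alpha^k
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    k = 4  # illustrative draft length, not a DSpark setting
    for alpha in (0.5, 0.7, 0.9):  # illustrative acceptance rates
        tokens = expected_tokens_per_pass(alpha, k)
        print(f"alpha={alpha:.1f}, k={k}: {tokens:.2f} tokens per target pass")
```

Under these assumptions, the expected tokens per pass fall quickly as the acceptance rate drops, which is why the same method can look very different on easy and hard workloads.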

Originally reported by DeepSeek / DeepSpec. Analysis and editorial commentary by Value Add Pulse.

@Trace_Cohen·t@nyvp.com