DeepSeek and Peking University Open-Source DSpark, Speeding LLM Inference by Up to 85%

DeepSeek and Peking University open-sourced DSpark, a speculative-decoding framework that boosts per-user LLM generation speed by 60-85% -- and up to 661% throughput under tight latency constraints -- without hardware upgrades or model retraining. Released under the MIT license and already live in DeepSeek's V4-Flash and V4-Pro production models, it is another Chinese efficiency breakthrough aimed at slashing the cost of serving AI.

Speedup: 60-85% per user
Throughput gain: up to 661%
License: MIT
Live on: DeepSeek V4-Flash / V4-Pro
Co-developer: Peking University
Trace Cohen
Early-stage VC & angel · Founder, New York Venture Partners
June 29, 2026
2 min read
KEY TAKEAWAYS FOR VCs & FOUNDERS

1. Cutting inference cost by software alone undercuts the need for ever-more expensive accelerators
2. It is a second major Chinese AI release in 48 hours, alongside Meituan's LongCat-2.0
3. MIT-licensed and model-agnostic (supports Qwen, Gemma), it can spread across the ecosystem
4. Efficiency, not just scale, is becoming China's counter to the US chip embargo

The VC Read · Trace's Take · Trace Cohen

Pair DSpark with Meituan's LongCat the same week and you see China's actual strategy under the chip embargo: if you can't buy more compute, extract more from what you have. A per-user inference speedup of up to 85%, given away for free, is a direct shot at the 'just buy more GPUs' reflex -- and because DeepSpec is model-agnostic, the gains leak into Qwen, Gemma and the whole open ecosystem. For founders, cheaper inference is pure margin upside, whoever ships it. The caveat is the usual one with vendor numbers: speculative decoding can trade accuracy for speed if it is poorly tuned, so wait for independent benchmarks before you rip out your serving stack.


DeepSeek and Peking University have jointly open-sourced DSpark, a speculative-decoding framework that accelerates large language model inference by 60% to 85% per user -- and delivers up to a 661% throughput gain under strict latency constraints -- with no hardware upgrades or model retraining required, according to VentureBeat. It is released under the MIT license on GitHub, alongside DeepSpec, a general-purpose codebase for training custom draft models.

The technical approach attacks a known limitation of speculative decoding. Classic methods train a separate, smaller 'draft' model to propose tokens that the larger target model then verifies -- effective, but costly to build and maintain. DSpark instead grafts the speculative head directly onto the target model, reducing layer duplication, and pairs a 'semi-autoregressive generation' method with a 'confidence-scheduled verification' system. The deployed configuration, 'DSpark-5,' improves per-user generation speed by 60-85% on DeepSeek-V4-Flash and 57-78% on V4-Pro.
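To make the draft-and-verify idea concrete, here is a minimal sketch of classic greedy speculative decoding, the baseline technique described above. It is not DSpark's implementation: the article does not detail 'semi-autoregressive generation' or 'confidence-scheduled verification', so neither is modeled. The names `speculative_decode`, `toy_target` and `toy_draft`, and the draft length `k`, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "next token" function


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks every proposed position. A real system scores
        #    all k positions in a single forward pass; this toy emulates that
        #    with per-position calls but counts it as one pass.
        passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            accepted.append(expected)
            if proposal[i] != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every draft token matched, so the pass also yields a bonus token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:len(prompt) + max_new], passes


# Hypothetical toy models: the draft agrees with the target most of the time.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + 1) % 11


def toy_draft(ctx: List[Token]) -> Token:
    t = toy_target(ctx)
    return t if len(ctx) % 4 else (t + 1) % 11  # wrong on every 4th position


tokens, passes = speculative_decode(toy_target, toy_draft, [1], max_new=20)
baseline = greedy_decode(toy_target, [1], max_new=20)
```

In this greedy variant every emitted token is one the target itself would have chosen, so the output matches plain greedy decoding while needing fewer verification passes. How much time that saves in practice depends on how often the draft is right and how cheap it is to run.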


The significance is economic. Inference -- the perpetual cost of serving a model with every query -- is the dominant and fastest-growing line item in AI, the same dynamic driving billion-dollar bets on Baseten, Groq and Upscale AI. A free, open framework that wrings 60-85% more speed out of existing hardware attacks that cost from the software side, reducing the pressure to buy ever-more accelerators. For a Chinese ecosystem constrained by US export controls on the most advanced GPUs, squeezing more out of available compute is a strategic necessity, not just an optimization.
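A quick back-of-the-envelope calculation shows what those percentages mean per token. It assumes the reported gains are increases over baseline speed (so an 85% gain means 1.85x), and that serving cost on fixed hardware scales with time per token, which is a simplification. The helper names are hypothetical.

```python
def speed_multiplier(gain_pct: float) -> float:
    """Convert an 'X% faster' gain into a speed multiple (assumed: 1 + X/100)."""
    return 1 + gain_pct / 100


def time_reduction_pct(gain_pct: float) -> float:
    """Percent less time per token implied by a given speed gain."""
    return 100 * (1 - 1 / speed_multiplier(gain_pct))


for gain in (60, 85, 661):
    print(f"{gain}% gain -> {speed_multiplier(gain):.2f}x speed, "
          f"{time_reduction_pct(gain):.1f}% less time per token")
```

Under these assumptions, a 60-85% speedup cuts time per token by roughly 37-46%, not by 60-85%, and a 661% throughput gain corresponds to about 7.6 times the baseline throughput.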

The timing forms a pattern. DSpark landed within 48 hours of Meituan open-sourcing the 1.6-trillion-parameter LongCat-2.0 -- two MIT-licensed releases from Chinese players in the same window, both aimed at efficiency and openness. Crucially, DeepSpec is model-agnostic, with configurations supporting Alibaba's Qwen and Google's Gemma, so DSpark's gains can spread well beyond DeepSeek's own models and into the broader open-source community.

The bear case is that vendor-reported speedups need independent verification, real-world gains vary by workload, and speculative decoding can trade accuracy for speed if poorly tuned. Western enterprises may also hesitate to build inference infrastructure around Chinese-origin frameworks regardless of license. What to watch: independent benchmarks of DSpark across model families, how quickly the open-source community adopts DeepSpec, and whether efficiency breakthroughs like this meaningfully blunt the impact of US chip restrictions.


Originally reported by VentureBeat. Analysis and editorial commentary by Value Add Pulse.


@Trace_Cohen·t@nyvp.com