DeepSeek and Peking University Open-Source DSpark, Speeding LLM Inference by Up to 85%

DeepSeek and Peking University open-sourced DSpark, a speculative-decoding framework that boosts per-user LLM generation speed by 60-85% and delivers up to a 661% throughput gain under tight latency constraints, without hardware upgrades or model retraining. Released under the MIT license and already live in DeepSeek's V4-Flash and V4-Pro production models, it is another Chinese efficiency breakthrough aimed at slashing the cost of serving AI.

Speedup: 60-85% per user
Throughput gain: up to 661%
License: MIT
Live on: DeepSeek V4-Flash / V4-Pro
Co-developer: Peking University

Trace Cohen

Early-stage VC & angel · Founder, New York Venture Partners

June 29, 2026

DeepSeek and Peking University have jointly open-sourced DSpark, a speculative-decoding framework that accelerates large language model inference by 60% to 85% per user -- and delivers up to a 661% throughput gain under strict latency constraints -- with no hardware upgrades or model retraining required, according to VentureBeat. It is released under the MIT license on GitHub, alongside DeepSpec, a general-purpose codebase for training custom draft models.
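To make the per-user figure concrete, a short calculation can convert a reported speedup into the time each generated token still takes. This is a minimal sketch that assumes "X% faster" means tokens per second per user rise by a factor of 1 + X/100. The article does not define the metric further, and the helper name is hypothetical.

```python
# Hypothetical helper: converts a reported per-user speedup into the share of
# baseline time-per-token that remains. Assumes "X% faster" means tokens/sec
# per user rise by a factor of (1 + X/100).

def latency_fraction(speedup_pct: float) -> float:
    """Fraction of the baseline time per generated token left after the speedup."""
    return 1.0 / (1.0 + speedup_pct / 100.0)


for pct in (60, 85):  # the per-user range reported for DSpark
    frac = latency_fraction(pct)
    print(f"{pct}% faster -> {frac:.1%} of baseline time per token ({1 - frac:.1%} less)")
# 60% faster -> 62.5% of baseline time per token (37.5% less)
# 85% faster -> 54.1% of baseline time per token (45.9% less)
```

Under that reading, a 60-85% per-user speedup trims roughly 37.5-46% off the time each user waits per token.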

The technical approach attacks a known limitation of speculative decoding. Classic methods train a separate, smaller 'draft' model to propose tokens that the larger target model then verifies -- effective, but costly to build and maintain. DSpark instead grafts a speculative head directly onto the target model, reducing layer duplication, and pairs a 'semi-autoregressive generation' method with a 'confidence-scheduled verification' system. The deployed configuration, 'DSpark-5,' improves per-user generation speed by 60-85% on DeepSeek-V4-Flash and 57-78% on V4-Pro.
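The classic draft-then-verify loop can be sketched in a few lines. This is a minimal greedy-verification sketch of generic speculative decoding, not DSpark's implementation. The toy models, the `min_conf` early-stop rule, and all function names are hypothetical. `min_conf` is only a loose illustration of how draft confidence might steer how much gets verified, since the article does not detail DSpark's confidence-scheduled verification. The `draft` callable stands in for whatever proposes tokens, whether a separate small model or a head attached to the target.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: a greedy target model and a cheap drafter that also
# reports its confidence.
TargetFn = Callable[[List[int]], int]
DraftFn = Callable[[List[int]], Tuple[int, float]]


def speculative_step(context: List[int], target: TargetFn, draft: DraftFn,
                     max_draft: int = 5, min_conf: float = 0.0) -> List[int]:
    """One draft-then-verify round with greedy (exact-match) verification."""
    # 1. Draft up to max_draft tokens; stop early when the drafter's confidence
    #    drops below min_conf (an illustrative stand-in, not DSpark's scheduler).
    proposal, ctx = [], list(context)
    for _ in range(max_draft):
        tok, conf = draft(ctx)
        if conf < min_conf:
            break
        proposal.append(tok)
        ctx.append(tok)

    # 2. Verify. A real system scores all drafted positions in one batched
    #    forward pass of the target; here it is called per position for clarity.
    accepted, ctx = [], list(context)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft was accepted: the same pass yields one extra target token.
    accepted.append(target(ctx))
    return accepted


def generate(prompt: List[int], target: TargetFn, draft: DraftFn,
             n_tokens: int, **kw) -> Tuple[List[int], int]:
    """Generate n_tokens; return them plus the number of verification rounds."""
    out, rounds = list(prompt), 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(out, target, draft, **kw))
        rounds += 1
    return out[len(prompt):len(prompt) + n_tokens], rounds


def greedy(prompt: List[int], target: TargetFn, n_tokens: int) -> List[int]:
    """Baseline: one target call per token."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out[len(prompt):]


# Toy stand-ins (hypothetical): the drafter copies the target except at every
# fourth position, where it guesses wrong and reports low confidence.
def toy_target(ctx: List[int]) -> int:
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: List[int]) -> Tuple[int, float]:
    guess = toy_target(ctx)
    if len(ctx) % 4 == 3:
        return (guess + 1) % 50, 0.3
    return guess, 0.9


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast, rounds = generate(prompt, toy_target, toy_draft, 20, min_conf=0.5)
    assert fast == greedy(prompt, toy_target, 20)  # identical output to plain greedy
    print(f"20 tokens in {rounds} verification rounds instead of 20")
```

With greedy verification the emitted tokens always match what the target alone would produce, so the gain comes entirely from accepting several tokens per target pass. The confidence cutoff only avoids spending draft work on tokens the target is likely to reject.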

The significance is economic. Inference -- the perpetual cost of serving a model with every query -- is the dominant and fastest-growing line item in AI, the same dynamic driving billion-dollar bets on Baseten, Groq and Upscale AI. A free, open framework that wrings 60-85% more speed out of existing hardware attacks that cost from the software side, reducing the pressure to buy ever-more accelerators. For a Chinese ecosystem constrained by US export controls on the most advanced GPUs, squeezing more out of available compute is a strategic necessity, not just an optimization.

The timing fits a pattern. DSpark landed within 48 hours of Meituan open-sourcing the 1.6-trillion-parameter LongCat-2.0 -- two MIT-licensed releases from Chinese players in the same window, both aimed at efficiency and openness. Crucially, DeepSpec is model-agnostic, with configurations supporting Alibaba's Qwen and Google's Gemma, so DSpark's gains can spread well beyond DeepSeek's own models and into the broader open-source community.

The bear case is that vendor-reported speedups need independent verification, real-world gains vary by workload, and speculative-decoding variants that loosen the standard verification rule can trade output quality for speed. Western enterprises may also hesitate to build inference infrastructure around Chinese-origin frameworks regardless of license. What to watch: independent benchmarks of DSpark across model families, how quickly the open-source community adopts DeepSpec, and whether efficiency breakthroughs like this meaningfully blunt the impact of US chip restrictions.
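On the accuracy concern, it helps to separate the standard algorithm from its variants. Under the standard speculative-sampling rule, a drafted token x is accepted with probability min(1, p(x)/q(x)), where p is the target's distribution and q the draft's; on rejection, a replacement is drawn from the normalized residual max(0, p - q). The emitted token then follows p exactly. The sketch computes that output distribution with exact fractions. The `leniency` parameter is a hypothetical relaxation, not a feature of DSpark or any library, included only to show how loosening acceptance pulls outputs toward the draft model.

```python
from fractions import Fraction
from typing import List


def speculative_output_dist(p: List[Fraction], q: List[Fraction],
                            leniency: int = 1) -> List[Fraction]:
    """Exact distribution of the token emitted by one speculative-sampling step.

    p: target probabilities, q: draft probabilities over the same vocabulary.
    The draft proposes x ~ q, accepted with prob min(1, leniency * p[x] / q[x]);
    on rejection a token is drawn from max(0, p - q), renormalized.
    leniency=1 is the standard (lossless) rule; leniency > 1 is hypothetical.
    """
    vocab = range(len(p))
    # P(propose x and accept it) = q[x] * min(1, leniency*p[x]/q[x]) = min(q[x], leniency*p[x])
    accepted = [min(q[x], leniency * p[x]) for x in vocab]
    reject_mass = 1 - sum(accepted)
    residual = [max(Fraction(0), p[x] - q[x]) for x in vocab]
    residual_total = sum(residual)
    return [
        accepted[x] + (reject_mass * residual[x] / residual_total if residual_total else 0)
        for x in vocab
    ]


p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # target model
q = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # draft model
print([str(v) for v in speculative_output_dist(p, q)])              # ['1/2', '1/4', '1/4'] == p
print([str(v) for v in speculative_output_dist(p, q, leniency=2)])  # ['1/4', '1/2', '1/4'] == q
```

With the standard rule, draft quality and draft length change only how often tokens are accepted, not what the model outputs. In this toy case the relaxed rule collapses the output onto the draft's distribution, which is the kind of silent quality drift that independent benchmarks would need to rule out.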