
Alibaba's SkillWeaver Framework Cuts AI Agent Token Consumption by Over 99%

Alibaba researchers published SkillWeaver, a framework that decomposes complex agent tasks and retrieves only the relevant tools from a library rather than loading an entire tool catalog into context, cutting token consumption by more than 99% compared to naive approaches, according to VentureBeat reporting July 2. Tested on a custom 300-query benchmark against a library of 2,209 real-world MCP tools, SkillWeaver's feedback-loop technique raised task-decomposition accuracy from 51.0% to 67.7% with a 7B model and reached 92% with the larger Qwen-Max, while reducing per-query context use from an estimated 884,000 tokens to roughly 1,160.

Token Reduction vs. Naive Approach: >99%
Context Tokens Per Query (After): ~1,160 (from ~884,000)
Decomposition Accuracy (7B model, vanilla): 51.0%
Decomposition Accuracy (with SAD feedback loop): 67.7% (7B) / 92% (Qwen-Max)
Benchmark: CompSkillBench, 300 multi-step queries
Trace Cohen
Early-stage VC & angel · Founder, New York Venture Partners
July 2, 2026
KEY TAKEAWAYS FOR VCs & FOUNDERS
1. A 99.9% reduction in context tokens per query (from ~884,000 to ~1,160) translates directly into dramatically lower API costs and faster response times for any enterprise running agents against large tool libraries.

2. The finding that larger, more expensive models can perform worse than smaller ones when unguided challenges the assumption that scaling model size alone solves tool-routing problems.

3. SkillWeaver was built and tested against 2,209 real-world tools from the public MCP ecosystem, which makes it directly relevant to the growing number of enterprises building multi-tool agentic workflows via Model Context Protocol.

4. No source code has been released, but the authors' prompt templates and reliance on off-the-shelf embedding models mean the approach is realistically reproducible by other teams.

The VC Read · Trace's Take · Trace Cohen

The finding that a bigger, more expensive model actually got worse at tool routing when unguided is the part every founder building agent infrastructure should sit with. It directly rebuts the lazy assumption that scaling model size alone solves orchestration problems, and it means the real moat in enterprise agentic AI is increasingly retrieval and planning architecture, not access to the biggest frontier model. A 99.9% cut in per-query token consumption is not a marginal optimization; it is the kind of unit-economics improvement that determines whether an enterprise agent deployment is profitable at scale. For founders building on MCP or similar multi-tool ecosystems, this is a clear signal that investing in skill-aware decomposition and retrieval now will matter more than chasing the next model upgrade. Watch whether Alibaba releases the code: if this becomes a widely adopted open technique, it resets the baseline cost expectation for every agent vendor competing on price.

Alibaba researchers introduced SkillWeaver, a framework designed to solve one of the more persistent problems in enterprise AI agent deployment: routing complex, multi-step tasks to the correct tools from a library that can contain hundreds or thousands of options, according to VentureBeat reporting published July 2, 2026. The framework's headline result is a token-consumption reduction of more than 99% compared to the naive approach of exposing an agent to an entire tool library at once.

The underlying problem is straightforward but costly at scale: as enterprise agents integrate with massive tool ecosystems like the Model Context Protocol (MCP), accurately routing a query to the right tool becomes difficult, and simply feeding an LLM an entire tool library to let it figure out the right one is highly inefficient. It quickly overwhelms context limits and consumes hundreds of thousands of tokens per query. Most existing tool-use frameworks treat this as a single-skill selection problem, which breaks down for real-world business requests like "download the dataset, transform it, and create visual reports" that require sequencing multiple distinct tools into a cohesive plan.

SkillWeaver addresses this through three stages (Decompose, Retrieve, and Compose) plus a technique the researchers call Iterative Skill-Aware Decomposition (SAD). An LLM first breaks a complex query into a sequence of atomic sub-tasks; an embedding-based retriever then pulls a shortlist of the best-matching tools for each sub-task from the library; and a final planning stage checks for compatibility between tools and assembles an executable plan as a directed acyclic graph, allowing independent steps to run in parallel. SAD's feedback loop is the key innovation: rather than a one-shot decomposition, the LLM drafts an initial plan, runs a preliminary search, and then uses the retrieved (often more technically precise) tool names to rewrite its decomposition so its vocabulary matches what's actually available in the library.
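To make the three stages and the SAD loop concrete, here is a minimal Python sketch. It is not the authors' code, which has not been released. Everything in it is an assumption for illustration: the four-tool library, the bag-of-words similarity standing in for a real embedding retriever, and the `toy_decompose` and `toy_refine` stubs standing in for LLM calls.

```python
import math
import re
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical mini library; the benchmark library held 2,209 MCP tools.
TOOLS = {
    "http_file_downloader": "download a file or dataset from a url",
    "csv_transformer": "transform clean and reshape tabular csv data",
    "chart_report_builder": "create visual charts and reports from data",
    "sql_query_runner": "run sql queries against a database",
}

def _vec(text):
    """Bag-of-words vector: a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(subtask, k=2):
    """Retrieve: shortlist the k tools whose name and description best match."""
    q = _vec(subtask)
    scored = [(_cosine(q, _vec(name.replace("_", " ") + " " + desc)), name)
              for name, desc in TOOLS.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

def skill_aware_decompose(query, decompose, refine, rounds=1):
    """SAD-style loop: draft sub-tasks, search, then rewrite using the hits."""
    subtasks = decompose(query)
    for _ in range(rounds):
        hints = {s: retrieve(s) for s in subtasks}
        subtasks = refine(query, subtasks, hints)
    return subtasks

def parallel_stages(deps):
    """Compose: group a DAG (node -> set of prerequisites) into stages whose
    members have no unmet prerequisites and could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages

# Stubs standing in for the LLM calls (assumptions, not the paper's prompts).
def toy_decompose(query):
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", query)
    return [p.strip() for p in parts if p.strip()]

def toy_refine(query, subtasks, hints):
    # Append the top retrieved tool name so the wording matches the library.
    return [f"{s} ({hints[s][0].replace('_', ' ')})" for s in subtasks]

QUERY = "download the dataset, transform it, and create visual reports"

if __name__ == "__main__":
    steps = skill_aware_decompose(QUERY, toy_decompose, toy_refine)
    plan = [retrieve(s, k=1)[0] for s in steps]
    # Assume each step consumes the previous step's output (a simple chain).
    deps = {later: {earlier} for earlier, later in zip(plan, plan[1:])}
    print(parallel_stages(deps))
```

In this toy chain every stage holds a single tool. A plan with independent steps, such as `{"c": {"a", "b"}}`, would place `a` and `b` in the same stage, which is the parallelism the DAG representation allows.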


To evaluate the approach, the researchers built a custom benchmark, CompSkillBench, consisting of 300 multi-step queries against a library of 2,209 real-world tools sourced from the public MCP ecosystem across 24 functional categories including cloud infrastructure, finance and databases. Using a lightweight 7-billion-parameter model (Qwen2.5-7B-Instruct) for decomposition, the SAD feedback loop lifted decomposition accuracy from 51.0% in a vanilla setup to 67.7%; with the larger Qwen-Max model, accuracy reached 92%. On the hardest tasks requiring four to five distinct skills, SAD improved accuracy by 50%.

One of the more counterintuitive findings: larger models can actually perform worse than smaller ones when unguided, because they tend to over-decompose tasks into unnecessarily granular steps. A 14-billion-parameter model's accuracy fell below the 7B model's in the vanilla setup, until SAD's retrieved tool hints anchored it back to the actual available tools. A brute-force baseline that stuffed all tool names directly into a large model's prompt retrieved the correct tool category only 21.1% of the time despite near-perfect task-breakdown capability, while a traditional ReAct-style agent loop achieved 0% decomposition accuracy on the benchmark, collapsing multi-step plans into isolated actions.

The practical token savings are the headline number for enterprise adoption: SkillWeaver's targeted retrieve-and-route approach reduced estimated context consumption from roughly 884,000 tokens down to about 1,160 tokens per query, a 99.9% reduction that translates directly into lower API costs and faster response times for any team running agents against large, real-world tool libraries. The researchers have not yet released source code, but shared prompt templates and relied on off-the-shelf, easily reproducible components (a MiniLM-based embedding retriever with a FAISS index), meaning other teams can realistically implement the approach themselves using standard orchestration libraries.
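The arithmetic behind the 99.9% figure is easy to check. The short Python sketch below recomputes it from the two reported token counts and then shows what it could mean in dollars. The price of $1.00 per million input tokens and the 1,000-query volume are hypothetical values chosen only for illustration; they do not come from the paper or any vendor's price list.

```python
def reduction_pct(before, after):
    """Percentage of tokens saved when going from `before` to `after`."""
    return (1 - after / before) * 100

def input_cost_usd(tokens_per_query, queries, usd_per_million_tokens):
    """Input-token cost for a batch of queries at a flat per-million price."""
    return tokens_per_query * queries / 1_000_000 * usd_per_million_tokens

BEFORE, AFTER = 884_000, 1_160   # estimated context tokens per query, as reported
PRICE = 1.00                     # hypothetical USD per million input tokens
QUERIES = 1_000                  # hypothetical query volume

pct = reduction_pct(BEFORE, AFTER)   # about 99.87, which rounds to 99.9
ratio = BEFORE / AFTER               # about 762 times fewer tokens

if __name__ == "__main__":
    print(f"reduction: {pct:.2f}%  ratio: {ratio:.0f}x")
    print(f"naive:    ${input_cost_usd(BEFORE, QUERIES, PRICE):,.2f}")
    print(f"targeted: ${input_cost_usd(AFTER, QUERIES, PRICE):,.2f}")
```

Under those assumed numbers, 1,000 queries would cost about $884 in input tokens with the full library in context and about $1.16 with targeted retrieval. The absolute figures scale with whatever price and volume actually apply, but the roughly 762-fold ratio does not.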

For founders and engineering teams building enterprise AI agents, SkillWeaver's core lesson is that task-decomposition granularity, not raw model size, is the actual bottleneck in tool-routing accuracy, a finding with direct cost implications for any company running agents against MCP-scale tool libraries. For investors in agent infrastructure and orchestration tooling, this is a reminder that meaningful efficiency gains in agentic AI are increasingly coming from smarter retrieval and planning architecture rather than simply waiting for cheaper, faster frontier models.

What to watch: whether Alibaba releases SkillWeaver's source code for broader adoption and benchmarking, how the approach performs on tool libraries larger than the 2,209 tested here, and whether other labs or enterprise AI vendors adopt similar skill-aware decomposition techniques as agentic tool libraries continue to grow in scale.

Originally reported by VentureBeat. Analysis and editorial commentary by Value Add Pulse.

@Trace_Cohen · t@nyvp.com