Alibaba's SkillWeaver Framework Cuts AI Agent Token Consumption by Over 99%

Alibaba researchers have published SkillWeaver, a framework that decomposes complex agent tasks and retrieves only the relevant tools from a library instead of loading the entire tool catalog into context. According to VentureBeat reporting on July 2, this cuts token consumption by more than 99% compared with the naive approach. On a custom 300-query benchmark against a library of 2,209 real-world MCP tools, SkillWeaver's feedback-loop technique raised task-decomposition accuracy from 51.0% to 67.7% with a 7B model and to 92% with the larger Qwen-Max model, while reducing estimated per-query context use from about 884,000 tokens to roughly 1,160.

Token reduction vs. naive approach: >99%
Context tokens per query (after): ~1,160 (from ~884,000)
Decomposition accuracy (7B model, vanilla): 51.0%
Decomposition accuracy (with SAD feedback loop): 67.7% (7B) / 92% (Qwen-Max)
Benchmark: CompSkillBench, 300 multi-step queries

Trace Cohen

Early-stage VC & angel · Founder, New York Venture Partners

July 2, 2026


Alibaba researchers introduced SkillWeaver, a framework designed to solve one of the more persistent problems in enterprise AI agent deployment: routing complex, multi-step tasks to the correct tools from a library that can contain hundreds or thousands of options, according to VentureBeat reporting published July 2, 2026. The framework's headline result is a token-consumption reduction of more than 99% compared to the naive approach of exposing an agent to an entire tool library at once.

The underlying problem is simple to state but costly at scale. As enterprise agents integrate with massive tool ecosystems such as those built on the Model Context Protocol (MCP), routing a query to the right tool becomes difficult. Feeding an LLM the entire tool library and letting it pick the right tool is highly inefficient: it quickly overwhelms context limits and consumes hundreds of thousands of tokens per query. Most existing tool-use frameworks treat routing as a single-skill selection problem, which breaks down for real-world business requests such as "download the dataset, transform it, and create visual reports" that require sequencing multiple distinct tools into a cohesive plan.

SkillWeaver addresses this through three stages — Decompose, Retrieve, and Compose — plus a technique the researchers call Iterative Skill-Aware Decomposition (SAD). An LLM first breaks a complex query into a sequence of atomic sub-tasks; an embedding-based retriever then pulls a shortlist of the best-matching tools for each sub-task from the library; and a final planning stage checks for compatibility between tools and assembles an executable plan as a directed acyclic graph, allowing independent steps to run in parallel. SAD's feedback loop is the key innovation: rather than a one-shot decomposition, the LLM drafts an initial plan, runs a preliminary search, and then uses the retrieved (often more technically precise) tool names to rewrite its decomposition so its vocabulary matches what's actually available in the library.
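The three stages and the SAD feedback loop can be illustrated with a minimal Python sketch. Nothing here is the researchers' code, which has not been released. The four-tool library, the bag-of-words `embed` function (a stand-in for a real sentence-embedding model and vector index), the `fake_draft` and `fake_rewrite` functions (stand-ins for the two LLM calls), and the `notify` step in the example plan are all hypothetical, and the tool-compatibility check is omitted.

```python
import math
import re
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical four-tool library; the benchmark library has 2,209 tools.
TOOLS = {
    "http_file_download": "download a file or dataset from a url",
    "dataframe_transform": "transform and reshape tabular records",
    "chart_report_render": "render charts and visual reports",
    "sql_query_run": "run a sql query against a database",
}

def embed(text):
    """Toy bag-of-words vector, standing in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(subtask, k=1):
    """Retrieve: shortlist the k tools that best match one sub-task."""
    query = embed(subtask)
    def score(name):
        return cosine(query, embed(name.replace("_", " ") + " " + TOOLS[name]))
    return sorted(TOOLS, key=score, reverse=True)[:k]

def skill_aware_decompose(query, draft, rewrite, rounds=1):
    """Draft a plan, run a preliminary search, then rewrite with the hits."""
    steps = draft(query)
    for _ in range(rounds):
        hints = {step: retrieve(step) for step in steps}
        steps = rewrite(query, steps, hints)
    return steps

def fake_draft(query):
    # Stand-in for the first LLM call: split on commas and "and".
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", query)
    return [p.strip() for p in parts if p.strip()]

def fake_rewrite(query, steps, hints):
    # Stand-in for the second LLM call: tag each step with its retrieved tool.
    return [f"{step} [{hints[step][0]}]" for step in steps]

def parallel_batches(depends_on):
    """Compose: group DAG nodes into batches that can run in parallel."""
    sorter = TopologicalSorter(depends_on)  # maps node -> set of prerequisites
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

query = "download the dataset, transform it, and create visual reports"
print(skill_aware_decompose(query, fake_draft, fake_rewrite))

# Hypothetical plan: "notify" and "transform" both depend only on "download".
plan = {"transform": {"download"}, "report": {"transform"}, "notify": {"download"}}
print(parallel_batches(plan))  # [['download'], ['notify', 'transform'], ['report']]
```

In a real implementation the rewrite step would let the model rephrase each sub-task in the library's own vocabulary rather than simply tagging it, and retrieval would run against dense embeddings instead of word counts.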


To evaluate the approach, the researchers built a custom benchmark, CompSkillBench, consisting of 300 multi-step queries against a library of 2,209 real-world tools sourced from the public MCP ecosystem across 24 functional categories including cloud infrastructure, finance and databases. Using a lightweight 7-billion-parameter model (Qwen2.5-7B-Instruct) for decomposition, the SAD feedback loop lifted decomposition accuracy from 51.0% in a vanilla setup to 67.7%; with the larger Qwen-Max model, accuracy reached 92%. On the hardest tasks requiring four to five distinct skills, SAD improved accuracy by 50%.
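The 7B result can be read either as a percentage-point gain or as a relative gain, and the two differ considerably. The short calculation below works out both from the reported 51.0% and 67.7% figures. The reporting does not say which reading applies to the 50% improvement on the hardest tasks, so that figure is left out; the `gain` helper is purely illustrative.

```python
def gain(before, after):
    """Return (percentage-point gain, relative gain in percent)."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)

# Qwen2.5-7B-Instruct: vanilla decomposition vs. with the SAD feedback loop.
points, relative = gain(51.0, 67.7)
print(f"{points} percentage points, {relative}% relative")
# prints "16.7 percentage points, 32.7% relative"
```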

One of the more counterintuitive findings: larger models can actually perform worse than smaller ones when unguided, because they tend to over-decompose tasks into unnecessarily granular steps — a 14-billion-parameter model's accuracy fell below the 7B model's in the vanilla setup, until SAD's retrieved tool hints anchored it back to the actual available tools. A brute-force baseline that stuffed all tool names directly into a large model's prompt only retrieved the correct tool category 21.1% of the time despite near-perfect task-breakdown capability, while a traditional ReAct-style agent loop achieved 0% decomposition accuracy on the benchmark, collapsing multi-step plans into isolated actions.

The practical token savings are the headline number for enterprise adoption. SkillWeaver's targeted retrieve-and-route approach reduced estimated context consumption from roughly 884,000 tokens to about 1,160 tokens per query, a 99.9% reduction that translates directly into lower API costs and faster response times for any team running agents against large, real-world tool libraries. The researchers have not yet released source code, but they have shared prompt templates, and the framework relies on off-the-shelf, easily reproducible components (a MiniLM-based embedding retriever with a FAISS index). Other teams can therefore realistically implement the approach themselves using standard orchestration libraries.
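The reported figures can be sanity-checked with a few lines of arithmetic. The reduction and ratio follow directly from the two token counts. The per-tool average and the benchmark-wide totals are rough derivations, not reported numbers: they assume the 884,000-token estimate covers the whole 2,209-tool library and that every one of the 300 queries costs the per-query estimate.

```python
NAIVE_TOKENS = 884_000   # estimated tokens per query, whole library in context
ROUTED_TOKENS = 1_160    # approximate tokens per query with SkillWeaver
LIBRARY_SIZE = 2_209     # tools in the benchmark library
QUERIES = 300            # queries in CompSkillBench

def reduction_pct(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (1 - after / before) * 100

print(f"reduction: {reduction_pct(NAIVE_TOKENS, ROUTED_TOKENS):.2f}%")  # 99.87%
print(f"ratio: {NAIVE_TOKENS / ROUTED_TOKENS:.0f}x")                    # 762x

# Assumption-based derivations (not figures from the reporting):
print(f"avg tokens per tool: {NAIVE_TOKENS / LIBRARY_SIZE:.0f}")        # 400
print(f"benchmark total: {QUERIES * NAIVE_TOKENS:,} vs {QUERIES * ROUTED_TOKENS:,}")
# benchmark total: 265,200,000 vs 348,000
```

The 99.87% result rounds to the 99.9% quoted above.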

For founders and engineering teams building enterprise AI agents, SkillWeaver's core lesson is that task-decomposition granularity, not raw model size, is the actual bottleneck in tool-routing accuracy — a finding with direct cost implications for any company running agents against MCP-scale tool libraries. For investors in agent infrastructure and orchestration tooling, this is a reminder that meaningful efficiency gains in agentic AI are increasingly coming from smarter retrieval and planning architecture rather than simply waiting for cheaper, faster frontier models.

What to watch: whether Alibaba releases SkillWeaver's source code for broader adoption and benchmarking, how the approach performs on tool libraries larger than the 2,209 tested here, and whether other labs or enterprise AI vendors adopt similar skill-aware decomposition techniques as agentic tool libraries continue to grow in scale.
