A New Agentic-Memory Framework Uses 118K Tokens Per Query -- LangMem Burns 3.26M

A new agentic-memory framework reportedly answers queries using about 118,000 tokens, versus roughly 3.26 million for the widely used LangMem approach -- a nearly 28x reduction in token consumption. As AI agents proliferate, how efficiently they remember and retrieve context is becoming a decisive driver of cost, latency and viability.

New Framework: ~118K tokens / query

LangMem: ~3.26M tokens / query

Reduction: ~28x fewer tokens

Domain: Agentic memory

Trace Cohen

Early-stage VC & angel · Founder, New York Venture Partners

June 26, 2026

A newly detailed agentic-memory framework reportedly handles queries using roughly 118,000 tokens, compared with about 3.26 million tokens for the popular LangMem approach -- a nearly 28-fold reduction in token consumption, according to VentureBeat. In a world where every token has a price and adds latency, that gap is the difference between an agent that is economically viable at scale and one that is not.
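The arithmetic behind the headline figure is easy to check. The sketch below uses the two token counts reported above; the per-token price is a made-up placeholder chosen only to show how the gap would translate into a per-query cost, and it does not reflect any real provider's pricing.

```python
# Token counts per query as reported in the article (approximate).
NEW_FRAMEWORK_TOKENS = 118_000
LANGMEM_TOKENS = 3_260_000

# ASSUMPTION: a hypothetical price for illustration only; not from the article.
ASSUMED_USD_PER_MILLION_TOKENS = 1.00


def reduction_factor(baseline_tokens: int, improved_tokens: int) -> float:
    """How many times fewer tokens the improved approach uses."""
    return baseline_tokens / improved_tokens


def cost_per_query(tokens: int, usd_per_million: float) -> float:
    """Cost of one query at a flat per-token price."""
    return tokens / 1_000_000 * usd_per_million


factor = reduction_factor(LANGMEM_TOKENS, NEW_FRAMEWORK_TOKENS)
new_cost = cost_per_query(NEW_FRAMEWORK_TOKENS, ASSUMED_USD_PER_MILLION_TOKENS)
langmem_cost = cost_per_query(LANGMEM_TOKENS, ASSUMED_USD_PER_MILLION_TOKENS)

print(f"Reduction factor: {factor:.1f}x")
print(f"Assumed cost per query, new framework: ${new_cost:.3f}")
print(f"Assumed cost per query, LangMem: ${langmem_cost:.2f}")
```

The ratio works out to about 27.6, which is where the "nearly 28x" figure comes from. Whatever the real price per token, the two costs scale by that same factor.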

The problem it addresses is becoming central to building real AI agents. Agents need memory -- the ability to recall prior context, facts and interactions across long-running tasks -- but naive approaches balloon the amount of text fed into the model on every step, driving up cost and slowing responses. As agents take on multi-step, persistent workflows, inefficient memory can make them prohibitively expensive, turning an impressive demo into an untenable product.
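To see why naive memory balloons, consider a toy model in which an agent re-sends its entire accumulated history on every step, compared with one that caps the context it retrieves per step. This is a simplified illustration under assumed numbers (step count, tokens added per step, retrieval budget); it is not a description of how LangMem or the new framework actually works.

```python
def naive_total_tokens(steps: int, tokens_per_step: int) -> int:
    """Total input tokens when the full history is re-sent on every step."""
    total = 0
    history = 0
    for _ in range(steps):
        history += tokens_per_step  # memory grows by one step's worth of text
        total += history            # and the whole history is fed to the model
    return total


def bounded_total_tokens(steps: int, tokens_per_step: int, retrieval_budget: int) -> int:
    """Total input tokens when each step retrieves at most `retrieval_budget` tokens."""
    total = 0
    history = 0
    for _ in range(steps):
        history += tokens_per_step
        total += min(history, retrieval_budget)  # context is capped per step
    return total


# ASSUMPTION: illustrative workload parameters, not taken from the article.
STEPS = 50
TOKENS_PER_STEP = 500
RETRIEVAL_BUDGET = 2_000

naive = naive_total_tokens(STEPS, TOKENS_PER_STEP)
bounded = bounded_total_tokens(STEPS, TOKENS_PER_STEP, RETRIEVAL_BUDGET)

print(f"Naive full-history replay: {naive:,} tokens")
print(f"Bounded retrieval: {bounded:,} tokens")
```

In this toy model the naive total grows quadratically with the number of steps, while the bounded total grows linearly once the budget is reached. That is the basic reason long-running agents get expensive when memory is handled carelessly.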

The efficiency framing reflects a maturing of the agent ecosystem. The early phase celebrated what agents could do; the current phase is obsessed with what they cost to run reliably. Token-per-query is becoming a headline metric precisely because it maps directly to gross margin for anyone deploying agents at volume -- the same economic logic driving investment into inference chips and serving software from Groq to Baseten.

Memory architecture is now a genuine competitive battleground. Frameworks like LangMem, along with the memory layers inside agent platforms from LangChain, LlamaIndex and a field of startups, are competing on how intelligently they store, compress and retrieve context. A framework claiming a nearly 28x efficiency advantage, if it holds up, is the kind of infrastructure improvement that quietly determines which agent products can actually make money.

The bear case is the usual caveat on benchmark claims: a single comparison may not generalize, token savings can trade off against answer quality or recall, and real-world workloads vary. What to watch: independent reproductions of the efficiency numbers across diverse tasks, whether the framework preserves answer quality at lower token counts, and how quickly efficient memory becomes standard in production agent stacks.
