A New Agentic Memory Framework Uses 118K Tokens Per Query -- LangMem Burns 3.26M

A newly detailed agentic memory framework reportedly answers queries using about 118,000 tokens, versus roughly 3.26 million for the widely used LangMem approach -- a ~28x efficiency gain. As AI agents take on longer, multi-step tasks, how they store and retrieve memory is becoming a decisive factor in both cost and capability.

New Framework: ~118K tokens/query

LangMem: ~3.26M tokens/query

Efficiency Gain: ~28x

Domain: Agentic memory / orchestration

Trace Cohen

Early-stage VC & angel · Founder, New York Venture Partners

June 26, 2026


A new agentic memory framework is drawing attention for a striking efficiency claim: it reportedly answers queries using roughly 118,000 tokens, compared with about 3.26 million tokens for LangMem, a widely adopted memory approach -- a reduction of nearly 28x (3.26M / 118K is about 27.6). As AI agents move from single-shot prompts to long, multi-step tasks, the way they manage memory has become one of the most consequential design choices in the stack.

The issue is that agents need to remember context across many steps -- prior actions, retrieved facts, intermediate results -- and naive approaches keep stuffing more into the model's context window, ballooning token consumption. Since every token costs money and adds latency, a memory system that achieves the same outcome with a fraction of the tokens translates directly into cheaper, faster, more scalable agents. At production volumes, a 28x difference is the gap between an economically viable agent and one that bleeds cash.
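To make the cost argument concrete, here is a rough back-of-the-envelope sketch in Python. The two token counts come from the reported figures; the price per million tokens and the daily query volume are purely hypothetical placeholders chosen for illustration, not numbers from the framework's authors or any vendor's price list.

```python
# Token counts per query, as reported in the article.
NEW_FRAMEWORK_TOKENS = 118_000
LANGMEM_TOKENS = 3_260_000

# ASSUMPTION: hypothetical blended price in USD per million tokens.
ASSUMED_USD_PER_MILLION_TOKENS = 1.00
# ASSUMPTION: hypothetical production workload.
ASSUMED_QUERIES_PER_DAY = 10_000


def cost_per_query(tokens: int, usd_per_million: float) -> float:
    """Return the USD cost of one query that consumes `tokens` tokens."""
    return tokens / 1_000_000 * usd_per_million


def daily_cost(tokens: int, usd_per_million: float, queries_per_day: int) -> float:
    """Return the USD cost of a full day of queries at a fixed token count."""
    return cost_per_query(tokens, usd_per_million) * queries_per_day


# Ratio of token usage between the two approaches.
ratio = LANGMEM_TOKENS / NEW_FRAMEWORK_TOKENS

if __name__ == "__main__":
    print(f"Token ratio: {ratio:.1f}x")
    for name, tokens in [
        ("New framework", NEW_FRAMEWORK_TOKENS),
        ("LangMem", LANGMEM_TOKENS),
    ]:
        per_query = cost_per_query(tokens, ASSUMED_USD_PER_MILLION_TOKENS)
        per_day = daily_cost(
            tokens, ASSUMED_USD_PER_MILLION_TOKENS, ASSUMED_QUERIES_PER_DAY
        )
        print(f"{name}: ${per_query:.3f} per query, ${per_day:,.0f} per day")
```

Whatever price and volume are plugged in, the cost ratio stays at about 27.6x, because cost scales linearly with token count under this simple model. The sketch ignores input/output price differences, caching discounts, and latency, all of which would matter in a real deployment.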

“At production volumes, a 28x difference is the gap between an economically viable agent and one that bleeds cash.”

The finding lands amid a broader reckoning in enterprise AI orchestration. The same week, VentureBeat highlighted warnings that many companies believe they are building a 'software factory' when they are really just shipping bugs faster -- a reminder that naive agent deployment can multiply cost and error rather than value. Efficient memory is part of the discipline that separates real production systems from impressive demos.

The competitive landscape spans the fast-growing agent-infrastructure layer: LangChain's LangMem, frameworks from LlamaIndex, memory layers like Mem0, and the orchestration tooling baked into platforms from the major labs. Memory efficiency is becoming a benchmark category of its own, and frameworks that demonstrably cut token overhead while preserving task quality will have a real edge as agentic workloads scale.

The bear case: a single efficiency benchmark can be cherry-picked, real-world memory needs vary widely by task, and aggressive token reduction can trade away recall or accuracy. What to watch: independent benchmarks across diverse agent tasks, whether the efficiency holds without degrading performance, and how quickly memory optimization becomes a standard requirement rather than a differentiator.
