
A New Agentic Memory Framework Uses 118K Tokens Per Query -- LangMem Burns 3.26M

A newly detailed agentic memory framework reportedly answers queries using about 118,000 tokens, versus roughly 3.26 million for the widely used LangMem approach -- a ~28x efficiency gain. As AI agents take on longer, multi-step tasks, how they store and retrieve memory is becoming a decisive factor in both cost and capability.

New framework: ~118K tokens/query
LangMem: ~3.26M tokens/query
Efficiency gain: ~28x
Domain: Agentic memory / orchestration
Trace Cohen · Early-stage VC & angel · Founder, New York Venture Partners
June 26, 2026 · 2 min read
KEY TAKEAWAYS FOR VCs & FOUNDERS
1. Token efficiency is real money: a ~28x reduction directly cuts the cost of running long-horizon agents
2. Memory architecture is emerging as a core differentiator in agent performance, not an afterthought
3. Bloated memory is a hidden tax on every agentic application -- efficiency unlocks viable economics
4. The result signals the agent stack maturing from prototypes toward production-grade cost discipline

The VC Read · Trace's Take (Trace Cohen)

Nobody tweets about memory architecture, but a 28x token reduction is the kind of unglamorous engineering that decides whether an agent company has a business or a burn problem. At production scale, memory bloat is a hidden tax on every query, and the teams that get it right ship agents that are actually economical to run. This is the agent stack growing up -- moving from 'look what it can do' to 'what does it cost per task.' Watch whether the efficiency holds without gutting recall; the easy way to use fewer tokens is to remember less, and that's not a win.


A new agentic memory framework is drawing attention for a striking efficiency claim: it answers queries using roughly 118,000 tokens, compared with about 3.26 million tokens for LangMem, a widely adopted memory approach -- a reduction of nearly 28x. As AI agents move from single-shot prompts to long, multi-step tasks, the way they manage memory has quietly become one of the most consequential design choices in the stack.

The issue is that agents need to remember context across many steps -- prior actions, retrieved facts, intermediate results -- and naive approaches keep stuffing more into the model's context window, ballooning token consumption. Since every token costs money and adds latency, a memory system that achieves the same outcome with a fraction of the tokens translates directly into cheaper, faster, more scalable agents. At production volumes, a 28x difference is the gap between an economically viable agent and one that bleeds cash.
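To make the reported figures concrete, here is a back-of-the-envelope sketch in Python. The two per-query token counts come from the article. Everything else is a hypothetical placeholder: the $3-per-million-token price, the 50-step task, the 2,000 tokens of context added per step, and the 8,000-token memory budget. The bounded-memory function is a generic stand-in for capping per-step context. It does not describe how the new framework or LangMem actually works.

```python
"""Back-of-the-envelope token math for agent memory (illustrative only).

The two per-query figures come from the article; the price per million
tokens and the step/budget sizes below are hypothetical placeholders.
"""

NEW_FRAMEWORK_TOKENS = 118_000   # reported tokens per query
LANGMEM_TOKENS = 3_260_000       # reported tokens per query
PRICE_PER_MILLION_USD = 3.00     # hypothetical flat input-token price


def cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Dollar cost of sending `tokens` tokens at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def naive_replay_tokens(steps: int, tokens_per_step: int) -> int:
    """Total tokens when every step re-sends the full history so far.

    Step i carries i * tokens_per_step tokens, so the total is
    tokens_per_step * steps * (steps + 1) / 2 and grows quadratically.
    """
    return sum(i * tokens_per_step for i in range(1, steps + 1))


def bounded_memory_tokens(steps: int, tokens_per_step: int, budget: int) -> int:
    """Total tokens when each step sends at most `budget` tokens of memory.

    A hypothetical stand-in for any approach (summaries, top-k retrieval)
    that caps per-step context; it grows linearly once the cap is reached.
    """
    return sum(min(i * tokens_per_step, budget) for i in range(1, steps + 1))


if __name__ == "__main__":
    ratio = LANGMEM_TOKENS / NEW_FRAMEWORK_TOKENS
    print(f"reduction: {ratio:.1f}x")
    for name, tokens in [("new framework", NEW_FRAMEWORK_TOKENS), ("LangMem", LANGMEM_TOKENS)]:
        per_query = cost_usd(tokens)
        print(f"{name}: ${per_query:.2f}/query, ${per_query * 1_000:,.0f} per 1,000 queries")

    # Toy 50-step task adding 2,000 tokens of context per step (hypothetical).
    print("naive replay:", naive_replay_tokens(50, 2_000))
    print("bounded (8K budget):", bounded_memory_tokens(50, 2_000, 8_000))
```

With these placeholder inputs, the reported figures work out to about 27.6x, which the article rounds to ~28x. The toy task sends 2,550,000 tokens under naive replay versus 388,000 with the capped budget. Because the naive total grows with the square of the step count, long-horizon agents are the first to feel memory bloat.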


The finding lands amid a broader reckoning in enterprise AI orchestration. The same week, VentureBeat highlighted warnings that many companies believe they are building a 'software factory' when they are really just shipping bugs faster -- a reminder that naive agent deployment can multiply cost and error rather than value. Efficient memory is part of the discipline that separates real production systems from impressive demos.

The competitive landscape spans the fast-growing agent-infrastructure layer: LangChain's LangMem, frameworks from LlamaIndex, memory layers like Mem0, and the orchestration tooling baked into platforms from the major labs. Memory efficiency is becoming a benchmark category of its own, and frameworks that demonstrably cut token overhead while preserving task quality will have a real edge as agentic workloads scale.

The bear case: a single efficiency benchmark can be cherry-picked, real-world memory needs vary widely by task, and aggressive token reduction can trade away recall or accuracy. What to watch: independent benchmarks across diverse agent tasks, whether the efficiency holds without degrading performance, and how quickly memory optimization becomes a standard requirement rather than a differentiator.


Originally reported by VentureBeat. Analysis and editorial commentary by Value Add Pulse.


@Trace_Cohen · t@nyvp.com