Memory Engineering

Coding agents are stateless by default -- every session starts cold. Memory engineering is the discipline of building the durable layer that persists between runs, so an agent accumulates experience instead of relearning it. Where context engineering manages the live window, memory engineering manages what outlives it and gets retrieved back in.

The Pattern

"Agent memory is the mechanisms that we are implementing to make sure that state persists in our AI application -- so agents are able to accumulate information, turn data into memory, and have it inform the next execution step." -- Richmond Alake, MongoDB (source)

It borrows its structure from human memory (Richmond Alake):

Episodic -- what happened: past sessions, a lessons-learned log.
Semantic -- facts and conventions: an AGENTS.md, a knowledge base.
Procedural -- skills and routines: a reusable skill library.

Each has its own lifecycle -- generate, store, retrieve, update, forget.
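
The three kinds and the five lifecycle steps can be sketched as a small file-backed store. This is a minimal illustration under stated assumptions, not a reference design: the names `FileMemory`, `MemoryItem`, and `generate`, the one-JSON-file-per-kind layout, and the substring match used for retrieval are all hypothetical choices made for the example.

```python
import json
import tempfile
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path

KINDS = ("episodic", "semantic", "procedural")


@dataclass
class MemoryItem:
    kind: str   # one of KINDS
    key: str    # stable identifier, used by update and forget
    text: str   # the remembered content
    updated_at: float = field(default_factory=time.time)


def generate(kind, key, raw):
    """Generate: turn raw session output into a normalized memory item."""
    return MemoryItem(kind=kind, key=key, text=" ".join(raw.split()))


class FileMemory:
    """Hypothetical store: one JSON file per memory kind, kept on disk
    so that a later session can read what an earlier one wrote."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, kind):
        if kind not in KINDS:
            raise ValueError(f"unknown memory kind: {kind}")
        return self.root / f"{kind}.json"

    def _load(self, kind):
        path = self._path(kind)
        return json.loads(path.read_text()) if path.exists() else {}

    def _save(self, kind, items):
        self._path(kind).write_text(json.dumps(items, indent=2))

    def store(self, item):
        """Store: persist the item, overwriting any item with the same key."""
        items = self._load(item.kind)
        items[item.key] = asdict(item)
        self._save(item.kind, items)

    def retrieve(self, kind, query):
        """Retrieve: case-insensitive substring match (a stand-in for grep)."""
        q = query.lower()
        return [MemoryItem(**d) for d in self._load(kind).values()
                if q in d["text"].lower()]

    def update(self, kind, key, text):
        """Update: replace the text of an existing item and bump its timestamp."""
        items = self._load(kind)
        if key not in items:
            raise KeyError(key)
        items[key]["text"] = text
        items[key]["updated_at"] = time.time()
        self._save(kind, items)

    def forget(self, kind, key):
        """Forget: drop an item; returns True if something was removed."""
        items = self._load(kind)
        removed = items.pop(key, None) is not None
        self._save(kind, items)
        return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Session 1 learns a convention and writes it down.
        FileMemory(root).store(generate("semantic", "test-cmd", "Run tests with  make test"))
        # Session 2 starts cold but finds it again on disk.
        for item in FileMemory(root).retrieve("semantic", "make test"):
            print(item.key, "->", item.text)
```

A real system would likely replace the substring match with whatever retrieval fits the kind of memory, and would need a policy for when to call `forget`; both are left open here.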

Why It Matters

Memory is what turns a fast agent into one that improves. Voyager made the case: an agent that kept every skill it learned as executable code obtained 3.3x more unique items, unlocked tech-tree milestones up to 15.3x faster, and carried its skills into a new world where baseline agents started from zero (Wang et al.). The constraint is the one loop engineering keeps hitting: the context window is a cache that resets, not a memory, so anything that must survive has to live outside the model (Jazz Tong).
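
To make "skills as executable code that live outside the model" concrete, here is a minimal sketch of a skill library persisted as source files. It is not Voyager's implementation; the `SkillLibrary` class and the convention that each file defines one function named after the file are assumptions made for this example.

```python
import importlib.util
import tempfile
from pathlib import Path


class SkillLibrary:
    """Hypothetical procedural memory: each skill is a .py file on disk
    that defines one function with the same name as the file."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def add(self, name, source):
        if not name.isidentifier():
            raise ValueError(f"not a valid skill name: {name!r}")
        # Reject code that does not even parse before persisting it.
        compile(source, f"{name}.py", "exec")
        (self.root / f"{name}.py").write_text(source)

    def names(self):
        return sorted(p.stem for p in self.root.glob("*.py"))

    def load(self, name):
        """Import the skill file and return its function."""
        path = self.root / f"{name}.py"
        spec = importlib.util.spec_from_file_location(f"skill_{name}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return getattr(module, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # One session learns a skill and writes it down as code.
        SkillLibrary(root).add(
            "slugify",
            "def slugify(s):\n    return '-'.join(s.lower().split())\n",
        )
        # A later session starts cold, but the skill is still there to load.
        fresh = SkillLibrary(root)
        print(fresh.names(), fresh.load("slugify")("Memory Engineering"))
```

Loading a skill executes stored code, so a real library would need review or sandboxing before trusting what an agent wrote; the sketch leaves that out.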

From RAG to files, and beyond

Where that memory lives has moved through three stages:

RAG first. Chunk the code, embed it, retrieve the nearest vectors. It backfired for code: "when you chunk code for embeddings, you're literally tearing apart its logic," and a codebase that changes with every commit leaves the index stale, forcing constant re-embedding (Cline). Vector databases still suit long-term semantic recall at scale, with millions of embeddings retrieved by meaning, which is where Milvus and its kin earn their place (Milvus).
Files won. Agents now read and grep files just-in-time; in Letta's "Is a Filesystem All You Need?", an agent working from plain files even beat a graph memory on a multi-session recall benchmark (codepointer). The cost is rediscovery: with no index, the agent re-explores the same ground each session.
Richer memory (the frontier). Structured stores that do more than recall text: entity extraction and the relations between entities in a knowledge graph, episodic history, and the code's AST. Zep's Graphiti builds a temporal graph of episodic, entity, and semantic nodes (Zep); Mem0, Cognee, and Letta each sit at a different point on the spectrum (comparison).
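
As an illustration of what a temporal graph adds over plain text recall, the sketch below stores relations between entities as edges with a validity interval, so a superseded fact is closed rather than deleted. It is a toy model, not Graphiti's API or data model: the `TemporalGraph` and `Edge` names are hypothetical, time is an integer such as a commit number, and each (subject, relation) pair is assumed to hold one value at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    subject: str
    relation: str
    obj: str
    valid_from: int                   # e.g. a commit number or timestamp
    invalid_at: Optional[int] = None  # None means the fact is still current


class TemporalGraph:
    """Hypothetical store of entity relations that remembers when each held."""

    def __init__(self):
        self.edges = []

    def assert_fact(self, subject, relation, obj, at):
        """Record a fact; a different value for the same (subject, relation)
        closes the currently open edge instead of deleting it."""
        for edge in self.edges:
            if (edge.subject == subject and edge.relation == relation
                    and edge.invalid_at is None):
                if edge.obj == obj:
                    return edge        # already known and still current
                edge.invalid_at = at   # superseded, but kept as history
        new_edge = Edge(subject, relation, obj, valid_from=at)
        self.edges.append(new_edge)
        return new_edge

    def query(self, subject, relation, at=None):
        """Return values that are current (at=None) or that held at time `at`."""
        found = []
        for edge in self.edges:
            if edge.subject != subject or edge.relation != relation:
                continue
            if at is None:
                if edge.invalid_at is None:
                    found.append(edge.obj)
            elif edge.valid_from <= at and (edge.invalid_at is None or at < edge.invalid_at):
                found.append(edge.obj)
        return found


if __name__ == "__main__":
    graph = TemporalGraph()
    graph.assert_fact("auth_service", "authenticates_with", "sessions", at=1)
    graph.assert_fact("auth_service", "authenticates_with", "jwt", at=5)
    print(graph.query("auth_service", "authenticates_with"))        # current value
    print(graph.query("auth_service", "authenticates_with", at=3))  # value as of time 3
```

Keeping the closed edge is the point of the design: the agent can answer both "what is true now" and "what was true when that old session ran", which a flat text log or a re-embedded index does not give directly.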

Two problems stay open. No one has scaled a shared memory to a whole organization without it rotting, and agent memory does not fit a single conventional database: episodic streams, temporal graphs, and code structure each want their own store. Memory is the part of the agentic stack still most clearly in research.

Last reviewed: 2026-06-25