6.3 · Case Study: Giving AI Agents a Long Memory

The Persistent Local Store (MB–GB)

How AI coding agents avoid hallucinations and "rediscovery" using embedded SQL databases. The same SQLite-class engine from 6.2, doing a brand-new job.

Why does Claude keep forgetting what we built last week?

Goal: Understand how AI agents use persistent local databases to manage context across sessions.

On Monday you and Claude Code build a custom auth module together. On Tuesday you open a new session and ask Claude to extend it. Claude confidently calls a function verifyJWTWithScope() that does not exist anywhere in your codebase. With no memory of yesterday's session, the agent filled the gap with a plausible-looking name from its training priors.

That's a hallucination: a confident, fluent output that happens to be wrong. The agent is doing its best in the absence of context, and without memory, "its best" includes inventing things that sound right.


The Window Is Not Enough

Modern AI agents work inside a context window, the chunk of text the model can see at one time. Claude's window is around 200,000 to 1,000,000 tokens. That sounds enormous until you compare it to a real project: a single mid-sized codebase plus a few weeks of design discussion can run into the tens of millions of tokens.

Worse, every new session starts empty. Yesterday's plans, decisions, and bug fixes evaporate the moment the chat window closes.

The result is two failure modes that anyone using a coding agent has hit:

  • Hallucinations. The agent fills the gap by inventing names, signatures, and APIs that "look right." Sometimes they exist. Sometimes they do not.

  • Rediscovery. The agent re-derives the same pattern, the same bug fix, the same architectural decision, every session. Your time and your tokens both pay for it.

What is missing is what every other multi-session app on your computer already has: a persistent local database.


Tobi Lütke's Weekend Project

In late 2025, Tobi Lütke (CEO of Shopify, a $150B company) ran into exactly this pain on his own coding agent. Over a weekend, he built QMD, an open-source local search engine that gives AI agents persistent memory across sessions (github.com/tobi/qmd).

QMD indexes whatever you point it at (docs, meeting notes, codebases) and retrieves the few snippets that actually matter using a hybrid of keyword search, vector embeddings, and LLM reranking. It exposes itself over MCP so agents like Claude Code can call it mid-session.

The premise: an agent with persistent searchable memory does not hallucinate the name of a function it could just look up.


The Storage Layer: SQLite for Everyone

QMD is one of several tools in this space. The claude-mem plugin runs alongside Claude Code and passively logs observations during your sessions (decisions, patterns, todos) for later recall. Aider tackles a related problem: helping the agent understand the current codebase rather than the session history.

All three converge on the same storage backend: SQLite.

  • QMD's index lives at ~/.cache/qmd/index.sqlite with the sqlite-vec extension and SQLite's built-in FTS5 module.

  • claude-mem keeps sessions and observations in SQLite, paired with a ChromaDB sidecar for vector search.

  • Aider's repo-map cache uses diskcache, itself SQLite under the hood.

The reason is the same calculus that put SQLite into Spotify Mobile in case study 6.2:

  • Zero config. The agent runs locally. The database has to start up the moment the agent does, with no separate server to launch or admin.

  • Embedded. It runs inside the agent's process. No network hop, no auth surface to harden.

  • Transactional. When the agent writes an observation mid-task, it should not corrupt the store if the laptop sleeps or the process dies.

  • One file. You can copy it, version it, mail it to a colleague, or open it in any SQLite client and run queries against it.

A simplified observation store fits in one schema:

CREATE TABLE observations (
    id           INTEGER PRIMARY KEY,
    session_id   TEXT,
    project      TEXT,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    kind         TEXT,        -- 'decision' | 'pattern' | 'bug-fix' | 'todo'
    content      TEXT,
    embedding    BLOB         -- vector for semantic search
);

CREATE INDEX idx_session ON observations(session_id);
CREATE INDEX idx_project ON observations(project);
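In application code, writing and reading an observation is a few lines of Python's built-in sqlite3. This sketch uses an in-memory database and invented session/content values for illustration, but the schema is the one above:

```python
import sqlite3

# Minimal sketch of an agent persisting one observation.
conn = sqlite3.connect(":memory:")  # a real agent would use a file path
conn.execute("""
    CREATE TABLE IF NOT EXISTS observations (
        id         INTEGER PRIMARY KEY,
        session_id TEXT,
        project    TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        kind       TEXT,
        content    TEXT,
        embedding  BLOB
    )
""")

# The write is wrapped in a transaction: if the process dies mid-write,
# SQLite rolls back and the store stays consistent.
with conn:
    conn.execute(
        "INSERT INTO observations (session_id, project, kind, content) "
        "VALUES (?, ?, ?, ?)",
        ("sess-042", "cs145", "decision",
         "Auth module verifies JWTs with a scope claim."),
    )

rows = conn.execute(
    "SELECT kind, content FROM observations WHERE project = ?", ("cs145",)
).fetchall()
print(rows)
```

The `with conn:` block is all the durability machinery an agent needs: commit on success, rollback on any exception.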

Three Ranking Signals

Once the storage is set, agent memory becomes a question of what to rank by. Three signals show up across every serious tool in this space, and a single agent often combines all three.

1. Keyword search (BM25, short for "Best Matching 25", a standard text-ranking algorithm). Classical full-text search, weighted by term frequency and inverse document frequency. SQLite has it built in via the FTS5 module. Best for exact recall: "find me where we mentioned verifyJWTWithScope." Plain SQL gets you most of the way:

-- "What did we decide about auth in this project?"
SELECT content FROM observations
WHERE project = 'cs145'
  AND kind = 'decision'
  AND content LIKE '%auth%'
ORDER BY created_at DESC
LIMIT 10;
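The LIKE filter above is exact substring matching; FTS5 is what gives you true BM25 ranking. A minimal sketch using Python's built-in sqlite3 (assuming FTS5 is compiled in, as it is in most builds; the table and contents are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table over observation text.
conn.execute("CREATE VIRTUAL TABLE obs_fts USING fts5(content)")
conn.executemany("INSERT INTO obs_fts (content) VALUES (?)", [
    ("Decided to verify JWTs with a scope claim in the auth module.",),
    ("Refactored the billing cron job.",),
    ("Auth bug: token expiry was checked after signature verification.",),
])

# bm25() returns a score where LOWER means a better match, so sort ascending.
rows = conn.execute("""
    SELECT content, bm25(obs_fts) AS score
    FROM obs_fts
    WHERE obs_fts MATCH 'auth'
    ORDER BY score
""").fetchall()
for content, score in rows:
    print(round(score, 2), content)
```

Two of the three rows mention "auth" and come back ranked; the billing row is filtered out entirely rather than scored low.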

2. Vector similarity. Embedding-based semantic search. The user's question is converted to a vector by an external embedding model, and the agent retrieves observations whose vectors point in the same direction. Best for paraphrase: "what are we doing about token refresh?" surfaces any observation about session lifetime, even if "refresh" never appears verbatim. QMD uses sqlite-vec; claude-mem uses ChromaDB:

-- "What are we doing about token refresh?"
-- :query_embedding is computed beforehand by an external embedding model.
SELECT content FROM observations
WHERE project = 'cs145'
ORDER BY vec_distance_cosine(embedding, :query_embedding) ASC
LIMIT 10;
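sqlite-vec is a loadable extension, so vec_distance_cosine is not available in a bare SQLite. For small stores you can get the same effect with brute force in application code. A sketch with toy three-dimensional embeddings (real ones have hundreds of dimensions and come from an embedding model):

```python
import sqlite3, struct, math

def pack(vec):
    # Store a float vector as a little-endian float32 BLOB.
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack(blob):
    return struct.unpack(f"<{len(blob) // 4}f", blob)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (content TEXT, embedding BLOB)")
conn.executemany("INSERT INTO observations VALUES (?, ?)", [
    ("Rotate refresh tokens every 15 minutes.", pack([0.9, 0.1, 0.0])),
    ("Billing cron runs nightly.",              pack([0.0, 0.2, 0.9])),
])

# Hypothetical embedding of the question "what about token refresh?".
query = unpack(pack([0.8, 0.2, 0.1]))
ranked = sorted(
    conn.execute("SELECT content, embedding FROM observations"),
    key=lambda row: cosine_distance(unpack(row[1]), query),
)
print(ranked[0][0])  # the semantically closest observation
```

This linear scan is O(n) per query, which is fine for thousands of observations; sqlite-vec earns its keep when the store grows past that.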

3. Graph centrality (PageRank). A signal for code-aware retrieval rather than session memory. Parse the codebase with tree-sitter, treat files as nodes and symbol references as edges, run personalized PageRank with the seed set to whatever the user is currently asking about. The output is a ranked list of structurally important symbols, compressed into a tight token budget (Aider's default is 1K tokens). Aider's RepoMap is the canonical implementation; claude-mem ships its own version as the smart-explore skill. The PageRank computation lives in application code (Aider uses NetworkX), but the symbol graph itself sits in the same SQLite store.
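To make the idea concrete, here is a bare-bones personalized PageRank over a hypothetical four-file symbol graph. Aider's real implementation builds the graph with tree-sitter and runs it through NetworkX; the math reduces to this power iteration:

```python
# Toy symbol-reference graph: an edge A -> B means file A references
# a symbol defined in file B. File names are illustrative.
graph = {
    "auth.py":      ["jwt_utils.py", "models.py"],
    "jwt_utils.py": ["models.py"],
    "billing.py":   ["models.py"],
    "models.py":    [],
}
nodes = list(graph)

def personalized_pagerank(seed, damping=0.85, iters=50):
    # The seed set biases the random jump toward what the user asked about.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * (1.0 if n in seed else 0.0) / len(seed)
               for n in nodes}
        for n, outs in graph.items():
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:
                # Dangling node: hand its mass back to the seed.
                for s in seed:
                    nxt[s] += damping * rank[n] / len(seed)
        rank = nxt
    return rank

# The user is asking about auth, so seed the walk at auth.py.
ranks = personalized_pagerank(seed={"auth.py"})
for f in sorted(ranks, key=ranks.get, reverse=True):
    print(f, round(ranks[f], 3))
```

billing.py, which nothing auth-related touches, ends up with almost no rank, which is exactly the pruning that keeps the repo map inside its token budget.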

The three signals are orthogonal. BM25 catches what you said. Vectors catch what you meant. PageRank catches what is structurally relevant in your code right now. Most production systems mix at least two, but the right mix is workload-specific (a research codebase wants different ranking than a CRUD app; a long-running session wants different ranking than a fresh one), and the only way to find it is to experiment.

Experimentation is cheap because the storage layer speaks SQL. Tuning the retrieval is a query edit, not a rebuild. Want to weight BM25 hits twice as heavily as vector matches? Edit the ORDER BY. Want to filter by recency before ranking? Add a WHERE. Want to A/B two ranking schemes against the same observation set? Run them side by side. This is the same declarative power you have been working with throughout this course (tell the engine what you want, let it figure out how), and SQLite is what delivers that power inside the agent's process. The same property that lets a data analyst iterate on a report by editing one line of SQL is what lets an agent builder iterate on memory by editing one line of SQL.
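As a concrete sketch of that one-line tuning loop: given candidate observations with precomputed keyword and vector scores (hard-coded here for brevity; in a real pipeline they would come from FTS5 and the embedding comparison), the entire ranking policy lives in the ORDER BY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidates (content TEXT, bm25_score REAL, vec_score REAL)"
)
# Illustrative candidates with made-up normalized scores (higher = better).
conn.executemany("INSERT INTO candidates VALUES (?, ?, ?)", [
    ("Decision: rotate refresh tokens.", 0.2, 0.9),
    ("Bug-fix: JWT expiry check order.", 0.8, 0.4),
    ("Todo: document auth module.",      0.5, 0.5),
])

# Retuning the mix is a one-line edit: change the weights below.
rows = conn.execute("""
    SELECT content
    FROM candidates
    ORDER BY 2.0 * bm25_score + 1.0 * vec_score DESC
    LIMIT 3
""").fetchall()
print([r[0] for r in rows])
```

Swapping the weights to favor vector matches, or adding `WHERE created_at > :cutoff` for recency, changes the agent's memory behavior without touching a line of application code.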

When the agent receives a new question, it runs one or more of these retrievals first, stuffs the results into the prompt, then answers. The facts are now grounded in something the agent looked up rather than something it guessed. Hallucinations drop. Rediscovery stops.


Takeaway

The fundamentals compound. The embedded SQL database you met in week 2 is the same load-bearing primitive that the CEO of a $150B commerce company reached for to give his AI agent a memory. You do not need a fancier database to solve a frontier problem. You need to choose the simplest one that fits.