Recovery Intuition: Change Tracking

Concept. Recovery works by writing every state change to a durable log before it touches the data, so on restart the log alone is enough to reconstruct exactly which transactions committed and which didn't.

Intuition. Backing up Spotify's entire database after every transaction is a non-starter: billions of rows copied for the sake of one user upgrade. So the database borrows the idea behind Git diffs and Google Docs version history: record only what changed. When Mickey buys Premium, the log appends BEGIN, charged $9.99, set is_premium=true, COMMIT, with each entry hitting disk before the corresponding row update. If the server crashes after charged $9.99 but before set is_premium=true, the log contains no COMMIT for Mickey's transaction, so recovery knows it never committed and undoes the charge.
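The Mickey scenario can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular database implements its log: the in-memory list `log` stands in for a durable file, and the names `write`, `recover`, `balance_cents`, and the transaction id `T1` are hypothetical.

```python
# Toy stand-ins: assume every append to `log` is flushed to disk immediately.
log = []
rows = {"mickey": {"balance_cents": 2000, "is_premium": False}}

def write(txn, key, col, new):
    """Append the change (old and new value) to the log, then update the row."""
    old = rows[key][col]
    log.append(("UPDATE", txn, key, col, old, new))  # 1. log entry first
    rows[key][col] = new                             # 2. row update second

# Mickey buys Premium, but the server dies before the second update.
log.append(("BEGIN", "T1"))
write("T1", "mickey", "balance_cents", 1001)         # charged $9.99
# --- crash: no is_premium update and no COMMIT record ever reach the log ---

def recover(log, rows):
    """Find committed transactions, then roll back everything else."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):                        # newest change first
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, key, col, old, _ = rec
            rows[key][col] = old                     # restore the old value
    return committed

committed = recover(log, rows)
```

After `recover` runs, `committed` is empty and Mickey's balance is back to 2000 cents, because the log held an UPDATE for T1 but no matching COMMIT.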

You Already Know This

Google Docs Version History

Google Docs version history showing change tracking

Key insight: Google Docs isn't hoarding 50 versions of your document. It logs the changes between them.

Git Version Control

Git diff showing line-by-line changes

Key insight: You don't review a Git history by rereading every file version. You read diffs: what changed, when, and by whom.


Example: Change Tracking in a DB

Let's apply this to a ticket resale scenario. The DB transitions states (left, FSM); meanwhile every change is appended to the log (right, the tracking data structure that lets us replay or undo on crash).

DB State Machine

FSM: Pre State to T56 transaction to Post State (COMMIT) or back to Pre State (ABORT)

T56 rewrites buyer_id on three seats: seat 42 (ab12→zx198), seat 44 (cd34→pq342), seat 51 (ef56→st567).

Write-Ahead Log (WAL, the tracking data structure)

UNDO/REDO log table excerpt for T56

Instead of storing full database snapshots, every log entry records:

  • What changed: specific rows and columns, with old and new values.

  • When it changed: transaction timing and sequence.

  • Who changed it: which transaction made the change.
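One way to picture such an entry is as a small record type. The sketch below is illustrative only: the field names, the `lsn` sequence number, and the table name `seats` are assumptions, not the layout of the log excerpt above. The seat numbers and buyer ids are T56's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogRecord:
    lsn: int         # when: position in the log, which fixes the order of changes
    txn_id: str      # who: the transaction that made the change
    table: str       # what: the table (hypothetical name) ...
    row_id: int      # ... the row ...
    column: str      # ... and the column that changed
    old_value: str   # value before the change (what a rollback needs)
    new_value: str   # value after the change (what a replay needs)

# T56's three buyer_id rewrites, one record per changed cell.
t56_log = [
    LogRecord(1, "T56", "seats", 42, "buyer_id", "ab12", "zx198"),
    LogRecord(2, "T56", "seats", 44, "buyer_id", "cd34", "pq342"),
    LogRecord(3, "T56", "seats", 51, "buyer_id", "ef56", "st567"),
]
```

Each record carries only the cell that changed, so the cost of logging T56 depends on the three seats it touched, not on the size of the table.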

The log isn't decorative. Together with whatever data pages made it to disk, it is all recovery needs after a crash: lost in-memory state and half-flushed pages can be repaired by replaying or undoing the logged changes.
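To make this concrete, here is a minimal sketch of repairing half-flushed pages for T56 from the log. It assumes a simplified tuple format for log entries and a dict mapping seat number to buyer_id as the "pages"; the function name `recover` and both structures are hypothetical.

```python
def recover(pages, log):
    """Repair buyer_id values using only the log and whatever reached disk."""
    committed = {txn for kind, txn, *_ in log if kind == "COMMIT"}
    # Reapply new values for committed transactions, in log order.
    for kind, txn, *rest in log:
        if kind == "UPDATE" and txn in committed:
            seat, old, new = rest
            pages[seat] = new
    # Restore old values for uncommitted transactions, newest change first.
    for kind, txn, *rest in reversed(log):
        if kind == "UPDATE" and txn not in committed:
            seat, old, new = rest
            pages[seat] = old
    return pages

updates = [
    ("BEGIN", "T56"),
    ("UPDATE", "T56", 42, "ab12", "zx198"),
    ("UPDATE", "T56", 44, "cd34", "pq342"),
    ("UPDATE", "T56", 51, "ef56", "st567"),
]
# Crash scenario: only seat 42's new value was flushed to disk.
half_flushed = {42: "zx198", 44: "cd34", 51: "ef56"}

after_abort = recover(dict(half_flushed), updates)
after_commit = recover(dict(half_flushed), updates + [("COMMIT", "T56")])
```

The same half-flushed pages end up in the Pre State when the log has no COMMIT for T56, and in the Post State when it does. The only difference between the two runs is one log entry.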


Next

UNDO and REDO → Now that we know the log records every change with old and new values, here are the two operations recovery uses to put the database back together: UNDO (uses old values to roll back) and REDO (uses new values to reapply).