Write-Ahead Logging (WAL)

The logging system

Write-Ahead Logging keeps tabs on every database change across four synchronized states: WAL log, RAM log, RAM state, and durable DB state.


Understanding the Three States

Every database operation juggles three synchronized states:

WAL Log (Write-Ahead Log)

Purpose: A durable record of every change before it happens.
Speed: Sequential writes, very fast.
Rule: Log entry written BEFORE any data change.

RAM State + RAM Log (Volatile)

Purpose: Fast access to current data for active transactions.
Risk: Lost on crash - not durable.
Benefit: Updates happen immediately for performance.

DB State (Durable)

Purpose: The actual data. Persistent storage, survives crashes.
Timing: Updated periodically for efficiency.
Guarantee: Eventually consistent with COMMITted/ABORTed transactions.

Intuition: Collaborative Google Docs

Imagine multiple people editing a shared document:
Why not edit directly in cloud?
Every keystroke β†’ 200ms round trip = terrible lag!
Solution: Edit locally, sync periodically.

Google's Revision History = WAL Log

β€’ Stored in cloud permanently.
β€’ Tracks EVERY change for weeks.
β€’ "Changed 'affect' to 'effect' at 10:32am by Alice."
β€’ Can revert any change, even months later.

Your Browser = RAM Log + RAM State

β€’ Browser undo buffer = RAM Log (Ctrl+Z history).
β€’ What you see = RAM State (current document).
β€’ Super fast, but lost if browser crashes.
β€’ Limited history (last 100 edits).

Published Doc = DB State

β€’ What collaborators see when they open.
β€’ Auto-saved every 30 seconds.
β€’ May lag behind your edits.
β€’ Survives crashes, always available.

The partial update problem: After editing for hours (thousands of changes), your browser can't hold infinite undo history. Google must sync some changes to cloud even though you're still typing - just like databases flushing to disk during long transactions!


Simple WAL Trace Table: T1 Updates A and B

Example: T1 increases A and B by 20%.

Simple WAL trace table showing T1 updating variables A and B

How to Read the Trace

Red columns (RAM): Change immediately when operations execute.
Example: 'A' changes to 22.8 at timestamp (TS)=15.
Blue columns (DB): Updated later, after IO operations complete.
Example: 'A' is reflected in DB only at TS=20.
WAL log entries: Record every change with timestamp, transaction ID, old/new values.
Example: At TS=15, WAL logs "A: 19β†’22.8 by T1."

Performance Benefits

WAL's Sequential writes to ONE place: WAL just appends all changes to a single log page/file.
Example: "A: 19β†’22.8, B: 23β†’27.6, C: 17β†’20.4..." all go to the SAME log page.
No page hunting: Don't need to find/read the pages where A, B, C actually live.
Without WAL: Must read page 17 (has A), page 492 (has B), page 1038 (has C)...
Bulk efficiency: Can stuff 64MB of updates into log pages sequentially.
Later: Batch apply all changes to their actual pages when convenient.

Recovery Capability

Complete history: WAL log has all changes, even if DB is partially updated.
Crash resilient: Can reconstruct exact state from any point using WAL.
UNDO/REDO: Old values for rollback, new values for recovery.

Why the WAL is Special: Instantaneous AND Durable

Hardware optimizations make this possible:

  • NVRAM (Non-Volatile RAM): Special (more expensive) battery-backed RAM that survives power loss - writes at RAM speed, durability of disk.
  • Replicated RAM: Write to RAM on 3+ machines simultaneously - if one crashes, others have the data.
  • Sequential SSD writes: Modern SSDs excel at sequential appends - 100x faster than random writes.

The key insight: WAL transforms random updates into sequential appends - the ONE operation that all hardware does blazingly fast!


Long-Running Transaction Challenge

What happens when transactions are too big to fit in RAM?

Long-running transaction TLong updating 1000 variables

The problem: TLong updates A1, A2, ..., A1000 but RAM can't hold all changes!

⚠ Memory Pressure

Solution: Write changes to DB (DB_State) before COMMIT.
Note: A0's update to 22.8 appears in DB_state early. At TS=92, even though COMMIT at TS=1000.
Result: Actual COMMIT happens much later, after all updates logged.

ABORT Scenario: Undoing Long Transactions

What if TLong needs to abort after 1000 updates?

Note: For simplicity, we’ll show ABORTs happening at one time unit. In reality, they’d also span multiple time units.

Long transaction TLong aborting with UNDO operations

ABORT process:

  1. W-abort entries: Log UNDO operations for each change.

  2. Use old_values: A0: 22.8β†’19, A1000: 32.4β†’27.

  3. Update DB_state: Restore original values.

  4. Final abort record: Transaction officially cancelled.

UNDO Process

Reverse order: Undo changes from most recent to oldest.
Goal: Database returns to exact start state.
Efficiency: WAL log provides all needed old values.

WAL Protocol Rules

The Write-Ahead Logging protocol ensures recovery correctness:

WAL Rules

  1. Log before data: WAL entry written before any data change.

  2. Force log on commit: All log entries must be durable before COMMIT.

  3. UNDO from log: Use old_values to reverse changes.

  4. REDO from log: Use new_values to re-apply changes.

Recovery Benefits


Practice Understanding

Test Your Knowledge

Question: Why does A0's change appear in DB_state before TLong commits?
Answer: Memory pressure forces early writes to disk, but WAL log ensures we can still ABORT if needed.
Key insight: WAL log is the source of truth, not the current DB state.