Write-Ahead Logging (WAL)

The logging system

Write-Ahead Logging keeps tabs on every database change across four synchronized states: WAL log, RAM log, RAM state, and durable DB state.

Understanding the Three States

Every database operation juggles three synchronized states:

WAL Log (Write-Ahead Log)

Purpose: A durable record of every change before it happens.

Speed: Sequential writes, very fast.

Rule: Log entry written BEFORE any data change.

RAM State + RAM Log (Volatile)

Purpose: Fast access to current data for active transactions.

Risk: Lost on crash - not durable.

Benefit: Updates happen immediately for performance.

DB State (Durable)

Purpose: The actual data. Persistent storage, survives crashes.

Timing: Updated periodically for efficiency.

Guarantee: Eventually consistent with COMMITted/ABORTed transactions.

Intuition: Collaborative Google Docs

Imagine multiple people editing a shared document:

Why not edit directly in cloud?
Every keystroke → 200ms round trip = terrible lag!
Solution: Edit locally, sync periodically.

Google's Revision History = WAL Log

• Stored in cloud permanently.
• Tracks EVERY change for weeks.
• "Changed 'affect' to 'effect' at 10:32am by Alice."
• Can revert any change, even months later.

Your Browser = RAM Log + RAM State

• Browser undo buffer = RAM Log (Ctrl+Z history).
• What you see = RAM State (current document).
• Super fast, but lost if browser crashes.
• Limited history (last 100 edits).

Published Doc = DB State

• What collaborators see when they open.
• Auto-saved every 30 seconds.
• May lag behind your edits.
• Survives crashes, always available.

The partial update problem: After editing for hours (thousands of changes), your browser can't hold infinite undo history. Google must sync some changes to cloud even though you're still typing - just like databases flushing to disk during long transactions!

Simple WAL Trace Table: T1 Updates A and B

Example: T1 increases A and B by 20%.

Simple WAL trace table showing T1 updating variables A and B

How to Read the Trace

Red columns (RAM): Change immediately when operations execute.
Example: 'A' changes to 22.8 at timestamp (TS)=15.

Blue columns (DB): Updated later, after IO operations complete.
Example: 'A' is reflected in DB only at TS=20.

WAL log entries: Record every change with timestamp, transaction ID, old/new values.
Example: At TS=15, WAL logs "A: 19→22.8 by T1."

Performance Benefits

WAL's Sequential writes to ONE place: WAL just appends all changes to a single log page/file.
Example: "A: 19→22.8, B: 23→27.6, C: 17→20.4..." all go to the SAME log page.

No page hunting: Don't need to find/read the pages where A, B, C actually live.
Without WAL: Must read page 17 (has A), page 492 (has B), page 1038 (has C)...

Bulk efficiency: Can stuff 64MB of updates into log pages sequentially.
Later: Batch apply all changes to their actual pages when convenient.

Recovery Capability

Complete history: WAL log has all changes, even if DB is partially updated.

Crash resilient: Can reconstruct exact state from any point using WAL.

UNDO/REDO: Old values for rollback, new values for recovery.

Why the WAL is Special: Instantaneous AND Durable

Hardware optimizations make this possible:

NVRAM (Non-Volatile RAM): Special (more expensive) battery-backed RAM that survives power loss - writes at RAM speed, durability of disk.
Replicated RAM: Write to RAM on 3+ machines simultaneously - if one crashes, others have the data.
Sequential SSD writes: Modern SSDs excel at sequential appends - 100x faster than random writes.

The key insight: WAL transforms random updates into sequential appends - the ONE operation that all hardware does blazingly fast!

Long-Running Transaction Challenge

What happens when transactions are too big to fit in RAM?

Long-running transaction TLong updating 1000 variables

The problem: TLong updates A1, A2, ..., A1000 but RAM can't hold all changes!

⚠ Memory Pressure

Solution: Write changes to DB (DB_State) before COMMIT.

Note: A0's update to 22.8 appears in DB_state early. At TS=92, even though COMMIT at TS=1000.

Result: Actual COMMIT happens much later, after all updates logged.

ABORT Scenario: Undoing Long Transactions

What if TLong needs to abort after 1000 updates?

Note: For simplicity, we’ll show ABORTs happening at one time unit. In reality, they’d also span multiple time units.

Long transaction TLong aborting with UNDO operations

ABORT process:

W-abort entries: Log UNDO operations for each change.
Use old_values: A0: 22.8→19, A1000: 32.4→27.
Update DB_state: Restore original values.
Final abort record: Transaction officially cancelled.

UNDO Process

Reverse order: Undo changes from most recent to oldest.

Goal: Database returns to exact start state.

Efficiency: WAL log provides all needed old values.

WAL Protocol Rules

The Write-Ahead Logging protocol ensures recovery correctness:

WAL Rules

Log before data: WAL entry written before any data change.
Force log on commit: All log entries must be durable before COMMIT.
UNDO from log: Use old_values to reverse changes.
REDO from log: Use new_values to re-apply changes.

Recovery Benefits

Crash safety: Can reconstruct any state from log.
Performance: Changes can happen in fast RAM first.
Flexibility: Support both COMMIT and ABORT operations.
Auditability: Complete history of all database changes.

Practice Understanding

Test Your Knowledge

Question: Why does A0's change appear in DB_state before TLong commits?

Answer: Memory pressure forces early writes to disk, but WAL log ensures we can still ABORT if needed.

Key insight: WAL log is the source of truth, not the current DB state.