Write-Ahead Logging (WAL)

Concept. Write-Ahead Logging requires that the log record describing a change be flushed to durable storage before the modified data page is, so the on-disk log is always at least as fresh as the on-disk data.

Intuition. When Mickey's is_premium = true change is being persisted, WAL forces the log entry LSN 4711: set Mickey.is_premium = true to disk before the user-table page moves. If the server crashes between those two writes, the log still knows the intent and recovery can re-apply it. Without WAL the data page could land first and the log entry never make it. Leaving an unrecoverable inconsistency.

Understanding the Three States

Every database operation juggles three synchronized states:

WAL Log (Write-Ahead Log)

Purpose: A durable record of every change before it happens.

Speed: Sequential writes, very fast.

Rule: Log entry written BEFORE any data change.

RAM State + RAM Log (Volatile)

Purpose: Fast access to current data for active transactions.

Risk: Lost on crash - not durable.

Benefit: Updates happen immediately for performance.

DB State (Durable)

Purpose: The actual data. Persistent storage, survives crashes.

Timing: Updated periodically for efficiency.

Guarantee: Eventually consistent with COMMITted/ABORTed transactions.

Intuition: Collaborative Google Docs

Imagine multiple people editing a shared document. Why not edit directly in the cloud? Every keystroke would be a 200 ms round trip. Terrible lag. The solution: edit locally, sync periodically. The same trick maps onto WAL / RAM / DB:

Google's Revision History = WAL Log

• Stored in cloud permanently.
• Tracks EVERY change for weeks.
• "Changed 'affect' to 'effect' at 10:32am by Alice."
• Can revert any change, even months later.

Your Browser = RAM Log + RAM State

• Browser undo buffer = RAM Log (Ctrl+Z history).
• What you see = RAM State (current document).
• Super fast, but lost if browser crashes.
• Limited history (last 100 edits).

Published Doc = DB State

• What collaborators see when they open.
• Auto-saved every 30 seconds.
• May lag behind your edits.
• Survives crashes, always available.

The partial update problem: After editing for hours (thousands of changes), your browser can't hold infinite undo history. Google must sync some changes to cloud even though you're still typing - just like databases flushing to disk during long transactions!

Simple WAL Trace Table: T1 Updates A and B

Example: T1 increases A and B by 20%.

Simple WAL trace table showing T1 updating variables A and B

How to Read the Trace

Red columns (RAM): Change immediately when operations execute.
Example: 'A' changes to 22.8 at timestamp (TS)=15.

Blue columns (DB): Updated later, after IO operations complete.
Example: 'A' is reflected in DB only at TS=20.

WAL log entries: Record every change with timestamp, transaction ID, old/new values.
Example: At TS=15, WAL logs "A: 19→22.8 by T1."

Performance Benefits

WAL's Sequential writes to ONE place: WAL just appends all changes to a single log page/file.
Example: "A: 19→22.8, B: 23→27.6, C: 17→20.4..." all go to the SAME log page.

No page hunting: Don't need to find/read the pages where A, B, C actually live.
Without WAL: Must read page 17 (has A), page 492 (has B), page 1038 (has C)...

Bulk efficiency: Can stuff 64MB of updates into log pages sequentially.
Later: Batch apply all changes to their actual pages when convenient.

Recovery Capability

Complete history: WAL log has all changes, even if DB is partially updated.

Crash resilient: Can reconstruct exact state from any point using WAL.

UNDO/REDO: Old values for rollback, new values for recovery.

Why the WAL is Special: Instantaneous AND Durable

Hardware optimizations make this possible:

NVRAM (Non-Volatile RAM): Special (more expensive) battery-backed RAM that survives power loss - writes at RAM speed, durability of disk.
Replicated RAM: Write to RAM on 3+ machines simultaneously - if one crashes, others have the data.
Sequential SSD writes: Modern SSDs excel at sequential appends - 100x faster than random writes.

The key insight: WAL transforms random updates into sequential appends - the ONE operation that all hardware does blazingly fast!

Long-Running Transaction Challenge

What happens when transactions are too big to fit in RAM?

Long-running transaction TLong updating 1000 variables

The problem: TLong updates A1, A2, ..., A1000 but RAM can't hold all changes!

⚠ Memory Pressure

Solution: Write changes to DB (DB_State) before COMMIT.

Note: A0's update to 22.8 appears in DB_state early. At TS=92, even though COMMIT at TS=1000.

Result: Actual COMMIT happens much later, after all updates logged.

ABORT Scenario: Undoing Long Transactions

What if TLong needs to abort after 1000 updates?

Note: For simplicity, we’ll show ABORTs happening at one time unit. In reality, they’d also span multiple time units.

Long transaction TLong aborting with UNDO operations

ABORT process:

W-abort entries: Log UNDO operations for each change.
Use old_values: A0: 22.8→19, A1000: 32.4→27.
Update DB_state: Restore original values.
Final abort record: Transaction officially cancelled.

UNDO Process

Reverse order: Undo changes from most recent to oldest.

Goal: Database returns to exact start state.

Efficiency: WAL log provides all needed old values.

WAL Protocol Rules

The Write-Ahead Logging protocol ensures recovery correctness:

WAL Rules

Log before data: WAL entry written before any data change.
Force log on commit: All log entries must be durable before COMMIT.
UNDO from log: Use old_values to reverse changes.
REDO from log: Use new_values to re-apply changes.

Recovery Benefits

Crash safety: Can reconstruct any state from log.
Performance: Changes can happen in fast RAM first.
Flexibility: Support both COMMIT and ABORT operations.
Auditability: Complete history of all database changes.

Practice Understanding

Test Your Knowledge

Question: Why does A0's change appear in DB_state before TLong commits?

Answer: Memory pressure forces early writes to disk, but WAL log ensures we can still ABORT if needed.

Key insight: WAL log is the source of truth, not the current DB state.

Transaction Manager: The Architecture → WAL is the durability primitive. Now we'll see how it sits inside the Transaction Manager alongside the scheduler, lock manager, deadlock detector, and recovery manager.