Write-Ahead Logging (WAL) πŸ“‹

🎯 The logging system

Write-Ahead Logging tracks every change to the database in three synchronized states: the WAL log, the volatile RAM log + RAM state, and the durable DB state.


Understanding the Three States πŸ”„

Every database operation affects three synchronized states:

πŸ“ WAL Log (Write-Ahead Log)

πŸ’Ύ Purpose: Durable record of every change before it happens
⚑ Speed: Sequential writes, very fast
πŸ”’ Rule: Log entry written BEFORE any data change

πŸ”₯ RAM State + RAM Log (Volatile)

⚑ Purpose: Fast access to current data for active transactions
πŸ’₯ Risk: Lost on crash - not durable
🎯 Benefit: Updates happen immediately for performance

πŸ’Ύ DB State (Durable)

πŸ”’ Purpose: The actual data. Persistent storage, survives crashes
⏰ Timing: Updated periodically for efficiency
🎯 Guarantee: Eventually consistent with COMMITted/ABORTed transactions
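
As a mental model, here is a minimal sketch of the three states as plain data structures. All names are illustrative, not an actual database's internals:

```python
# A toy model of the three states (names illustrative). Only wal_log
# and db_state stand for storage that survives a crash; the RAM pair
# is volatile and vanishes on power loss.
wal_log   = []               # durable, append-only record of every change
ram_log   = []               # volatile undo buffer for active transactions
ram_state = {"A": 19.0}      # volatile current values: fast reads/writes
db_state  = {"A": 19.0}      # durable values, updated periodically
```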

πŸ’‘ Intuition: Collaborative Google Docs

Imagine multiple people editing a shared document:
Why not edit directly in the cloud?
Every keystroke β†’ 200ms round trip = terrible lag!
Solution: Edit locally, sync periodically

πŸ“ Google's Revision History = WAL Log

β€’ Stored in cloud permanently
β€’ Tracks EVERY change for weeks
β€’ "Changed 'affect' to 'effect' at 10:32am by Alice"
β€’ Can revert any change, even months later

πŸ–₯️ Your Browser = RAM Log + RAM State

β€’ Browser undo buffer = RAM Log (Ctrl+Z history)
β€’ What you see = RAM State (current document)
β€’ Super fast, but lost if browser crashes
β€’ Limited history (last 100 edits)

☁️ Published Doc = DB State

β€’ What collaborators see when they open
β€’ Auto-saved every 30 seconds
β€’ May lag behind your edits
β€’ Survives crashes, always available

The partial update problem: After editing for hours (thousands of changes), your browser can't hold infinite undo history. Google must sync some changes to the cloud even though you're still typing - exactly like a database flushing to disk during a long transaction!


Simple WAL Trace Table: T1 Updates A and B

Example: T1 increases A and B by 20%.

[Figure: Simple WAL trace table showing T1 updating variables A and B]

πŸ“– How to Read the Trace

Red columns (RAM): Change immediately when operations execute
Example: 'A' changes to 22.8 at timestamp (TS)=15
Blue columns (DB): Updated later, after IO operations complete
Example: 'A' is reflected in DB only at TS=20
WAL log entries: Record every change with timestamp, transaction ID, old/new values
Example: At TS=15, WAL logs "A: 19→22.8 by T1" (record structure sketched below)
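
To make one log entry concrete, here is a minimal sketch of a WAL record with the fields the trace uses: timestamp, transaction ID, variable, and old/new values. The class name and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WALRecord:
    ts: int            # timestamp of the change
    txn: str           # transaction id, e.g. "T1"
    var: str           # which item changed, e.g. "A"
    old_value: float   # value before the change (used for UNDO)
    new_value: float   # value after the change (used for REDO)

# The TS=15 entry from the trace: "A: 19→22.8 by T1"
entry = WALRecord(ts=15, txn="T1", var="A", old_value=19.0, new_value=22.8)
print(entry)
```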

⚑ Performance Benefits

Sequential writes to ONE place: WAL just appends all changes to a single log page/file
Example: "A: 19β†’22.8, B: 23β†’27.6, C: 17β†’20.4..." all go to the SAME log page
No page hunting: Don't need to find/read the pages where A, B, C actually live
Without WAL: Must read page 17 (has A), page 492 (has B), page 1038 (has C)...
Bulk efficiency: Can stuff 64MB of updates into log pages sequentially
Later: Batch-apply all changes to their actual pages when convenient (both write patterns are contrasted in the sketch below)
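
A sketch contrasting the two write patterns, reusing the page numbers from the text above. The file names and the page map are illustrative assumptions:

```python
# Contrast: one sequential append stream (WAL) vs. a random page write
# per item (no WAL). File names and the page map are illustrative.
updates = [("A", 19.0, 22.8), ("B", 23.0, 27.6), ("C", 17.0, 20.4)]

# With WAL: every change goes to the SAME log file, appended in order.
with open("wal.log", "a") as log:
    for var, old, new in updates:
        log.write(f"T1 {var}: {old}->{new}\n")    # sequential append

# Without WAL: each update means seeking to a different page of the
# data file before writing -- one random I/O per item.
page_of = {"A": 17, "B": 492, "C": 1038}          # assumed page map
PAGE_SIZE = 4096
with open("db.dat", "w+b") as db:
    for var, _, new in updates:
        db.seek(page_of[var] * PAGE_SIZE)         # random seek
        db.write(f"{var}={new}".encode())         # then a small write
```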

πŸ”„ Recovery Capability

Complete history: WAL log has all changes, even if DB is partially updated
Crash resilient: Can reconstruct exact state from any point using WAL
UNDO/REDO: Old values for rollback, new values for recovery (a replay sketch follows below)
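
A minimal sketch of that reconstruction, assuming the surviving log holds (ts, txn, var, old, new) tuples. Committed transactions are redone in log order; all values here mirror the T1 trace:

```python
# After a crash, DB_state may be stale -- but the WAL log survives.
wal_log = [
    (15, "T1", "A", 19.0, 22.8),
    (16, "T1", "B", 23.0, 27.6),
    (20, "T1", "COMMIT", None, None),
]
db_state = {"A": 19.0, "B": 23.0}   # stale: crash hit before any flush

committed = {txn for _, txn, var, *_ in wal_log if var == "COMMIT"}
for ts, txn, var, old, new in wal_log:      # replay in log order
    if txn in committed and var != "COMMIT":
        db_state[var] = new                 # REDO committed new_values
print(db_state)                             # {'A': 22.8, 'B': 27.6}
```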

Why the WAL Log Is Special: Instantaneous AND Durable

Hardware optimizations make this possible:

  • NVRAM (Non-Volatile RAM): Special (more expensive) battery-backed RAM that survives power loss - writes at RAM speed with the durability of disk
  • Replicated RAM: Write to RAM on 3+ machines simultaneously - if one crashes, others have the data
  • Sequential SSD writes: Modern SSDs excel at sequential appends - 100x faster than random writes

The key insight: WAL transforms random updates into sequential appends - the ONE operation that all hardware does blazingly fast!


Long-Running Transaction Challenge πŸƒβ€β™‚οΈ

What happens when transactions are too big to fit in RAM?

[Figure: Long-running transaction TLong updating 1000 variables]

The problem: TLong updates A0, A1, ..., A1000, but RAM can't hold all the changes!

⚠️ Memory Pressure

Solution: Write some changes to DB_state before COMMIT, as they are logged
Note: A0's update to 22.8 appears in DB_state early, at TS=92, even though COMMIT only happens at TS=1000
Result: The actual COMMIT happens much later, after all updates are logged (see the flush sketch below)
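
A minimal sketch of this early-flush rule, assuming a hypothetical durable_wal_ts marker for how far the log has been forced to disk; all names are illustrative:

```python
# Flushing a dirty value to DB_state before COMMIT is safe only because
# its WAL record (holding old_value) is already durable, so an ABORT
# can still undo it. `durable_wal_ts` is a hypothetical marker.
durable_wal_ts = 92                  # WAL forced to disk up to TS=92
db_state = {}

def flush_early(var, value, wal_ts):
    # WAL rule: the log entry must hit disk before the data change does.
    assert wal_ts <= durable_wal_ts, "log before data!"
    db_state[var] = value            # DB_state updated pre-COMMIT

flush_early("A0", 22.8, wal_ts=92)   # A0 reaches DB_state at TS=92
print(db_state)                      # {'A0': 22.8}
```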

ABORT Scenario: Undoing Long Transactions ❌

What if TLong needs to abort after 1000 updates? [For simplicity, we’ll show ABORTs happening at one time unit. In reality, they’d also span multiple time units.]

[Figure: Long transaction TLong aborting with UNDO operations]

ABORT process:

  1. W-abort entries: Log UNDO operations for each change

  2. Use old_values: A0: 22.8β†’19, A1000: 32.4β†’27

  3. Update DB_state: Restore original values

  4. Final abort record: Transaction officially cancelled

πŸ”„ UNDO Process

Reverse order: Undo changes from most recent to oldest
Goal: Database returns to exact start state
Efficiency: WAL log provides all needed old values (see the ABORT sketch below)
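
A toy ABORT following the four steps above, assuming records shaped like the trace entries (ts, txn, var, old_value, new_value). All names are illustrative:

```python
db_state = {"A0": 22.8, "A1000": 32.4}        # values TLong already flushed

wal_log = [
    (92,  "TLong", "A0",    19.0, 22.8),
    (991, "TLong", "A1000", 27.0, 32.4),
]

def abort(txn, next_ts):
    # Steps 1-3: walk the log backwards (most recent first), write a
    # W-abort entry per change, and restore old_value in DB_state.
    for ts, t, var, old, new in reversed(list(wal_log)):
        if t == txn:
            wal_log.append((next_ts, f"{txn}-abort", var, new, old))
            db_state[var] = old
            next_ts += 1
    # Step 4: final abort record -- the transaction is cancelled.
    wal_log.append((next_ts, txn, "ABORT", None, None))

abort("TLong", next_ts=1001)
print(db_state)                               # {'A0': 19.0, 'A1000': 27.0}
```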

WAL Protocol Rules πŸ“‹

The Write-Ahead Logging protocol ensures recovery correctness:

πŸ”’ WAL Rules

  1. Log before data: WAL entry written before any data change
  2. Force log on commit: All log entries must be durable before COMMIT
  3. UNDO from log: Use old_values to reverse changes
  4. REDO from log: Use new_values to re-apply changes (all four rules are sketched below)
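
A compact sketch of the four rules together, using an in-memory dict as the "DB" and a second list standing in for the force-written (durable) log. Everything here is an illustrative toy, not a real engine's API:

```python
wal_log, durable_log = [], []       # RAM log vs. force-written log
ram_state, db_state = {"A": 19.0}, {"A": 19.0}

def write(txn, var, new_value):
    # Rule 1: log before data -- append the WAL entry first.
    wal_log.append((txn, var, ram_state[var], new_value))
    ram_state[var] = new_value      # only then change the data

def commit(txn):
    # Rule 2: force the log -- all entries durable before COMMIT returns.
    durable_log.extend(wal_log); wal_log.clear()
    durable_log.append((txn, "COMMIT", None, None))

def undo(txn):
    # Rule 3: UNDO from the log, using old_values, newest first.
    for t, var, old, new in reversed(durable_log + wal_log):
        if t == txn and var != "COMMIT":
            ram_state[var] = db_state[var] = old

def redo(txn):
    # Rule 4: REDO from the log, re-applying new_values in order.
    for t, var, old, new in durable_log:
        if t == txn and var != "COMMIT":
            db_state[var] = new

write("T1", "A", 22.8); commit("T1"); redo("T1")
print(db_state)                     # {'A': 22.8}
```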

🎯 Recovery Benefits

  • Crash safety: Can reconstruct any state from log
  • Performance: Changes can happen in fast RAM first
  • Flexibility: Support both COMMIT and ABORT operations
  • Auditability: Complete history of all database changes

Practice Understanding

Test Your Knowledge

Question: Why does A0's change appear in DB_state before TLong commits?
Answer: Memory pressure forces early writes to disk, but WAL log ensures we can still ABORT if needed
🎯 Key insight: WAL log is the source of truth, not the current DB state