Write-Ahead Logging (WAL) πŸ“‹

🎯 The logging system

Write-Ahead Logging tracks every change to the database in three synchronized states: the WAL log, the volatile RAM log + RAM state, and the durable DB state.


Understanding the Three States πŸ”„

Every database operation affects three synchronized states:

πŸ“ WAL Log (Write-Ahead Log)

πŸ’Ύ Purpose: Durable record of every change before it happens
⚑ Speed: Sequential writes, very fast
πŸ”’ Rule: Log entry written BEFORE any data change

πŸ”₯ RAM State + RAM Log (Volatile)

⚑ Purpose: Fast access to current data for active transactions
πŸ’₯ Risk: Lost on crash - not durable
🎯 Benefit: Updates happen immediately for performance

πŸ’Ύ DB State (Durable)

πŸ”’ Purpose: The actual data. Persistent storage, survives crashes
⏰ Timing: Updated periodically for efficiency
🎯 Guarantee: Eventually consistent with COMMITted/ABORTed transactions
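
As a mental model, here is a minimal sketch of the three states as plain data structures. All names are illustrative, not an actual database's internals:

```python
# A toy model of the three states (names illustrative). Only wal_log
# and db_state stand for storage that survives a crash; the RAM pair
# is volatile and vanishes on power loss.
wal_log   = []               # durable, append-only record of every change
ram_log   = []               # volatile undo buffer for active transactions
ram_state = {"A": 19.0}      # volatile current values: fast reads/writes
db_state  = {"A": 19.0}      # durable values, updated periodically
```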

πŸ’‘ Intuition: Collaborative Google Docs

Imagine multiple people editing a shared document:
Why not edit directly in the cloud?
Every keystroke β†’ 200ms round trip = terrible lag!
Solution: Edit locally, sync periodically

πŸ“ Google's Revision History = WAL Log

β€’ Stored in cloud permanently
β€’ Tracks EVERY change for weeks
β€’ "Changed 'affect' to 'effect' at 10:32am by Alice"
β€’ Can revert any change, even months later

πŸ–₯️ Your Browser = RAM Log + RAM State

β€’ Browser undo buffer = RAM Log (Ctrl+Z history)
β€’ What you see = RAM State (current document)
β€’ Super fast, but lost if browser crashes
β€’ Limited history (last 100 edits)

☁️ Published Doc = DB State

β€’ What collaborators see when they open
β€’ Auto-saved every 30 seconds
β€’ May lag behind your edits
β€’ Survives crashes, always available

The partial update problem: After editing for hours (thousands of changes), your browser can't hold infinite undo history. Google must sync some changes to the cloud even though you're still typing - exactly like a database flushing to disk during a long transaction!


Simple WAL Trace Table: T1 Updates A and B

Example: T1 increases A and B by 20%.

[Figure: Simple WAL trace table showing T1 updating variables A and B]

πŸ“– How to Read the Trace

Red columns (RAM): Change immediately when operations execute
Example: 'A' changes to 22.8 at timestamp (TS)=15
Blue columns (DB): Updated later, after IO operations complete
Example: 'A' is reflected in DB only at TS=20
WAL log entries: Record every change with timestamp, transaction ID, old/new values
Example: At TS=15, WAL logs "A: 19→22.8 by T1" (record structure sketched below)
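
To make one log entry concrete, here is a minimal sketch of a WAL record with the fields the trace uses: timestamp, transaction ID, variable, and old/new values. The class name and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WALRecord:
    ts: int            # timestamp of the change
    txn: str           # transaction id, e.g. "T1"
    var: str           # which item changed, e.g. "A"
    old_value: float   # value before the change (used for UNDO)
    new_value: float   # value after the change (used for REDO)

# The TS=15 entry from the trace: "A: 19→22.8 by T1"
entry = WALRecord(ts=15, txn="T1", var="A", old_value=19.0, new_value=22.8)
print(entry)
```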

⚑ Performance Benefits

Sequential writes to ONE place: WAL just appends all changes to a single log page/file
Example: "A: 19β†’22.8, B: 23β†’27.6, C: 17β†’20.4..." all go to the SAME log page
No page hunting: Don't need to find/read the pages where A, B, C actually live
Without WAL: Must read page 17 (has A), page 492 (has B), page 1038 (has C)...
Bulk efficiency: Can stuff 64MB of updates into log pages sequentially
Later: Batch-apply all changes to their actual pages when convenient (both write patterns are contrasted in the sketch below)
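
A sketch contrasting the two write patterns, reusing the page numbers from the text above. The file names and the page map are illustrative assumptions:

```python
# Contrast: one sequential append stream (WAL) vs. a random page write
# per item (no WAL). File names and the page map are illustrative.
updates = [("A", 19.0, 22.8), ("B", 23.0, 27.6), ("C", 17.0, 20.4)]

# With WAL: every change goes to the SAME log file, appended in order.
with open("wal.log", "a") as log:
    for var, old, new in updates:
        log.write(f"T1 {var}: {old}->{new}\n")    # sequential append

# Without WAL: each update means seeking to a different page of the
# data file before writing -- one random I/O per item.
page_of = {"A": 17, "B": 492, "C": 1038}          # assumed page map
PAGE_SIZE = 4096
with open("db.dat", "w+b") as db:
    for var, _, new in updates:
        db.seek(page_of[var] * PAGE_SIZE)         # random seek
        db.write(f"{var}={new}".encode())         # then a small write
```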

πŸ”„ Recovery Capability

Complete history: WAL log has all changes, even if DB is partially updated
Crash resilient: Can reconstruct exact state from any point using WAL
UNDO/REDO: Old values for rollback, new values for recovery (a replay sketch follows below)
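
A minimal sketch of that reconstruction, assuming the surviving log holds (ts, txn, var, old, new) tuples. Committed transactions are redone in log order; all values here mirror the T1 trace:

```python
# After a crash, DB_state may be stale -- but the WAL log survives.
wal_log = [
    (15, "T1", "A", 19.0, 22.8),
    (16, "T1", "B", 23.0, 27.6),
    (20, "T1", "COMMIT", None, None),
]
db_state = {"A": 19.0, "B": 23.0}   # stale: crash hit before any flush

committed = {txn for _, txn, var, *_ in wal_log if var == "COMMIT"}
for ts, txn, var, old, new in wal_log:      # replay in log order
    if txn in committed and var != "COMMIT":
        db_state[var] = new                 # REDO committed new_values
print(db_state)                             # {'A': 22.8, 'B': 27.6}
```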

Why the WAL Log Is Special: Instantaneous AND Durable

Hardware optimizations make this possible:

  • NVRAM (Non-Volatile RAM): Special (more expensive) battery-backed RAM that survives power loss - writes at RAM speed with the durability of disk
  • Replicated RAM: Write to RAM on 3+ machines simultaneously - if one crashes, others have the data
  • Sequential SSD writes: Modern SSDs excel at sequential appends - 100x faster than random writes

The key insight: WAL transforms random updates into sequential appends - the ONE operation that all hardware does blazingly fast!


Long-Running Transaction Challenge πŸƒβ€β™‚οΈ

What happens when transactions are too big to fit in RAM?

[Figure: Long-running transaction TLong updating 1000 variables]

The problem: TLong updates A0, A1, ..., A1000, but RAM can't hold all the changes!

⚠️ Memory Pressure

Solution: Write some changes to DB_state before COMMIT, as they are logged
Note: A0's update to 22.8 appears in DB_state early, at TS=92, even though COMMIT only happens at TS=1000
Result: The actual COMMIT happens much later, after all updates are logged (see the flush sketch below)
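
A minimal sketch of this early-flush rule, assuming a hypothetical durable_wal_ts marker for how far the log has been forced to disk; all names are illustrative:

```python
# Flushing a dirty value to DB_state before COMMIT is safe only because
# its WAL record (holding old_value) is already durable, so an ABORT
# can still undo it. `durable_wal_ts` is a hypothetical marker.
durable_wal_ts = 92                  # WAL forced to disk up to TS=92
db_state = {}

def flush_early(var, value, wal_ts):
    # WAL rule: the log entry must hit disk before the data change does.
    assert wal_ts <= durable_wal_ts, "log before data!"
    db_state[var] = value            # DB_state updated pre-COMMIT

flush_early("A0", 22.8, wal_ts=92)   # A0 reaches DB_state at TS=92
print(db_state)                      # {'A0': 22.8}
```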

ABORT Scenario: Undoing Long Transactions ❌

What if TLong needs to abort after 1000 updates? [For simplicity, we’ll show ABORTs happening at one time unit. In reality, they’d also span multiple time units.]

[Figure: Long transaction TLong aborting with UNDO operations]

ABORT process:

  1. W-abort entries: Log UNDO operations for each change

  2. Use old_values: A0: 22.8β†’19, A1000: 32.4β†’27

  3. Update DB_state: Restore original values

  4. Final abort record: Transaction officially cancelled

πŸ”„ UNDO Process

Reverse order: Undo changes from most recent to oldest
Goal: Database returns to exact start state
Efficiency: WAL log provides all needed old values (see the ABORT sketch below)
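
A toy ABORT following the four steps above, assuming records shaped like the trace entries (ts, txn, var, old_value, new_value). All names are illustrative:

```python
db_state = {"A0": 22.8, "A1000": 32.4}        # values TLong already flushed

wal_log = [
    (92,  "TLong", "A0",    19.0, 22.8),
    (991, "TLong", "A1000", 27.0, 32.4),
]

def abort(txn, next_ts):
    # Steps 1-3: walk the log backwards (most recent first), write a
    # W-abort entry per change, and restore old_value in DB_state.
    for ts, t, var, old, new in reversed(list(wal_log)):
        if t == txn:
            wal_log.append((next_ts, f"{txn}-abort", var, new, old))
            db_state[var] = old
            next_ts += 1
    # Step 4: final abort record -- the transaction is cancelled.
    wal_log.append((next_ts, txn, "ABORT", None, None))

abort("TLong", next_ts=1001)
print(db_state)                               # {'A0': 19.0, 'A1000': 27.0}
```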

WAL Protocol Rules πŸ“‹

The Write-Ahead Logging protocol ensures recovery correctness:

πŸ”’ WAL Rules

  1. Log before data: WAL entry written before any data change
  2. Force log on commit: All log entries must be durable before COMMIT
  3. UNDO from log: Use old_values to reverse changes
  4. REDO from log: Use new_values to re-apply changes (all four rules are sketched below)
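
A compact sketch of the four rules together, using an in-memory dict as the "DB" and a second list standing in for the force-written (durable) log. Everything here is an illustrative toy, not a real engine's API:

```python
wal_log, durable_log = [], []       # RAM log vs. force-written log
ram_state, db_state = {"A": 19.0}, {"A": 19.0}

def write(txn, var, new_value):
    # Rule 1: log before data -- append the WAL entry first.
    wal_log.append((txn, var, ram_state[var], new_value))
    ram_state[var] = new_value      # only then change the data

def commit(txn):
    # Rule 2: force the log -- all entries durable before COMMIT returns.
    durable_log.extend(wal_log); wal_log.clear()
    durable_log.append((txn, "COMMIT", None, None))

def undo(txn):
    # Rule 3: UNDO from the log, using old_values, newest first.
    for t, var, old, new in reversed(durable_log + wal_log):
        if t == txn and var != "COMMIT":
            ram_state[var] = db_state[var] = old

def redo(txn):
    # Rule 4: REDO from the log, re-applying new_values in order.
    for t, var, old, new in durable_log:
        if t == txn and var != "COMMIT":
            db_state[var] = new

write("T1", "A", 22.8); commit("T1"); redo("T1")
print(db_state)                     # {'A': 22.8}
```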

🎯 Recovery Benefits

  • Crash safety: Can reconstruct any state from log
  • Performance: Changes can happen in fast RAM first
  • Flexibility: Support both COMMIT and ABORT operations
  • Auditability: Complete history of all database changes

Practice Understanding

Test Your Knowledge

Question: Why does A0's change appear in DB_state before TLong commits?
Answer: Memory pressure forces early writes to disk, but WAL log ensures we can still ABORT if needed
🎯 Key insight: WAL log is the source of truth, not the current DB state