Recovery: Why We Need It

Quick Recall: Transaction Outcomes

✅ COMMIT

Transaction completed successfully → All changes become permanent

❌ ABORT

Transaction failed or cancelled (by user or logic)
If a transaction doesn't COMMIT before a crash (machine reboot, disk/network crash), it's also treated as an ABORT

→ All changes must be undone

The Three Key Problems

Every DB must solve these fundamental challenges:

⚛️ The Atomicity Problem

Scenario: Payment transaction fails after debiting account A but before crediting account B
Disaster: Money disappears! Account A lost $500, but account B never received it
Critical Question: How do we UNDO partial changes when a transaction ABORTs?

💾 The Durability Problem

Scenario: User successfully buys concert ticket, gets confirmation, but server crashes
Disaster: User paid money, got confirmation, but ticket purchase "never happened"
Critical Question: How do we ensure COMMMITed data survives crashes? If changes are lost, how do we REDO?

⚡ The Performance Problem

Scenario: Database with billions of rows needs to track changes for millions of transactions
Disaster: Naive approach of copying entire database for each transaction = system paralysis
Critical Question: How do we track changes without massive overhead?

Recovery is the DB's ability to restore DB to a consistent state. It's like having a time machine.


Real-World Examples

💳 Banking Transfer Gone Wrong

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 'alice';
-- CRASH! Server fails here
UPDATE accounts SET balance = balance + 500 WHERE id = 'bob';
COMMIT;

Problem: Alice lost $500, Bob got nothing. Money vanished!

Recovery solution: Must UNDO the first UPDATE to restore Alice's balance.

🎫 Concert Ticket Purchase

BEGIN TRANSACTION;
INSERT INTO purchases VALUES ('user123', 'ticket456', 'paid');
UPDATE seats SET status = 'sold' WHERE id = 'ticket456';
COMMIT;
-- User sees "Success! Ticket purchased"
-- CRASH! Server fails after COMMIT

Problem: User has receipt, paid money, but restart shows ticket as available.

Recovery solution: Must REDO all changes to honor the committed purchase.


Example: DB State after Single Transaction (T56)

Consider a single transaction (T56) in which we resell tickets from 3 users. After T56 is done, the DB can only be in one of two states. The Transaction Manager coordinates everything here.

Transaction T56 lifecycle showing commit/abort paths and database states

🎯 Transaction Goals

⚛️ Atomicity: All changes happen, or none do
💾 Durability: Committed changes survive crashes
🔄 Two Paths: COMMIT (make permanent) or ABORT (undo everything)

Why Traditional Approaches Fail

🐌 "Copy Everything" Approach

Idea: Make complete database copy before each transaction
Reality: 1TB database + 1000 transactions/sec = 1000TB/sec copying
Verdict: Impossible at scale

💾 "Save After Every Change" Approach

Idea: Write every change immediately to permanent storage (e.g. disk). E.g., write to page17, page49, page2013, etc based on the values changed
Reality: Disk IOs are slow, RAM access takes 100ns (50,000x slower!)
Verdict: Performance death sentence

The Recovery Solution Preview 🔍

Modern databases solve this with an elegant approach:

Change Tracking (not copying)

Insight: Track what changed, not entire state
Efficiency: Log 10 changes vs. copy 10 million rows

⚡ Write-Ahead Logging

Insight: Write changes to fast sequential log before slow random storage
Guarantee: Can reconstruct any state from logged changes