Recovery: Why We Need It
Quick Recall: Transaction Outcomes
✅ COMMIT
Transaction completed successfully → All changes become permanent
❌ ABORT
Transaction failed or was cancelled (by the user or by application logic). A transaction that has not COMMITted before a crash (machine reboot, disk or network failure) is also treated as an ABORT.
→ All changes must be undone
The Three Key Problems
Every database must tackle these core issues:
The Atomicity Problem
If a payment transaction fails after debiting account A but before crediting account B, the system risks a partial update where money vanishes.
We must be able to UNDO partial changes when a transaction ABORTs to ensure no data is corrupted.
The Durability Problem
If a user pays for a ticket and receives confirmation, but the server crashes immediately afterwards, the system must still know after restart that the transaction occurred.
We must ensure COMMITted data survives crashes. If changes are lost from memory, the system must be able to REDO them on reboot.
The Performance Problem
A database with billions of rows must track changes for millions of concurrent transactions. A naive approach of copying the database before each transaction would cause system paralysis.
The system must be able to meticulously track operations and log changes without imposing massive performance overheads.
Recovery is the database's ability to restore itself to a consistent state. It's like having a time machine.
Real-World Examples
Banking Transfer Gone Wrong
Problem: The transfer failed after debiting Alice but before crediting Bob. Alice lost $500, Bob got nothing. Money vanished!
Recovery solution: Must UNDO the first UPDATE to restore Alice's balance.
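The UNDO idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `accounts`, the function `transfer`, and the `fail_after_debit` flag are hypothetical names, and the failure is a simulated exception inside a running program. A real crash would stop the process, so a real database keeps its undo information in a log on disk rather than in a local list.

```python
# Hypothetical in-memory "database": account name -> balance.
accounts = {"alice": 1000, "bob": 200}

def transfer(db, src, dst, amount, fail_after_debit=False):
    """Move `amount` from src to dst, undoing partial work on failure."""
    undo_log = []  # (key, old_value) pairs, oldest first
    try:
        undo_log.append((src, db[src]))   # remember the old value first
        db[src] -= amount                 # first UPDATE: debit
        if fail_after_debit:
            raise RuntimeError("simulated failure between the two updates")
        undo_log.append((dst, db[dst]))
        db[dst] += amount                 # second UPDATE: credit
        return "COMMIT"
    except Exception:
        # ABORT: restore the old values, newest change first.
        for key, old_value in reversed(undo_log):
            db[key] = old_value
        return "ABORT"

outcome = transfer(accounts, "alice", "bob", 500, fail_after_debit=True)
```

After the failed call, `outcome` is "ABORT" and Alice's balance is back to its original value, so no money has vanished.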
Concert Ticket Purchase
Problem: User has receipt, paid money, but restart shows ticket as available.
Recovery solution: Must REDO all changes to honor the committed purchase.
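A minimal sketch of REDO, assuming a log that survived the crash while the stored data did not yet reflect the purchase. The record layout, the seat names, and the function `redo_committed` are hypothetical and chosen only for illustration.

```python
# Hypothetical durable log that survived the crash. Each record is a tuple.
log = [
    ("T1", "UPDATE", "seat_42", "sold_to_user_7"),
    ("T1", "COMMIT"),
    ("T2", "UPDATE", "seat_43", "sold_to_user_9"),
    # T2 never logged COMMIT before the crash, so it counts as an ABORT.
]

def redo_committed(disk_state, log):
    """Replay the logged updates of committed transactions onto stale data."""
    committed = {rec[0] for rec in log if rec[1] == "COMMIT"}
    recovered = dict(disk_state)
    for rec in log:
        if rec[1] == "UPDATE" and rec[0] in committed:
            _, _, key, new_value = rec
            recovered[key] = new_value
    return recovered

# State found on disk after restart: the purchase is missing.
stale_disk = {"seat_42": "available", "seat_43": "available"}
recovered = redo_committed(stale_disk, log)
```

Here the committed purchase of seat_42 is restored, while the uncommitted change to seat_43 is not applied.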
Example: DB State after Single Transaction (T56)
Consider a single transaction (T56) in which we resell tickets from 3 users. After T56 is done, the database can only be in one of two states. The Transaction Manager coordinates everything here.

Transaction Goals
Atomicity: All changes happen, or none do
Durability: Committed changes survive crashes
Two Paths: COMMIT (make permanent) or ABORT (undo everything)
Why Traditional Approaches Fail
"Copy Everything" Approach
Idea: Make complete database copy before each transaction
Reality: 1TB database × 1000 transactions/sec = 1000TB/sec of copying
Verdict: Impossible at scale
"Save After Every Change" Approach
Idea: Write every change immediately to permanent storage (e.g., disk)
Reality: Disk I/O is slow. A RAM access takes about 100ns; a disk access is roughly 50,000x slower.
Verdict: Performance death sentence
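The back-of-the-envelope numbers for both approaches can be reproduced with a short calculation. The inputs are the figures quoted above, reading the 50,000x factor as disk latency relative to RAM. The last line additionally assumes a single disk that handles one write at a time, which is a simplification made only for this estimate.

```python
DB_SIZE_TB = 1            # database size from the "Copy Everything" example
TX_PER_SEC = 1000         # transaction rate from the same example
RAM_ACCESS_NS = 100       # RAM access time quoted above
DISK_SLOWDOWN = 50_000    # disk latency relative to RAM, as quoted above

# "Copy Everything": one full copy per transaction.
copy_rate_tb_per_sec = DB_SIZE_TB * TX_PER_SEC

# "Save After Every Change": each change waits for one disk access.
disk_access_ns = RAM_ACCESS_NS * DISK_SLOWDOWN
disk_access_ms = disk_access_ns / 1_000_000

# Assumption: one disk, one write at a time.
serial_writes_per_sec = 1_000_000_000 // disk_access_ns

print(copy_rate_tb_per_sec, "TB/sec of copying")
print(disk_access_ms, "ms per disk access")
print(serial_writes_per_sec, "serial disk writes per second")
```

Under these assumptions a disk access costs about 5 ms, so a system that waits on the disk for every change manages only a few hundred changes per second.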
The Recovery Solution Preview
Modern databases solve this with an elegant approach:
Change Tracking (not copying)
Insight: Track what changed, not entire state
Efficiency: Log 10 changes vs. copy 10 million rows
Write-Ahead Logging
Insight: Write changes to a fast sequential log before updating slow random-access storage
Guarantee: Can reconstruct any state from logged changes
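A toy sketch of the write-ahead idea, assuming a JSON-lines log file. The class `TinyWAL`, the function `recover`, and the record fields are hypothetical names invented for this example. Production systems add log sequence numbers, checkpoints, and UNDO information, none of which are shown here.

```python
import json
import os
import tempfile

class TinyWAL:
    """Toy store: every change is appended to a log before data is touched."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.pages = {}  # in-memory data; lost if the process crashes

    def write(self, txid, key, value):
        self._append({"tx": txid, "key": key, "new": value})  # log first
        self.pages[key] = value                               # then the data

    def commit(self, txid):
        # The commit record is forced to disk before we report success.
        self._append({"tx": txid, "commit": True}, sync=True)

    def _append(self, record, sync=False):
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")  # sequential append
            if sync:
                f.flush()
                os.fsync(f.fileno())

def recover(log_path):
    """Rebuild state by replaying only the changes of committed transactions."""
    with open(log_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    committed = {r["tx"] for r in records if r.get("commit")}
    state = {}
    for r in records:
        if "key" in r and r["tx"] in committed:
            state[r["key"]] = r["new"]
    return state

log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = TinyWAL(log_path)
db.write("T56", "ticket_1", "user_a")
db.commit("T56")
db.write("T57", "ticket_2", "user_b")  # T57 never commits
state = recover(log_path)
```

After recovery, only the change made by the committed transaction T56 is present. The uncommitted T57 is dropped, which matches the ABORT rule for transactions that do not COMMIT before a crash.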