IO Algorithms: Cost Analysis & Intuition
CS145 Equations Sheet
For detailed explanations and examples, see IO Cost Reference.
The Building Blocks of Database Systems
📌 Important: Below examples and visuals assume C_r = C_w = 1 (equal cost for reads and writes) to show the intuition. For the JOINs below, we always add OUT (for the size of the output result). Always use the Equation Sheet for the full formulae.
Real-World Examples: Choosing the Right Join Algorithm
Test Setup: B = 100 pages (≈ 6.4 GB RAM), Page Size = 64 MB
Scenario Overview
| Example | Description | Table R | Table S | Special Conditions | Key Insight |
|---|---|---|---|---|---|
| Ex1 | Small Tables | Songs 10 GB |
Listens 10 GB |
— | Tables fit mostly in RAM |
| Ex2 | Large Tables | Songs 100 GB |
Listens 2 TB |
— | Tables >> buffer size |
| Ex3 | Pre-sorted Tables | Songs ↑ 100 GB sorted |
Listens ↑ 2 TB sorted |
✓ Pre-sorted | SMJ skips sorting phase |
| Ex4 | Self-Join | Listens 2 TB |
Listens 2 TB (same) |
⚡ Self-join | Can't optimize table order |
| Ex5 | Song Similarity | Listens 100B rows |
Listens 100B rows |
🔥 Massive | Real-world similarity query |
Algorithm Performance Comparison (Log Scale)
- SMJ-BadBackup: When many duplicates exist, SMJ may need to backup and re-scan
- HPJ-BadSkew: When data is heavily skewed, some hash partitions become too large
- These edge cases can increase costs by 2-10× in worst cases
Key Observations
🎯 Example 1: Small Tables Win with BNLJ
-
Songs (10GB) and Listens (10GB) are small relative to buffer size (6.4GB)
-
BNLJ performs the join mostly in RAM without preprocessing overhead
-
No sorting or partitioning needed → Lower cost than SMJ/HPJ
📈 Examples 2-5: Large Tables Need Smart Algorithms
-
When tables >> buffer size, BNLJ becomes prohibitively expensive
-
SMJ and HPJ scale much better with data size
-
Pre-sorted data (Ex3) gives SMJ a huge advantage
🔄 Self-Joins Are Common and Expensive
-
Examples 4-5 show real patterns (finding similar songs, user correlations)
-
Self-joins can't optimize by choosing smaller outer relation
-
SMJ/HPJ are essential for these massive operations
The Query Optimizer's Decision Process
-- The optimizer evaluates all plans and picks the cheapest
EXPLAIN SELECT * FROM Songs s JOIN Listens l ON s.id = l.song_id;
-- Factors considered:
-- 1. Table sizes and statistics
-- 2. Available memory (B)
-- 3. Whether data is pre-sorted or indexed
-- 4. Expected output size
-- 5. Data distribution (skew detection)
Practical Takeaways
-
Small data (< RAM): BNLJ often wins due to simplicity
-
Large data: SMJ/HPJ dominate, often by 10-100×
-
Pre-processing matters: Sorted data makes SMJ much cheaper
-
Self-joins are special: Can't optimize with table order
-
Real optimizers adapt: They track statistics and adjust for edge cases
Exercise 2: Impact of Machine Configuration on IO Costs
Scenario: Songs3 (100GB) × Listens3 (1TB)
-
Songs3: 100GB = 1,600 pages
-
Listens3: 1TB = 16,384 pages
-
Row size: 1024 bytes
Cost Analysis Across Different Configurations
| Buffer Size (B) |
C_r | C_w | BNLJ | BNLJ-rev | SMJ-best | HPJ-best | Winner |
|---|---|---|---|---|---|---|---|
| 10 (tiny) |
1 | 1 | 2.6M | 2.6M | 159K | 54K | ✅ HPJ |
| 1 | 10 | 2.6M | 2.6M | 792K | 216K | ✅ HPJ | |
| 10 | 1 | 26M | 26M | 954K | 378K | ✅ HPJ | |
| 10 | 10 | 26M | 26M | 1.6M | 540K | ✅ HPJ | |
| 100 (small) |
1 | 1 | 264K | 279K | 90K | 54K | ✅ HPJ |
| 1 | 10 | 264K | 279K | 414K | 216K | ✅ HPJ | |
| 10 | 1 | 2.6M | 2.8M | 575K | 378K | ✅ HPJ | |
| 10 | 10 | 2.6M | 2.8M | 899K | 540K | ✅ HPJ | |
| 1,000 (medium) |
1 | 1 | 34K | 44K | 87K | 54K | ✅ BNLJ |
| 1 | 10 | 34K | 44K | 396K | 216K | ✅ BNLJ | |
| 10 | 1 | 344K | 436K | 558K | 378K | ✅ BNLJ | |
| 10 | 10 | 344K | 436K | 867K | 540K | ✅ BNLJ | |
| 10,000 (large) |
1 | 1 | 18K | 20K | 54K | 54K | ✅ BNLJ |
| 1 | 10 | 18K | 20K | 216K | 216K | ✅ BNLJ | |
| 10 | 1 | 180K | 196K | 378K | 378K | ✅ BNLJ | |
| 10 | 10 | 180K | 196K | 540K | 540K | ✅ BNLJ |
Key Observations
📊 Buffer Size Impact
- Small B (10-100): HPJ dominates
- Medium B (1,000): BNLJ becomes competitive
- Large B (10,000): BNLJ wins decisively
⚡ Read/Write Cost Impact
- High C_w: Penalizes algorithms with heavy writes (SMJ sorting phase)
- High C_r: All costs scale up proportionally
- Asymmetric costs: Can change winner even with same B
Practical Implications
-
Cloud Environments: With variable RAM allocation per query, HPJ often wins due to its consistent performance
-
SSDs vs HDDs: SSDs have C_r ≈ C_w, while HDDs often have C_w > C_r
-
Memory-Rich Systems: BNLJ becomes optimal when B is large enough to hold significant chunks
-
Multi-Query Workloads: Conservative B estimates favor HPJ's predictable performance
Real-World Context
These Algorithms Are Everywhere!
-- PostgreSQL EXPLAIN shows which algorithm it chose
EXPLAIN SELECT * FROM users u
JOIN listens l ON u.id = l.user_id;
-- Might show:
Hash Join (cost=3.25..10.70)
Hash Cond: (l.user_id = u.id)
-> Seq Scan on listens l
-> Hash
-> Seq Scan on users u
Key Takeaways
-
Just 5 Algorithms power most database operations
-
Memory (B) determines which algorithm is best
-
Size matters: Small tables → Basic Join, Large tables → Hash/Sort
-
No memorization needed: Formulas follow intuition
-
Real systems use these exact algorithms at massive scale
📚 These formulas are on your system sheet - focus on understanding when to use each one!
Next: See these in action with Join Algorithms