IO Algorithms: Cost Analysis & Intuition

CS145 Equations Sheet

For detailed explanations and examples, see IO Cost Reference.

The Building Blocks of Database Systems

📌 Important: Below examples and visuals assume C_r = C_w = 1 (equal cost for reads and writes) to show the intuition. For the JOINs below, we always add OUT (for the size of the output result). Always use the Equation Sheet for the full formulae.

IO Cost Formulas: Building Blocks for Big Data Single Table Operations Hash Partitioning P(R) h(key)%B B files Cost = 2P(R) Read once + Write once = 2P(R) Each page handled exactly twice BigSort Unsorted B-way merge Sorted Cost = 2P(R)⌈1 + log_B(P(R)/2B)⌉ log_B passes, each reads+writes P(R) Includes repacking optimization 💡 The Intuition Hash Partitioning: Split data so each piece fits in RAM • Like sorting mail by ZIP code • Each pile can be handled independently • Cost is just input + output BigSort: Sort data that doesn't fit in memory • Like tournament brackets • Merge B sorted runs at a time until done • Log_B passes needed Join Operations Basic Join (BNL) R S Load R chunk Scan all S Matches Cost = P(R) + P(R)×⌈P(S)/B⌉ Read R once + Scan S for each R chunk Works always, slow if B small Hash Partition Join R S Partition Join pairs Cost = 3(P(R) + P(S)) Partition (2×) + Join (1×) = 3× total Same keys → same partition Sort-Merge Join (SMJ) R unsorted 89,42,17 Sort R sorted 17,42,89 S unsorted 56,17,42 Sort S sorted 17,42,56 Two-pointer merge scan Result Sorted! Cost = BigSort(R) + BigSort(S) + Merge Sort both tables, then one-pass merge 💡 Rules of Thumb for Choosing Join Algorithms Basic Join (BNL): The brute force approach • Load chunks of R, scan entire S for each chunk • Great when one table is tiny (fits in B-2 pages) ✓ Rule of thumb: For non-equality joins; no good join key Hash Partition Join: Divide and conquer • Partition both tables by join key using same hash • Join matching partitions (R_i with S_i only) ✓ Rule of thumb: Tables similarly sized, good for equi-joins Sort-Merge Join: Sort once, join efficiently • Sort both tables, then merge like mergesort • Two pointers advance through sorted data • Bonus: Output is sorted (useful for ORDER BY) ✓ Rule of thumb: Need sorted output or data pre-sorted Remember: Cost-based optimizers use the actual formulas and statistics to choose the best algorithm automatically! Above are simplified rules of thumb - real optimizers do the math Use these as guidelines when manually choosing algorithms or understanding optimizer decisions

Real-World Examples: Choosing the Right Join Algorithm

Test Setup: B = 100 pages (≈ 6.4 GB RAM), Page Size = 64 MB

📌 Note: These examples use realistic table sizes from actual database workloads. Costs shown are simplified for clarity (actual optimizer costs include more factors).

Scenario Overview

Example Description Table R Table S Special Conditions Key Insight
Ex1 Small Tables Songs
10 GB
Listens
10 GB
Tables fit mostly in RAM
Ex2 Large Tables Songs
100 GB
Listens
2 TB
Tables >> buffer size
Ex3 Pre-sorted Tables Songs ↑
100 GB sorted
Listens ↑
2 TB sorted
✓ Pre-sorted SMJ skips sorting phase
Ex4 Self-Join Listens
2 TB
Listens
2 TB (same)
⚡ Self-join Can't optimize table order
Ex5 Song Similarity Listens
100B rows
Listens
100B rows
🔥 Massive Real-world similarity query

Algorithm Performance Comparison (Log Scale)

📊 Visualization Note: Bar heights show relative performance on a logarithmic scale. Shorter bars = better performance. Green highlights indicate the winning algorithm for each scenario.
Relative Algorithm Performance Across Scenarios (Log Scale) 10² 10³ 10⁴ 10⁵ 10⁶ 10⁷ Ex1: Small 10GB × 10GB Ex2: Large 100GB × 2TB Ex3: Sorted Pre-sorted Ex4: Self-Join 2TB × 2TB Ex5: Similarity 100B × 100B BNLJ BNLJ-rev SMJ HPJ (Green = Winner) IO Cost (Log Scale)
⚠️ Edge Cases Not Shown:
  • SMJ-BadBackup: When many duplicates exist, SMJ may need to backup and re-scan
  • HPJ-BadSkew: When data is heavily skewed, some hash partitions become too large
  • These edge cases can increase costs by 2-10× in worst cases

Key Observations

🎯 Example 1: Small Tables Win with BNLJ

📈 Examples 2-5: Large Tables Need Smart Algorithms

🔄 Self-Joins Are Common and Expensive

The Query Optimizer's Decision Process

-- The optimizer evaluates all plans and picks the cheapest
EXPLAIN SELECT * FROM Songs s JOIN Listens l ON s.id = l.song_id;

-- Factors considered:
-- 1. Table sizes and statistics
-- 2. Available memory (B)
-- 3. Whether data is pre-sorted or indexed
-- 4. Expected output size
-- 5. Data distribution (skew detection)

Practical Takeaways

  1. Small data (< RAM): BNLJ often wins due to simplicity

  2. Large data: SMJ/HPJ dominate, often by 10-100×

  3. Pre-processing matters: Sorted data makes SMJ much cheaper

  4. Self-joins are special: Can't optimize with table order

  5. Real optimizers adapt: They track statistics and adjust for edge cases


Exercise 2: Impact of Machine Configuration on IO Costs

Scenario: Songs3 (100GB) × Listens3 (1TB)

🔧 Key Insight: Different machines have varying RAM (32GB to 640GB) and IO devices with different C_r and C_w costs. Query optimizers may allocate only a portion of RAM per query (20%-60%) when handling parallel workloads.

Cost Analysis Across Different Configurations

Buffer
Size (B)
C_r C_w BNLJ BNLJ-rev SMJ-best HPJ-best Winner
10
(tiny)
1 1 2.6M 2.6M 159K 54K ✅ HPJ
1 10 2.6M 2.6M 792K 216K ✅ HPJ
10 1 26M 26M 954K 378K ✅ HPJ
10 10 26M 26M 1.6M 540K ✅ HPJ
100
(small)
1 1 264K 279K 90K 54K ✅ HPJ
1 10 264K 279K 414K 216K ✅ HPJ
10 1 2.6M 2.8M 575K 378K ✅ HPJ
10 10 2.6M 2.8M 899K 540K ✅ HPJ
1,000
(medium)
1 1 34K 44K 87K 54K ✅ BNLJ
1 10 34K 44K 396K 216K ✅ BNLJ
10 1 344K 436K 558K 378K ✅ BNLJ
10 10 344K 436K 867K 540K ✅ BNLJ
10,000
(large)
1 1 18K 20K 54K 54K ✅ BNLJ
1 10 18K 20K 216K 216K ✅ BNLJ
10 1 180K 196K 378K 378K ✅ BNLJ
10 10 180K 196K 540K 540K ✅ BNLJ

Key Observations

📊 Buffer Size Impact

  • Small B (10-100): HPJ dominates
  • Medium B (1,000): BNLJ becomes competitive
  • Large B (10,000): BNLJ wins decisively

⚡ Read/Write Cost Impact

  • High C_w: Penalizes algorithms with heavy writes (SMJ sorting phase)
  • High C_r: All costs scale up proportionally
  • Asymmetric costs: Can change winner even with same B

Practical Implications

  1. Cloud Environments: With variable RAM allocation per query, HPJ often wins due to its consistent performance

  2. SSDs vs HDDs: SSDs have C_r ≈ C_w, while HDDs often have C_w > C_r

  3. Memory-Rich Systems: BNLJ becomes optimal when B is large enough to hold significant chunks

  4. Multi-Query Workloads: Conservative B estimates favor HPJ's predictable performance


Real-World Context

MySQL
Uses all 5 algorithms
PostgreSQL
Optimizer picks for you
Spark
Distributed versions
BigQuery
Same concepts at scale

These Algorithms Are Everywhere!

-- PostgreSQL EXPLAIN shows which algorithm it chose
EXPLAIN SELECT * FROM users u 
JOIN listens l ON u.id = l.user_id;

-- Might show:
Hash Join  (cost=3.25..10.70)
  Hash Cond: (l.user_id = u.id)
  ->  Seq Scan on listens l
  ->  Hash
      ->  Seq Scan on users u

Key Takeaways

  1. Just 5 Algorithms power most database operations

  2. Memory (B) determines which algorithm is best

  3. Size matters: Small tables → Basic Join, Large tables → Hash/Sort

  4. No memorization needed: Formulas follow intuition

  5. Real systems use these exact algorithms at massive scale

📚 These formulas are on your system sheet - focus on understanding when to use each one!


Next: See these in action with Join Algorithms