IO Algorithms: Cost Analysis & Intuition

CS145 Equations Sheet

For detailed explanations and examples, see IO Cost Reference.

Real-World Examples: Choosing the Right Join Algorithm

Assume C_r = C_w = 1 (equal cost for reads and writes). For the JOINs below, we always add OUT (for the size of the output result). Always use the Equation Sheet for the full formulae.

Test Setup: B = 100 pages (≈ 6.4 GB RAM), Page Size = 64 MB

Note: These examples use realistic table sizes from actual database workloads. Costs shown are simplified for clarity (actual optimizer costs include more factors).

Scenario Overview

Example Description Table R Table S Special Conditions Key Insight
Ex1 Small Tables Songs
10 GB
Listens
10 GB
Tables fit mostly in RAM
Ex2 Large Tables Songs
100 GB
Listens
2 TB
Tables >> buffer size
Ex3 Pre-sorted Tables Songs ↑
100 GB sorted
Listens ↑
2 TB sorted
Pre-sorted SMJ skips sorting phase
Ex4 Self-Join Listens
2 TB
Listens
2 TB (same)
Self-join Can't optimize table order
Ex5 Song Similarity Listens
100B rows
Listens
100B rows
Massive Real-world similarity query

Algorithm Performance Comparison (Log Scale)

Visualization Note: Bar heights show relative performance on a logarithmic scale. Shorter bars = better performance. Green highlights indicate the winning algorithm for each scenario.
Relative Algorithm Performance Across Scenarios (Log Scale) 10² 10³ 10⁴ 10⁵ 10⁶ 10⁷ Ex1: Small 10GB × 10GB Ex2: Large 100GB × 2TB Ex3: Sorted Pre-sorted Ex4: Self-Join 2TB × 2TB Ex5: Similarity 100B × 100B BNLJ BNLJ-rev SMJ HPJ (Green = Winner) IO Cost (Log Scale)
⚠️ Edge Cases Not Shown:
  • SMJ-BadBackup: When many duplicates exist, SMJ may need to backup and re-scan
  • HPJ-BadSkew: When data is heavily skewed, some hash partitions become too large
  • These edge cases can increase costs by 2-10× in worst cases

Key Observations

Example 1: Small Tables Win with BNLJ

Examples 2-5: Large Tables Need Smart Algorithms

Self-Joins Are Common and Expensive

The Query Optimizer's Decision Process

-- The optimizer evaluates all plans and picks the cheapest
EXPLAIN SELECT * FROM Songs s JOIN Listens l ON s.id = l.song_id;

-- Factors considered:
-- 1. Table sizes and statistics
-- 2. Available memory (B)
-- 3. Whether data is pre-sorted or indexed
-- 4. Expected output size
-- 5. Data distribution (skew detection)

Exercise 2: Impact of Machine Configuration on IO Costs

Scenario: Songs3 (100GB) × Listens3 (1TB)

Key Insight: Different machines have varying RAM (32GB to 640GB) and IO devices with different C_r and C_w costs. Query optimizers may allocate only a portion of RAM per query (20%-60%) when handling parallel workloads.

Cost Analysis Across Different Configurations

Buffer
Size (B)
C_r C_w BNLJ BNLJ-rev SMJ-best HPJ-best Winner
10
(tiny)
1 1 2.6M 2.6M 159K 54K HPJ
1 10 2.6M 2.6M 792K 216K HPJ
10 1 26M 26M 954K 378K HPJ
10 10 26M 26M 1.6M 540K HPJ
100
(small)
1 1 264K 279K 90K 54K HPJ
1 10 264K 279K 414K 216K HPJ
10 1 2.6M 2.8M 575K 378K HPJ
10 10 2.6M 2.8M 899K 540K HPJ
1,000
(medium)
1 1 34K 44K 87K 54K BNLJ
1 10 34K 44K 396K 216K BNLJ
10 1 344K 436K 558K 378K BNLJ
10 10 344K 436K 867K 540K BNLJ
10,000
(large)
1 1 18K 20K 54K 54K BNLJ
1 10 18K 20K 216K 216K BNLJ
10 1 180K 196K 378K 378K BNLJ
10 10 180K 196K 540K 540K BNLJ

Key Observations

Buffer Size Impact

  • Small B (10-100): HPJ dominates
  • Medium B (1,000): BNLJ becomes competitive
  • Large B (10,000): BNLJ wins.

Read/Write Cost Impact

  • High C_w: Penalizes algorithms with heavy writes (SMJ sorting phase)
  • High C_r: All costs scale up proportionally
  • Asymmetric costs: Can change winner even with same B

Practical Implications

  1. Cloud Environments: With variable RAM allocation per query, HPJ often wins due to its consistent performance

  2. SSDs vs HDDs: SSDs have C_r ≈ C_w, while HDDs often have C_w > C_r

  3. Memory-Rich Systems: BNLJ becomes optimal when B is large enough to hold significant chunks

  4. Multi-Query Workloads: Conservative B estimates favor HPJ's predictable performance


Key Takeaways

  1. Just 5 Algorithms power most database operations

  2. Memory (B) determines which algorithm is best

  3. No memorization needed: Formulas follow intuition