IO Algorithms: Cost Analysis & Intuition

Concept. The IO cost of every algorithm in this module (BigSort, HashPartition, BNLJ, SMJ, HPJ) can be expressed as a function of table size, buffer size, and the read/write cost ratio.

Intuition. Different algorithms win in different regimes. Small buffer favors HPJ; large buffer favors BNLJ; pre-sorted input favors SMJ. This page pulls all five formulas together so you can compare them side by side.

CS145 Equations Sheet

Real-World Examples: Choosing the Right Join Algorithm

Assume C_r = C_w = 1. For the JOINs below, we always add OUT (for the size of the output result). Always use the Equation Sheet for the full formulae.

Test Setup: B = 100 pages (≈ 6.4 GB RAM), Page Size = 64 MB

These examples use realistic table sizes from actual database workloads. Costs shown are simplified for clarity; real optimizer costs include additional factors.

Scenario Overview

Example Description Table R Table S Special Conditions Key Insight
Ex1 Small Tables Songs
10 GB
Listens
10 GB
none Tables fit mostly in RAM
Ex2 Large Tables Songs
100 GB
Listens
2 TB
none Tables >> buffer size
Ex3 Pre-sorted Tables Songs ↑
100 GB sorted
Listens ↑
2 TB sorted
Pre-sorted SMJ skips sorting phase
Ex4 Self-Join Listens
2 TB
Listens
2 TB (same)
Self-join Can't optimize table order
Ex5 Song Similarity Listens
100B rows
Listens
100B rows
Massive Real-world similarity query

Algorithm Performance Comparison (Log Scale)

Edge cases not shown in the table:

  • SMJ bad-backup. When many duplicates exist, SMJ may have to back up and re-scan.

  • HPJ bad-skew. When data is heavily skewed, some hash partitions become too large to fit in RAM.

  • These edge cases can increase costs by 2–10× in the worst case.

Key Observations

Example 1: Small Tables Win with BNLJ

  • Songs (10GB) and Listens (10GB) are small relative to buffer size (6.4GB)

  • BNLJ performs the join mostly in RAM without preprocessing overhead

  • No sorting or partitioning needed → Lower cost than SMJ/HPJ

Examples 2-5: Large Tables Need Smart Algorithms

  • When tables >> buffer size, BNLJ becomes prohibitively expensive

  • SMJ and HPJ scale much better with data size

  • Pre-sorted data (Ex3) gives SMJ a huge advantage

Self-Joins Are Common and Expensive

  • Examples 4-5 show real patterns (finding similar songs, user correlations)

  • Self-joins can't optimize by choosing smaller outer relation

  • SMJ/HPJ are essential for these massive operations

The Query Optimizer's Decision Process


Exercise 2: Impact of Machine Configuration on IO Costs

Scenario: Songs3 (100GB) × Listens3 (1TB)

  • Songs3: 100GB = 1,600 pages

  • Listens3: 1TB = 16,384 pages

  • Row size: 1024 bytes

Key insight. Different machines have varying RAM (32 GB to 640 GB) and IO devices with different C_r and C_w costs. Query optimizers may allocate only 20–60% of RAM per query when running parallel workloads, which changes which algorithm wins.

Cost Analysis Across Different Configurations

Buffer
Size (B)
C_r C_w BNLJ BNLJ-rev SMJ-best HPJ-best Winner
10
(tiny)
1 1 2.6M 2.6M 159K 54K HPJ
1 10 2.6M 2.6M 792K 216K HPJ
10 1 26M 26M 954K 378K HPJ
10 10 26M 26M 1.6M 540K HPJ
100
(small)
1 1 264K 279K 90K 54K HPJ
1 10 264K 279K 414K 216K HPJ
10 1 2.6M 2.8M 575K 378K HPJ
10 10 2.6M 2.8M 899K 540K HPJ
1,000
(medium)
1 1 34K 44K 87K 54K BNLJ
1 10 34K 44K 396K 216K BNLJ
10 1 344K 436K 558K 378K BNLJ
10 10 344K 436K 867K 540K BNLJ
10,000
(large)
1 1 18K 20K 54K 54K BNLJ
1 10 18K 20K 216K 216K BNLJ
10 1 180K 196K 378K 378K BNLJ
10 10 180K 196K 540K 540K BNLJ

Key Observations

Buffer Size Impact

  • Small B (10-100): HPJ dominates
  • Medium B (1,000): BNLJ becomes competitive
  • Large B (10,000): BNLJ wins.

Read/Write Cost Impact

  • High C_w: Penalizes algorithms with heavy writes (SMJ sorting phase)
  • High C_r: All costs scale up proportionally
  • Asymmetric costs: Can change winner even with same B

Practical Implications

  1. Cloud Environments: With variable RAM allocation per query, HPJ often wins due to its consistent performance

  2. SSDs vs HDDs: SSDs have C_r ≈ C_w, while HDDs often have C_w > C_r

  3. Memory-Rich Systems: BNLJ becomes optimal when B is large enough to hold significant chunks

  4. Multi-Query Workloads: Conservative B estimates favor HPJ's predictable performance


Key Takeaways

  1. Just 5 algorithms power most database operations.

  2. Memory (B) determines which algorithm is best.

  3. No memorization needed. Formulas follow intuition.


Next

Storage & Algorithms Quiz → Check your understanding of storage layouts, hash partitioning, BigSort, and the three join algorithms.