Problems: IO Cost Model

Reference

See the IO Reference Sheet for official formulas and notation.

IO Cost Model Reference Card

Setup

System Configuration:

Quick Reference (64MB pages):

Device C_r = C_w Access + Transfer
RAM 0.00064s 100ns + 64MB/100GB/s
SSD 0.01281s 10μs + 64MB/5GB/s
HDD 0.65s 10ms + 64MB/100MB/s
Network 0.00641s 1μs + 64MB/10GB/s

Problem 1: Table Size Calculation

The Spotify Songs table has 500 million rows with an average row size of 1024 bytes. Calculate:

  1. Total size in MB

  2. Number of 64 MB pages needed

šŸ” Show Solution

Step 1: Calculate Total Size

Total Size = NumRows Ɨ RowSize
         = 500,000,000 Ɨ 1024 bytes
         = 512,000,000,000 bytes
         = 512,000,000,000 / 1,000,000 MB
         = 512,000 MB = 512 GB

Step 2: Calculate Number of Pages

numPages = Total Size / Page Size
        = 512,000 MB / 64 MB
        = 8,000 pages

Answer

āœ… Size: 512,000 MB (512 GB) | numPages: 8,000


Problem 2: Read Cost Comparison

Calculate the time (in seconds) to read 100 pages from different storage devices.

šŸ” Show Solution

For Each Device:

Cost = numPages Ɨ C_r where C_r includes both access time and transfer time: C_r = Access Time + PageSize / Scan Speed

RAM:

Cost = numPages Ɨ C_r = 100 Ɨ (100Ɨ10⁻⁹ + 64MB/100GB/s)
     = 100 Ɨ (0.0000001 + 0.00064)
     = 100 Ɨ 0.0006401
     = 0.064 seconds

SSD:

Cost = numPages Ɨ C_r = 100 Ɨ (10Ɨ10⁻⁶ + 64MB/5GB/s)
     = 100 Ɨ (0.00001 + 0.0128)
     = 100 Ɨ 0.01281
     = 1.281 seconds

HDD:

Cost = numPages Ɨ C_r = 100 Ɨ (10Ɨ10⁻³ + 64MB/100MB/s)
     = 100 Ɨ (0.01 + 0.64)
     = 100 Ɨ 0.65
     = 65 seconds

Answer

Device Time (seconds) Relative Speed
RAM 0.064 1Ɨ (fastest)
SSD 1.281 20Ɨ slower
HDD 65.0 1,016Ɨ slower

Problem 3: Cache Hit Rate Impact

You need to read 200 pages with the following cache hierarchy:

Calculate the total read time, including the cost of checking RAM.

Cache Hierarchy: Reading 200 Pages Request: 200 Pages Must check RAM first Check all 200 RAM Cache Check Cost: 200 Ɨ 0.00064s = 0.128s āœ“ 180 pages found (90%) āœ— 20 pages miss (10%) 15 pages (75%) 5 pages (25%) SSD Fetch 15 pages Ɨ 0.01281s = 0.192s HDD Fetch 5 pages Ɨ 0.65s = 3.25s Total Time 0.128s + 0.192s + 3.25s = 3.57s 2.5% of pages (5 HDD) = 91% of time!
šŸ” Show Solution

Step 1: Check RAM for ALL pages

Must check RAM for all 200 pages to determine hits/misses
RAM Check Cost = numPages Ɨ C_r = 200 Ɨ (100ns + 64MB/100GB/s)
               = 200 Ɨ 0.00064 = 0.128 seconds

Results: 180 pages found (90% hit), 20 pages miss (10%)

Step 2: Fetch misses from lower levels

Of the 20 misses:
  → 15 pages (75%) found in SSD
  → 5 pages (25%) found in HDD

SSD Fetch Cost = numPages Ɨ C_r = 15 Ɨ (10μs + 64MB/5GB/s)
               = 15 Ɨ 0.01281 = 0.192 seconds

HDD Fetch Cost = numPages Ɨ C_r = 5 Ɨ (10ms + 64MB/100MB/s)
               = 5 Ɨ 0.65 = 3.25 seconds

Step 3: Total Cost

Total = RAM Check + SSD Fetch + HDD Fetch
      = 0.128 + 0.192 + 3.25 = 3.57 seconds

Answer

āœ… Total time: 3.57 seconds

Key Insights:

  1. You ALWAYS pay the cost to check RAM for all pages - this is how caching works!

  2. Even with 90% cache hit rate, the 2.5% that goes to HDD (5 out of 200 pages) dominates the total cost!


Problem 4: Mixed Read/Write Operations

You need to:

Calculate total cost for each storage device.

šŸ” Show Solution

Cost Formula

For mixed operations, calculate each separately:

Total = Read Cost + Write Cost
      = numPages(read) Ɨ C_r + numPages(write) Ɨ C_w
      = 30 Ɨ C_r + 10 Ɨ C_w

Since read and write costs are equal for these devices: C_r = C_w

Total = (30 + 10) Ɨ C_r = 40 Ɨ C_r

RAM:

Cost = 40 Ɨ C_r = 40 Ɨ 0.00064s = 0.026 seconds

SSD:

Cost = 40 Ɨ C_r = 40 Ɨ 0.01281s = 0.512 seconds

HDD:

Cost = 40 Ɨ C_r = 40 Ɨ 0.65s = 26 seconds

Problem 5: Network vs Local Storage

Compare reading 1 page from:

  1. Local RAM

  2. Local SSD

  3. Local HDD

  4. Network RAM (RAM on another machine)

Reading 1 Page: Local vs Network Storage Local Machine Local RAM Direct access 0.00064s Local SSD 10μs + transfer 0.0128s Local HDD 10ms + transfer 0.65s Network RAM Path 1. Read from remote RAM 0.00064s 2. Network transfer (10GB/s) 0.0064s 3. Write to local RAM 0.00064s Total: 0.0077s Remote Machine Remote RAM 64MB Page Read: 0.00064s 10 GB/s Network Network RAM (0.0077s) is faster than Local SSD (0.0128s)!
šŸ” Show Solution

Local Storage Costs

Local RAM:

Cost = numPages Ɨ C_r = 1 Ɨ (100ns + 64MB/100GB/s)
     = 1 Ɨ (0.0000001s + 0.00064s)
     = 0.0006401 seconds

Local SSD:

Cost = numPages Ɨ C_r = 1 Ɨ (10μs + 64MB/5GB/s)
     = 1 Ɨ (0.00001s + 0.0128s)
     = 0.01281 seconds

Local HDD:

Cost = numPages Ɨ C_r = 1 Ɨ (10ms + 64MB/100MB/s)
     = 1 Ɨ (0.01s + 0.64s)
     = 0.65 seconds

Network RAM Cost

Three steps: Read from remote RAM → Network transfer → Write to local RAM

Remote RAM read:  numPages Ɨ C_r = 1 Ɨ 0.0006401 = 0.0006401 seconds
Network transfer: numPages Ɨ C_network = 1 Ɨ (1μs + 64MB/10GB/s)
                = 1 Ɨ (0.000001 + 0.0064) = 0.006401 seconds
Local RAM write:  numPages Ɨ C_w = 1 Ɨ 0.0006401 = 0.0006401 seconds
Total:           0.0006401 + 0.006401 + 0.0006401 = 0.0076812 seconds

Answer

Storage Type Time (seconds) Relative to Local RAM
Local RAM 0.00064 1Ɨ (baseline)
Network RAM 0.0077 12Ɨ slower
Local SSD 0.0128 20Ɨ slower
Local HDD 0.65 1,016Ɨ slower

Key Insight: With modern 10GB/s networks, Network RAM is faster than Local SSD! This is why distributed caching works so well.



šŸŽÆ Transitioning to Algorithm Design

You now understand the physics! For the rest of this course:

  • Ignore device differences - focus on IO patterns and algorithm complexity
  • Count IOs, not seconds: C_r = C_w = 1 IO (abstract unit)
  • Algorithm X reads 100 pages → Cost = 100 IOs
  • Algorithm Y reads 2ƗN pages → Cost = 2N IOs

This lets us compare algorithms independent of hardware. When we say "Algorithm A costs 5N IOs" vs "Algorithm B costs N log N IOs", you can determine which is better based on data size N.

Example: - Hash Join: 3N IOs (linear) - Nested Loop Join: N² IOs (quadratic)
- For N=1000: Hash Join wins (3,000 vs 1,000,000 IOs)


Key Takeaways

  1. HDD is 1000Ɨ slower than RAM for random access

  2. Cache hit rates matter enormously - even 5% HDD access can dominate

  3. Network RAM ā‰ˆ Local SSD in performance

  4. Page size affects random access penalty - larger pages amortize seek cost

  5. Moving forward: We'll count IOs (1 page = 1 IO) to analyze algorithm complexity