Problems: IO Cost Model
Reference
See the IO Reference Sheet for official formulas and notation.

Setup
System Configuration:
-
Page Size: 64 MB
-
Spotify Songs Table: 500 million rows Ć 1024 bytes/row
-
Formulas and Device Specs: See IO Reference Sheet for complete C_r/C_w formulas
Quick Reference (64MB pages):
| Device | C_r = C_w | Access + Transfer |
|---|---|---|
| RAM | 0.00064s | 100ns + 64MB/100GB/s |
| SSD | 0.01281s | 10μs + 64MB/5GB/s |
| HDD | 0.65s | 10ms + 64MB/100MB/s |
| Network | 0.00641s | 1μs + 64MB/10GB/s |
Problem 1: Table Size Calculation
The Spotify Songs table has 500 million rows with an average row size of 1024 bytes. Calculate:
-
Total size in MB
-
Number of 64 MB pages needed
š Show Solution
Step 1: Calculate Total Size
Total Size = NumRows Ć RowSize
= 500,000,000 Ć 1024 bytes
= 512,000,000,000 bytes
= 512,000,000,000 / 1,000,000 MB
= 512,000 MB = 512 GB
Step 2: Calculate Number of Pages
numPages = Total Size / Page Size
= 512,000 MB / 64 MB
= 8,000 pages
Answer
ā Size: 512,000 MB (512 GB) | numPages: 8,000
Problem 2: Read Cost Comparison
Calculate the time (in seconds) to read 100 pages from different storage devices.
š Show Solution
For Each Device:
Cost = numPages Ć C_r where C_r includes both access time and transfer time: C_r = Access Time + PageSize / Scan Speed
RAM:
Cost = numPages Ć C_r = 100 Ć (100Ć10ā»ā¹ + 64MB/100GB/s)
= 100 Ć (0.0000001 + 0.00064)
= 100 Ć 0.0006401
= 0.064 seconds
SSD:
Cost = numPages Ć C_r = 100 Ć (10Ć10ā»ā¶ + 64MB/5GB/s)
= 100 Ć (0.00001 + 0.0128)
= 100 Ć 0.01281
= 1.281 seconds
HDD:
Cost = numPages Ć C_r = 100 Ć (10Ć10ā»Ā³ + 64MB/100MB/s)
= 100 Ć (0.01 + 0.64)
= 100 Ć 0.65
= 65 seconds
Answer
| Device | Time (seconds) | Relative Speed |
|---|---|---|
| RAM | 0.064 | 1Ć (fastest) |
| SSD | 1.281 | 20Ć slower |
| HDD | 65.0 | 1,016Ć slower |
Problem 3: Cache Hit Rate Impact
You need to read 200 pages with the following cache hierarchy:
-
Check RAM first for all pages (90% hit rate)
-
For RAM misses (10%): 75% found in SSD, 25% in HDD
Calculate the total read time, including the cost of checking RAM.
š Show Solution
Step 1: Check RAM for ALL pages
Must check RAM for all 200 pages to determine hits/misses
RAM Check Cost = numPages Ć C_r = 200 Ć (100ns + 64MB/100GB/s)
= 200 Ć 0.00064 = 0.128 seconds
Results: 180 pages found (90% hit), 20 pages miss (10%)
Step 2: Fetch misses from lower levels
Of the 20 misses:
ā 15 pages (75%) found in SSD
ā 5 pages (25%) found in HDD
SSD Fetch Cost = numPages à C_r = 15 à (10μs + 64MB/5GB/s)
= 15 Ć 0.01281 = 0.192 seconds
HDD Fetch Cost = numPages Ć C_r = 5 Ć (10ms + 64MB/100MB/s)
= 5 Ć 0.65 = 3.25 seconds
Step 3: Total Cost
Total = RAM Check + SSD Fetch + HDD Fetch
= 0.128 + 0.192 + 3.25 = 3.57 seconds
Answer
ā Total time: 3.57 seconds
Key Insights:
-
You ALWAYS pay the cost to check RAM for all pages - this is how caching works!
-
Even with 90% cache hit rate, the 2.5% that goes to HDD (5 out of 200 pages) dominates the total cost!
Problem 4: Mixed Read/Write Operations
You need to:
-
Read 30 pages
-
Write 10 pages
Calculate total cost for each storage device.
š Show Solution
Cost Formula
For mixed operations, calculate each separately:
Total = Read Cost + Write Cost
= numPages(read) Ć C_r + numPages(write) Ć C_w
= 30 Ć C_r + 10 Ć C_w
Since read and write costs are equal for these devices: C_r = C_w
Total = (30 + 10) Ć C_r = 40 Ć C_r
RAM:
Cost = 40 Ć C_r = 40 Ć 0.00064s = 0.026 seconds
SSD:
Cost = 40 Ć C_r = 40 Ć 0.01281s = 0.512 seconds
HDD:
Cost = 40 Ć C_r = 40 Ć 0.65s = 26 seconds
Problem 5: Network vs Local Storage
Compare reading 1 page from:
-
Local RAM
-
Local SSD
-
Local HDD
-
Network RAM (RAM on another machine)
š Show Solution
Local Storage Costs
Local RAM:
Cost = numPages Ć C_r = 1 Ć (100ns + 64MB/100GB/s)
= 1 Ć (0.0000001s + 0.00064s)
= 0.0006401 seconds
Local SSD:
Cost = numPages à C_r = 1 à (10μs + 64MB/5GB/s)
= 1 Ć (0.00001s + 0.0128s)
= 0.01281 seconds
Local HDD:
Cost = numPages Ć C_r = 1 Ć (10ms + 64MB/100MB/s)
= 1 Ć (0.01s + 0.64s)
= 0.65 seconds
Network RAM Cost
Three steps: Read from remote RAM ā Network transfer ā Write to local RAM
Remote RAM read: numPages Ć C_r = 1 Ć 0.0006401 = 0.0006401 seconds
Network transfer: numPages à C_network = 1 à (1μs + 64MB/10GB/s)
= 1 Ć (0.000001 + 0.0064) = 0.006401 seconds
Local RAM write: numPages Ć C_w = 1 Ć 0.0006401 = 0.0006401 seconds
Total: 0.0006401 + 0.006401 + 0.0006401 = 0.0076812 seconds
Answer
| Storage Type | Time (seconds) | Relative to Local RAM |
|---|---|---|
| Local RAM | 0.00064 | 1Ć (baseline) |
| Network RAM | 0.0077 | 12Ć slower |
| Local SSD | 0.0128 | 20Ć slower |
| Local HDD | 0.65 | 1,016Ć slower |
Key Insight: With modern 10GB/s networks, Network RAM is faster than Local SSD! This is why distributed caching works so well.
šÆ Transitioning to Algorithm Design
You now understand the physics! For the rest of this course:
- Ignore device differences - focus on IO patterns and algorithm complexity
- Count IOs, not seconds: C_r = C_w = 1 IO (abstract unit)
- Algorithm X reads 100 pages ā Cost = 100 IOs
- Algorithm Y reads 2ĆN pages ā Cost = 2N IOs
This lets us compare algorithms independent of hardware. When we say "Algorithm A costs 5N IOs" vs "Algorithm B costs N log N IOs", you can determine which is better based on data size N.
Example:
- Hash Join: 3N IOs (linear)
- Nested Loop Join: N² IOs (quadratic)
- For N=1000: Hash Join wins (3,000 vs 1,000,000 IOs)
Key Takeaways
-
HDD is 1000Ć slower than RAM for random access
-
Cache hit rates matter enormously - even 5% HDD access can dominate
-
Network RAM ā Local SSD in performance
-
Page size affects random access penalty - larger pages amortize seek cost
-
Moving forward: We'll count IOs (1 page = 1 IO) to analyze algorithm complexity