Storage & Paging: The IO Cost Calculator
How Does One Machine Process 128GB with Only 16GB RAM?
Storage Hierarchy: Speed, Cost, and Capacity
Key Observations:
- RAM is 100× faster than SSD and 100,000× faster than HDD
- Network RAM (1μs) beats local HDD (10ms)
- Cost/TB and speed are inversely related
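These ratios follow directly from the access latencies used throughout this section (a quick sanity check; the numbers are the example figures, not measurements):

```python
# Example access latencies, in seconds
RAM_S, NET_RAM_S, SSD_S, HDD_S = 100e-9, 1e-6, 10e-6, 10e-3

print(f"RAM vs SSD:         {SSD_S / RAM_S:,.0f}x faster")      # 100x
print(f"RAM vs HDD:         {HDD_S / RAM_S:,.0f}x faster")      # 100,000x
print(f"Network RAM vs HDD: {HDD_S / NET_RAM_S:,.0f}x faster")  # 10,000x
```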
IO Cost Definitions
Access Latency: Time to initiate an IO operation before data transfer begins
- The fixed overhead for starting any IO operation
- Examples: RAM access (100ns), SSD access (10μs), HDD seek (10ms)
Throughput: Data transfer rate once operation begins
- The sustained rate at which data moves after access starts
- Examples: RAM (100 GB/s), SSD (5 GB/s), HDD (100 MB/s)
Key Insight: For large pages (64MB), transfer time dominates access time. For small pages, access time dominates.
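The crossover can be sketched with a simple cost model, total_time = access_latency + size ÷ throughput, using the SSD numbers above (the 4KB and 64MB page sizes are illustrative):

```python
# IO cost model: total_time = access_latency + size / throughput
def io_time(size_bytes, latency_s, throughput_bps):
    return latency_s + size_bytes / throughput_bps

SSD = dict(latency_s=10e-6, throughput_bps=5e9)  # 10us access, 5 GB/s

for page in (4 * 2**10, 64 * 2**20):             # 4KB vs 64MB page
    total = io_time(page, **SSD)
    transfer = page / SSD["throughput_bps"]
    # For 4KB, access latency dominates; for 64MB, transfer dominates
    print(f"{page:>10} B: total {total*1e6:10.1f} us, "
          f"transfer share {transfer / total:6.1%}")
```

For a 4KB page, transfer is under 10% of the total; for a 64MB page it is over 99%.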
Refresher: Review your OS course materials on how IO controllers work; databases rely on the OS for those details.
Modern Reality: CPUs/GPUs Can't Escape the Disk Bottleneck
The 10,000,000× Gap
The Math: 1 HDD seek (10ms) = 10,000,000 GPU operations (at ~1ns per operation)!
GPUs Are Data Hungry
# Training on a 100GB dataset
load_data_from_ssd = 20    # seconds: 100GB ÷ 5GB/s (SSD)
transfer_to_gpu    = 3     # seconds: 100GB ÷ 32GB/s (PCIe 4.0)
gpu_training_epoch = 0.5   # seconds: blazing fast compute!
# Where does the time go? (~23.5s total)
#   ~85% loading data from SSD
#   ~13% transferring to GPU over PCIe
#    ~2% actual GPU compute
The Bottleneck Chain:
- Data lives on disk - your 1TB dataset won't fit in 80GB of GPU memory
- PCIe links between hardware components are narrow - 32GB/s seems fast until you have 1TB to move
- GPU memory is limited - even an A100 tops out at 80GB
- Compute is effectively free - at 312 TFLOPS, compute time rounds to ~0
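A back-of-envelope check of the chain above, using the A100 figures from this section (the 100 ops/byte arithmetic intensity is an illustrative assumption):

```python
# Moving 1TB over PCIe 4.0 vs. computing on it with an A100-class GPU
DATASET      = 1e12     # 1TB dataset, in bytes
PCIE_BPS     = 32e9     # 32 GB/s (PCIe 4.0)
PEAK_FLOPS   = 312e12   # 312 TFLOPS
OPS_PER_BYTE = 100      # assumed arithmetic intensity (illustrative)

transfer_s = DATASET / PCIE_BPS                   # ~31s just to move the data
compute_s  = DATASET * OPS_PER_BYTE / PEAK_FLOPS  # <1s to process it
print(f"transfer: {transfer_s:.1f}s, compute: {compute_s:.2f}s")
```

Even with generous assumptions about compute per byte, the PCIe transfer takes roughly 100× longer than the computation itself.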