Storage & Paging: The IO Cost Calculator
How Does One Machine Process 128GB with Only 16GB RAM?
Storage Hierarchy: Speed, Cost, and Capacity
The storage hierarchy is a ruthless game of trade-offs. Speed, cost, and capacity are the players, each vying for dominance. Here's the cold, hard truth: the faster the storage, the more it costs, and the less of it you get.
Key Observations: RAM is 100× faster than SSD, 100,000× faster than HDD • Cost/TB and Speed are inversely related.
IO Cost Definitions
Understanding IO costs is about recognizing the fixed overheads and the sustained rates.
Key Insight: For large pages (64MB), transfer time dominates access time. For small pages, access time dominates.
Refresher: Read your OS materials on how the OS' IO controllers work. DBs rely on OS for those details.
Modern Reality: CPUs/GPUs Can't Escape the Disk Bottleneck
The 10,000,000× Gap
The gap between storage and compute is staggering. A single HDD seek can cost you 10 million GPU operations.
The Math: 1 HDD seek = 10,000,000 GPU operations!
GPUs Are Data Hungry
The bottleneck chain is relentless:
-
Data lives on disk - Your 1TB dataset won't fit in 80GB GPU memory (e.g. A100 has 80GB in 2025)
-
PCI interfaces between hardware components are narrow - 32GB/s seems fast until you have 1TB to move
-
Compute is free - 312 TFLOPS means compute takes ~0 time