Storage & Paging: The IO Cost Calculator

How Does One Machine Process 128GB with Only 16GB RAM?


Storage Hierarchy: Speed, Cost, and Capacity

The storage hierarchy is a ruthless game of trade-offs. Speed, cost, and capacity are the players, each vying for dominance. Here's the cold, hard truth: the faster the storage, the more it costs, and the less of it you get.

| Storage Level | Access Latency | Throughput | Cost per TB | Typical Capacity | Use Case |
|---|---|---|---|---|---|
| CPU Registers | 1 cycle | - | - | < 1KB | Immediate values |
| L1/L2 Cache | 1-10ns | - | - | 64KB - 8MB | Hot instructions |
| RAM (Buffer Pool) | 100ns | 100 GB/s | $3,500 | 16GB | Working set pages |
| SSD | 10μs | 5 GB/s | $75 | 512GB | Active tables |
| HDD | 10ms | 100 MB/s | $25 | 4TB | Cold storage |
| Network Storage | 1μs | 10 GB/s | Variable | - | Distributed cache |

Key Observations: By access latency, RAM is 100× faster than SSD and 100,000× faster than HDD • Cost per TB and speed are inversely related.


IO Cost Definitions

Understanding IO costs is about recognizing the fixed overheads and the sustained rates.

**Access Latency**: Time to initiate an IO operation before data transfer begins
- The fixed overhead for starting any IO operation
- Examples: RAM access (100ns), SSD access (10μs), HDD seek (10ms)

**Throughput**: Data transfer rate once the operation begins
- The sustained rate at which data moves after access starts
- Examples: RAM (100 GB/s), SSD (5 GB/s), HDD (100 MB/s)

Key Insight: For large pages (64MB), transfer time dominates access time. For small pages, access time dominates.
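This insight can be sketched as a small IO cost calculator. The device numbers come from the table above; the two page sizes (8KB and 64MB) are illustrative:

```python
def io_cost(latency_s, throughput_bps, page_bytes):
    """Seconds to read one page: fixed access overhead + transfer time."""
    return latency_s + page_bytes / throughput_bps

KB, MB, GB = 1024, 1024**2, 1024**3

# (access latency in seconds, throughput in bytes/second), from the table
devices = {
    "RAM": (100e-9, 100 * GB),
    "SSD": (10e-6, 5 * GB),
    "HDD": (10e-3, 100 * MB),
}

for name, (lat, bw) in devices.items():
    small = io_cost(lat, bw, 8 * KB)    # small page: latency dominates
    large = io_cost(lat, bw, 64 * MB)   # large page: transfer dominates
    print(f"{name}: 8KB page = {small * 1e6:.1f}us, 64MB page = {large * 1e3:.2f}ms")
```

For an 8KB SSD read, the 10μs access latency dwarfs the ~1.5μs transfer; for a 64MB SSD read, the ~12.5ms transfer dwarfs the latency.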

Refresher: Review your OS materials on how the OS's IO controllers work. Databases rely on the OS for those details.


Modern Reality: CPUs/GPUs Can't Escape the Disk Bottleneck

The 10,000,000× Gap

The gap between storage and compute is staggering. A single HDD seek can cost you 10 million GPU operations.

HDD seek time: 10ms
SSD access latency: 10μs
RAM access: 100ns
CPU/GPU cycle: 1ns

The Math: 1 HDD seek = 10,000,000 GPU operations!
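The arithmetic checks out directly, using the latency ladder above and a 1ns cycle time:

```python
# How many 1ns CPU/GPU cycles fit inside each storage access?
cycle_s = 1e-9

print(round(10e-3 / cycle_s))    # HDD seek:   10,000,000 cycles
print(round(10e-6 / cycle_s))    # SSD access: 10,000 cycles
print(round(100e-9 / cycle_s))   # RAM access: 100 cycles
```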

GPUs Are Data Hungry

The bottleneck chain is relentless:

  1. Data lives on disk - Your 1TB dataset won't fit in 80GB of GPU memory (e.g. an NVIDIA A100 has 80GB, as of 2025)

  2. PCIe links between hardware components are narrow - 32GB/s seems fast until you have 1TB to move

  3. Compute is effectively free - at 312 TFLOPS, the arithmetic itself takes ~0 time compared to data movement
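A back-of-envelope sketch of this chain, using the numbers above (1TB dataset, 32GB/s link, 312 TFLOPS peak). The 100 FLOPs-per-byte figure is an illustrative assumption, not from the text:

```python
dataset_bytes = 1e12   # 1TB dataset
pcie_bw = 32e9         # 32 GB/s PCIe link
flops = 312e12         # 312 TFLOPS peak compute

transfer_s = dataset_bytes / pcie_bw
# Assume (illustratively) 100 FLOPs of work per byte of data:
compute_s = dataset_bytes * 100 / flops

print(f"transfer: {transfer_s:.2f}s, compute: {compute_s:.2f}s")
# Moving the data takes ~31s; even at 100 FLOPs/byte, computing on it
# takes well under a second. The GPU mostly waits.
```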