Section 2: Systems Basics
You know SQL and the architectural tradeoffs. Now, let's dissect how a database actually executes those operations.
The Key Problems We Solve
1. The I/O Bottleneck
Hardware dictates software behavior.
- The Problem: Why does one operation fly while another crawls? Why does a query suddenly choke the server?
- The Reality: We build an I/O Cost Model based on the latency gap between RAM and Disk. You'll see how databases use paging to sidestep costly disk access.
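To get a feel for that latency gap, here is a back-of-the-envelope sketch of a cost model. The latency constants are rough order-of-magnitude assumptions (not measurements), and `estimated_cost_ns` and its `hit_ratio` parameter are hypothetical names used only for illustration.

```python
# Illustrative latencies. These are assumptions, not benchmarks.
RAM_ACCESS_NS = 100            # roughly 100 ns per main-memory access
DISK_ACCESS_NS = 10_000_000    # roughly 10 ms per random read on a spinning disk


def estimated_cost_ns(page_reads: int, hit_ratio: float) -> float:
    """Estimate total time for `page_reads` page accesses.

    `hit_ratio` is the fraction of pages already cached in RAM;
    every other access is assumed to pay one full disk read.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hits = page_reads * hit_ratio
    misses = page_reads - hits
    return hits * RAM_ACCESS_NS + misses * DISK_ACCESS_NS


if __name__ == "__main__":
    for ratio in (1.0, 0.99, 0.5, 0.0):
        cost_ms = estimated_cost_ns(1000, ratio) / 1_000_000
        print(f"hit ratio {ratio:.2f}: about {cost_ms:,.1f} ms for 1000 page reads")
```

Under these assumed numbers, even a small share of cache misses dominates the total, which is why keeping hot pages in memory matters so much.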
2. The Search Problem
Scanning a billion rows? That's a fool's errand.
- The Problem: How do systems pinpoint a record, or confirm its absence, without combing through everything?
- The Reality: We'll dive into Basic Hashing and Bloom Filters. Learn how databases use mathematical shortcuts to dodge unnecessary disk reads.
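As a preview, here is a minimal Bloom filter sketch. It is a toy under stated assumptions: the class name, the default sizes, and the choice of salted SHA-256 as the hash family are all illustrative, not how any particular database implements it.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, key: str):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

A "definitely absent" answer lets the system skip the disk read entirely; a "possibly present" answer still requires checking the real data.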
3. The Physical Constraints Problem
Data centers aren't infinite.
- The Problem: How do you store petabytes without ballooning costs or delays?
- The Reality: We explore Compression techniques, showing how databases squeeze more into fewer bytes by trading CPU cycles for savings on pricey disk operations.
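The CPU-for-bytes trade can be sketched with Python's standard `zlib` module. This is only an illustration: the helper names and the sample page contents are made up, and real databases typically use specialized, often column-aware, encodings rather than general-purpose compression alone.

```python
import zlib


def compress_page(page: bytes, level: int = 6) -> bytes:
    """Spend CPU time now so fewer bytes have to be written and read later."""
    return zlib.compress(page, level)


def decompress_page(blob: bytes) -> bytes:
    """Spend CPU time on read to recover the original page."""
    return zlib.decompress(blob)


if __name__ == "__main__":
    # Hypothetical page full of repetitive values, which compress well.
    page = b"status=ACTIVE;country=US;" * 400
    blob = compress_page(page)
    print(f"raw: {len(page)} bytes, compressed: {len(blob)} bytes")
    assert decompress_page(blob) == page
```

Repetitive data like this shrinks dramatically; data that is already random or already compressed will not.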
4. The Hardware Failure Problem
Thousands of drives mean constant failures.
- The Problem: How do systems disguise these failures and maintain the illusion of a single, reliable storage layer?
- The Reality: We'll explore Basic Distributed Algorithms (Sharding, Replication, and Leader Election) and see how Google tackled this with the Google File System (GFS).
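To make sharding and replication concrete, here is a minimal sketch of hash-based placement. It assumes a fixed list of nodes and a simple "next N nodes" replica rule; the function names are hypothetical, and this is not how GFS itself places data.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replica_nodes(key: str, nodes: list[str], replication_factor: int = 3) -> list[str]:
    """Pick `replication_factor` distinct nodes to hold copies of `key`."""
    if replication_factor > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    start = shard_for(key, len(nodes))
    # Walk the node list from the primary, wrapping around at the end.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    print(replica_nodes("orders:1001", cluster))
```

With three copies on distinct nodes, losing any single drive or machine still leaves two readable replicas.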
Let's peek under the hood.