B+Tree Indexes
Concept. A B+Tree index keeps keys sorted in a balanced tree, supporting O(log N) lookups and range scans (WHERE rating BETWEEN 4.0 AND 5.0) by walking the leaf pages in order.
Intuition. When you query "find Listens with rating between 4.0 and 5.0," a B+Tree on rating walks down the tree to the first 4.0, then reads the leaf pages in order until it passes 5.0 — no need to revisit the tree for each value.
The Workhorse of Database Indexing
B+Trees are the default index type in most databases, handling both point lookups and range scans efficiently.
Prerequisites: Familiarize yourself with Index Fundamentals to grasp how indexes store pointers, not data.
- O(log N): lookup time
- Sorted order: enables range queries
- 3-4 levels: for billions of rows
- Sequential IO: for range scans
Physical Layout with B+Tree Index
Query Example: Finding Songs in a Date Range
Query: Find all songs released between days 12000-14000
Range Query Performance
Why B+Trees Excel at Ranges
-- Find songs released between late 2002 and spring 2008 (days 12000-14000 since 1970)
SELECT * FROM songs
WHERE release_date_since_1970 BETWEEN 12000 AND 14000
-- Find songs released in the 1990s (days 7300-11000 since 1970)
SELECT * FROM songs
WHERE release_date_since_1970 BETWEEN 7300 AND 11000
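The reason a B+Tree serves these queries cheaply is that its leaf level is a sorted sequence, so a range scan reduces to one binary search for the lower bound plus a forward walk. A minimal Python sketch of that access pattern, using the standard-library `bisect` module and made-up (day, page) index entries:

```python
import bisect

# Hypothetical sorted index entries: (release_date_since_1970, page_number)
entries = [(7400, 1), (9100, 2), (11200, 3), (12100, 4),
           (12900, 5), (13800, 6), (15000, 7)]
keys = [day for day, _ in entries]

def range_scan(lo, hi):
    """One binary search for the start, then a sequential walk until hi."""
    start = bisect.bisect_left(keys, lo)   # find first key >= lo
    pages = []
    for day, page in entries[start:]:      # walk forward in sorted order
        if day > hi:                       # stop once past the end key
            break
        pages.append(page)
    return pages

print(range_scan(12000, 14000))  # → [4, 5, 6]
```

The same two-phase shape (locate, then scan forward) is exactly what the tree does, except the "sorted array" is spread across linked leaf pages.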
1. Find start: 3 IOs
2. Scan leaves sequentially
3. Stop at end key

Result: mostly sequential IO!
B+Tree Operations (Optional Pseudocode)
// Build B+Tree from Songs table on release_date_since_1970
BTreeIndex(songs_table, release_date_since_1970):
    // First: sort songs by release date (days since 1970)
    sorted_songs = sort(songs_table, by=release_date_since_1970)

    // Build leaf level from sorted data
    leaves = []
    for chunk of FANOUT entries in sorted_songs:
        leaf = create_leaf_node(chunk)   // Stores (days, page#)
        link leaf to next leaf           // For range scans!
        leaves.append(leaf)

    // Build internal nodes bottom-up
    current_level = leaves
    while len(current_level) > 1:
        next_level = []
        for chunk of FANOUT nodes in current_level:
            parent = create_internal_node(chunk)
            next_level.append(parent)
        current_level = next_level
    return current_level[0]   // Root
// Point Lookup - Find song released on a specific day
Lookup(root, day=12500):
    node = root
    // Walk down the tree (12500 > 3187 and < 14821, go middle)
    while not node.is_leaf():
        node = node.find_child(12500)   // Binary search within node, 1 IO per level
    // Search in leaf - returns page numbers
    return node.find(12500)             // Returns: [Page 5]
// Range Query - Find songs released between days 12000-14000
Range(root, start_day=12000, end_day=14000):
    // Find starting leaf
    leaf = find_leaf(root, 12000)        // 3 IOs to reach leaf
    pages_to_read = set()
    // Scan leaves sequentially
    while leaf and leaf.min_day <= 14000:
        for (day, page_num) in leaf:
            if 12000 <= day <= 14000:
                pages_to_read.add(page_num)
        leaf = leaf.next                 // Follow sibling pointers - sequential!
    // Now read the actual data pages
    return read_pages(pages_to_read)     // Pages 4, 5, 6
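The range-query pseudocode above can be turned into a small runnable model. This sketch (the `Leaf` class and the tiny `FANOUT` are illustrative, not from any real engine) builds linked leaves from sorted entries and answers the same days 12000-14000 query:

```python
# Minimal model of the B+Tree leaf level (illustrative only).
FANOUT = 3  # tiny fanout so the leaf chain is visible; real trees use hundreds+

class Leaf:
    def __init__(self, entries):
        self.entries = entries   # sorted (day, page_num) pairs
        self.next = None         # sibling pointer used by range scans

def build_leaves(sorted_entries):
    leaves = [Leaf(sorted_entries[i:i + FANOUT])
              for i in range(0, len(sorted_entries), FANOUT)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right        # link each leaf to the next
    return leaves

def range_query(leaves, start_day, end_day):
    # Find the first leaf that could contain start_day; this stands in for
    # the root-to-leaf descent, which costs ~height IOs in a real tree.
    leaf = next((l for l in leaves if l.entries[-1][0] >= start_day), None)
    pages = set()
    while leaf and leaf.entries[0][0] <= end_day:
        for day, page in leaf.entries:
            if start_day <= day <= end_day:
                pages.add(page)
        leaf = leaf.next         # sequential sibling hop, no re-descent
    return sorted(pages)

entries = [(7400, 1), (9100, 2), (11200, 3), (12100, 4),
           (12900, 5), (13800, 6), (15000, 7)]
leaves = build_leaves(entries)
print(range_query(leaves, 12000, 14000))  # → [4, 5, 6]
```

Note that after the initial descent, the query never touches the upper levels again; it only follows sibling pointers, which is why the IO pattern is mostly sequential.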
The Magic of Fanout
How Fanout is Calculated
B+Tree Design Formula
Fanout = ⌊Page Size / (Key Size + Pointer Size)⌋
Height = ⌈log_Fanout(Number of Keys)⌉

Simple examples:
- 4 KB page, 4-byte key, 4-byte pointer: Fanout = ⌊4096 / (4 + 4)⌋ = 512
- 8 KB page, 4-byte key, 4-byte pointer: Fanout = ⌊8192 / (4 + 4)⌋ = 1024 ≈ 1000
- 4 KB page, 20-byte key, 8-byte pointer: Fanout = ⌊4096 / (20 + 8)⌋ = 146
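The formula is easy to check in code. A small calculator sketch (height is computed with integer arithmetic rather than a floating-point log, to avoid rounding surprises at exact powers) reproduces the worked examples and shows the page-size effect for a trillion 1 KB keys:

```python
def fanout(page_size, key_size, ptr_size):
    """Entries per node: how many (key, pointer) pairs fit in one page."""
    return page_size // (key_size + ptr_size)

def height(num_keys, f):
    """Smallest h such that f**h >= num_keys, i.e. levels (and IOs) per lookup."""
    levels, capacity = 1, f
    while capacity < num_keys:
        capacity *= f
        levels += 1
    return levels

print(fanout(4096, 4, 4))    # → 512
print(fanout(8192, 4, 4))    # → 1024
print(fanout(4096, 20, 8))   # → 146

# 1 trillion keys with 1 KB keys and 8-byte pointers:
print(height(10**12, fanout(64 * 1024, 1024, 8)))         # 64 KB pages → 7
print(height(10**12, fanout(64 * 1024 * 1024, 1024, 8)))  # 64 MB pages → 3
```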
Real-World Examples: Impact of Page Size and Key Size
Page Size = 64 KB (0.0625 MB)

| Keys | Key Size | Index Size | Fanout | Height | IOs per Lookup |
|---|---|---|---|---|---|
| 1M | 8 bytes | 15 MB | 4,096 | 2 | 2 reads |
| 1 Trillion | 8 bytes | 14.9 TB | 4,096 | 4 | 4 reads |
| 1M | 1 KB | 969 MB | 63 | 4 | 4 reads |
| 1 Trillion | 1 KB | 969 TB | 63 | 7 | 7 reads |
Page Size = 64 MB (1000× larger!)

| Keys | Key Size | Index Size | Fanout | Height | IOs per Lookup |
|---|---|---|---|---|---|
| 1M | 8 bytes | 15 MB | 4,194,304 | 1 | 1 read! |
| 1 Trillion | 8 bytes | 14.9 TB | 4,194,304 | 2 | 2 reads! |
| 1M | 1 KB | 961 MB | 65,027 | 2 | 2 reads! |
| 1 Trillion | 1 KB | 961 TB | 65,027 | 3 | 3 reads! |
Key Insight: Page Size Changes Everything!
- 64 KB pages: 1 trillion keys with 1KB search keys = 7 IOs (slow!)
- 64 MB pages: Same scenario = only 3 IOs (more than 2× faster!)
- Trade-off: Larger pages = more memory per node, but shallower trees
- Real DBs: PostgreSQL uses 8KB, MySQL InnoDB 16KB, Cloud analytics (BigQuery/Snowflake) use MB-sized blocks for columnar data
Key Takeaways
- Fanout is the secret: F=1000 means log₁₀₀₀(N) levels, not log₂(N)
- Shallow trees: a billion rows = only 3-4 IOs
- Range queries shine: sorted leaves + sibling pointers = sequential scan