Index Fundamentals
What is a Database Index?
An index is like a table of contents for your data. Instead of reading an entire book to find a topic, you look it up in the index and jump directly to the right page.
Key Concepts: What Indexes Really Store
Data Pages: 64MB pages containing the actual rows
-
Store the real data: artist, genre, song titles, lyrics, etc.
-
Each row can be 100s or 1000s of bytes
-
Example: Songs table with lyrics → 10KB per row
📘 Index Pages: The Secret to Speed
- Store only
(search_key, page#)pairs - Much smaller: typically 8-20 bytes per entry
- Like a book's index: "Beatles → Page 47"
- Universal Truth: ALL index types (Hash, B+Tree, LSM) work this way!
Pointer: A disk page number that tells you where to find data
-
In memory: pointer = memory address (e.g., 0x7fff5694a8c0)
-
On disk: pointer = page number (e.g., Page 2, Page 47)
-
Same concept: both tell you where to look to find your data
Visual: Why Indexes Are So Much Smaller
Songs Table: 1M songs × 10KB each = 10GB = 160,000 pages
Artist Index: 1M entries × 20 bytes = 20MB = 1 page!
Query: WHERE artist = 'Beatles'
• Without index: Scan all 160,000 pages
• With index: Read 1 index page + 1 data page = 2 IOs
• Speedup: 80,000× faster!
Trade-offs: There's No Free Lunch
The Cost of Indexes
When to Use Indexes
✅ Good Candidates:
-
Primary keys (automatic in most DBs)
-
Foreign keys (for JOINs)
-
Columns in WHERE clauses
-
Columns in ORDER BY
-
High selectivity columns (many unique values)
❌ Poor Candidates:
-
Columns rarely used in queries
-
Low selectivity (e.g., boolean fields)
-
Frequently updated columns
-
Small tables (< 1000 rows)
Different Index Types for Different Needs
Now that you understand what ALL indexes have in common, let's see how different index types organize these (key, page#) mappings:
| Index Type | Best For | Cannot Do | Organization |
|---|---|---|---|
| Hash Index | Equality lookups (WHERE id = 42) |
Range queries | Hash table |
| B+Tree Index | Range queries & sorting | - | Sorted tree |
| LSM Tree | Write-heavy workloads | - | Log-structured |
| Bitmap Index | Low cardinality columns | Updates | Bit vectors |
Key Takeaways
-
Size is everything - 20MB index vs 10GB table = massive IO savings
-
Trade-offs exist - Faster reads, but slower writes and extra storage
-
Choose the right index type - Hash for equality, B+Tree for ranges, LSM for writes