Index Fundamentals

What is a Database Index?

An index is like a table of contents for your data. Instead of reading an entire book to find a topic, you look it up in the index and jump directly to the right page.

Without Index
Scan entire table
With Index
Jump to exact page
10s of 1000s ×
Speedup

Key Concepts: What Indexes Really Store

Data Pages: 64MB pages containing the actual rows

📘 Index Pages: The Secret to Speed

  • Store only (search_key, page#) pairs
  • Much smaller: typically 8-20 bytes per entry
  • Like a book's index: "Beatles → Page 47"
  • Universal Truth: ALL index types (Hash, B+Tree, LSM) work this way!

Pointer: A disk page number that tells you where to find data


Visual: Why Indexes Are So Much Smaller

Table Size vs Index Size Full Table (160,000 pages) song_id: 1 | "Hey Jude" | "Beatles" | "Hey Jude..." song_id: 2 | "Bohemian" | "Queen" | "Mamma, just killed..." song_id: 3 | "Shake It Off" | "Taylor Swift" | "I stay..." ... millions more rows (10KB each) Each row: ~10KB Artist Index (1 page!) "Beatles" → Page 47 "Taylor Swift" → Page 12 "Queen" → Page 23 Each entry: 20 bytes 500× smaller! 10 GB 20 MB

Songs Table: 1M songs × 10KB each = 10GB = 160,000 pages
Artist Index: 1M entries × 20 bytes = 20MB = 1 page!

Query: WHERE artist = 'Beatles'
• Without index: Scan all 160,000 pages
• With index: Read 1 index page + 1 data page = 2 IOs
Speedup: 80,000× faster!


Trade-offs: There's No Free Lunch

The Cost of Indexes

Space
Extra storage needed
Writes
Must update on INSERT/UPDATE
Selectivity
Not all columns need indexes
Complexity
Query optimizer decisions

When to Use Indexes

Good Candidates:

Poor Candidates:


Different Index Types for Different Needs

Now that you understand what ALL indexes have in common, let's see how different index types organize these (key, page#) mappings:

Index Type Best For Cannot Do Organization
Hash Index Equality lookups (WHERE id = 42) Range queries Hash table
B+Tree Index Range queries & sorting - Sorted tree
LSM Tree Write-heavy workloads - Log-structured
Bitmap Index Low cardinality columns Updates Bit vectors

Key Takeaways

  1. Size is everything - 20MB index vs 10GB table = massive IO savings

  2. Trade-offs exist - Faster reads, but slower writes and extra storage

  3. Choose the right index type - Hash for equality, B+Tree for ranges, LSM for writes