Index Fundamentals

What is an Index?

An index is like a table of contents for your data. You look up a "topic" (e.g, search key) in the index and jump directly to the right page.

Without Index
Scan entire table
With Index
Jump to exact page
10s of 1000s ×
Speedup

Key Concepts: What Indexes Really Store

Data Pages: 64MB pages containing the actual rows

Index Pages: The Secret to Speed

  • Store only (search_key, page#) pairs
  • Much smaller: typically 8-20 bytes per entry
  • Like a book's index: "Beatles → Page 47"
  • ALL index types (e.g., Hash, B+Tree, LSM) work this way!

Pointer: A disk page number that tells you where to find data


Why Indexes Are So Much Smaller

Table Size vs Index Size Full Table (160,000 pages) song_id: 1 | "Hey Jude" | "Beatles" | "Hey Jude..." song_id: 2 | "Bohemian" | "Queen" | "Mamma, just killed..." song_id: 3 | "Shake It Off" | "Taylor Swift" | "I stay..." ... millions more rows (10KB each) Each row: ~10KB Artist Index (1 page!) "Beatles" → Page 47 "Taylor Swift" → Page 12 "Queen" → Page 23 Each entry: 20 bytes 500× smaller! 10 GB 20 MB

Songs Table: 1M songs × 10KB each = 10GB = 160,000 pages
Artist Index: 1M entries × 20 bytes = 20MB = 1 page!

Query: WHERE artist = 'Beatles'
• Without index: Scan all 160,000 pages
• With index: Read 1 index page + 1 data page = 2 IOs
Speedup: 80,000× faster!


The Cost of Indexes

Space
Extra storage needed
Writes
Must update on INSERT/UPDATE
Selectivity
Not all columns need indexes

When to Use Indexes

Good Candidates:

Poor Candidates: