Index Fundamentals

Concept. An index is an auxiliary data structure that maps a key (e.g. user_id) to the page where matching rows live, turning a full-table scan into a one-IO lookup.

Intuition. Without an index, finding Mickey's listens means scanning every row in a billion-row Listens table. With an index on user_id, the database looks up "Mickey lives on page 47," reads page 47, and returns Mickey's 3 listens. Same answer, 100,000× fewer IOs.

What is an Index?

Think of an index as your database's streamlined guide. Instead of flipping through every page, you pinpoint exactly where you need to go.

Without Index

Scan entire table

With Index

Jump to exact page

10s of 1000s ×

Speedup


Key Concepts: What Indexes Really Store

Data Pages: 64MB pages packed with actual rows.

  • This is where the real stuff lives: artist names, genres, song titles, lyrics.

  • Rows can be hefty, stretching into the kilobytes.

  • Example: A song with lyrics might take up 10KB.

Index Pages: The Secret to Speed

  • Store only (search_key, page#) pairs
  • Much smaller: typically 8-20 bytes per entry
  • Like a book's index: "Beatles → Page 47"
  • ALL index types (e.g., Hash, B+Tree, LSM) work this way!

Pointer: Your map to data's location.

  • In memory, it's a memory address (e.g., 0x7fff5694a8c0).

  • On disk, it's a page number (e.g., Page 2, Page 47).

  • Both direct you to where the data resides.


Why Indexes Are So Much Smaller

Table Size vs Index Size Full Table (160,000 pages) song_id: 1 | "Hey Jude" | "Beatles" | "Hey Jude..." song_id: 2 | "Bohemian" | "Queen" | "Mamma, just killed..." song_id: 3 | "Shake It Off" | "Taylor Swift" | "I stay..." ... millions more rows (10KB each) Each row: ~10KB Artist Index (1 page!) "Beatles" → Page 47 "Taylor Swift" → Page 12 "Queen" → Page 23 Each entry: 20 bytes 500× smaller! 10 GB 20 MB

The Math: Why This Works

Songs Table: 1M songs × 10KB each = 10GB = 160,000 pages
Artist Index: 1M entries × 20 bytes = 20MB = 1 page!

Query: WHERE artist = 'Beatles'
• Without index: Scan all 160,000 pages
• With index: Read 1 index page + 1 data page = 2 IOs
Speedup: 80,000× faster!


The Cost of Indexes

Space

Extra storage needed

Writes

Must update on INSERT/UPDATE

Selectivity

Not all columns need indexes

When to Use Indexes

Good Candidates:

  • Primary keys (automatic in most DBs)

  • Foreign keys (for JOINs)

  • High selectivity columns (many unique values)

Poor Candidates:

  • Columns rarely used in queries

  • Low selectivity (e.g., boolean fields)

  • Frequently updated columns

  • Small tables (< 1000 rows)