Index Fundamentals
Concept. An index is an auxiliary data structure that maps a key (e.g. user_id) to the page where matching rows live, turning a full-table scan into a one-IO lookup.
Intuition. Without an index, finding Mickey's listens means scanning every row in a billion-row Listens table. With an index on user_id, the database looks up "Mickey lives on page 47," reads page 47, and returns Mickey's 3 listens. Same answer, 100,000× fewer IOs.
What is an Index?
Think of an index as your database's streamlined guide. Instead of flipping through every page, you pinpoint exactly where you need to go.
Without Index
Scan entire table
With Index
Jump to exact page
10s of 1000s ×
Speedup
Key Concepts: What Indexes Really Store
Data Pages: 64MB pages packed with actual rows.
-
This is where the real stuff lives: artist names, genres, song titles, lyrics.
-
Rows can be hefty, stretching into the kilobytes.
-
Example: A song with lyrics might take up 10KB.
Index Pages: The Secret to Speed
- Store only
(search_key, page#)pairs - Much smaller: typically 8-20 bytes per entry
- Like a book's index: "Beatles → Page 47"
- ALL index types (e.g., Hash, B+Tree, LSM) work this way!
Pointer: Your map to data's location.
-
In memory, it's a memory address (e.g., 0x7fff5694a8c0).
-
On disk, it's a page number (e.g., Page 2, Page 47).
-
Both direct you to where the data resides.
Why Indexes Are So Much Smaller
The Math: Why This Works
Songs Table: 1M songs × 10KB each = 10GB = 160,000 pages
Artist Index: 1M entries × 20 bytes = 20MB = 1 page!
Query: WHERE artist = 'Beatles'
• Without index: Scan all 160,000 pages
• With index: Read 1 index page + 1 data page = 2 IOs
• Speedup: 80,000× faster!
The Cost of Indexes
Space
Extra storage needed
Writes
Must update on INSERT/UPDATE
Selectivity
Not all columns need indexes
When to Use Indexes
Good Candidates:
-
Primary keys (automatic in most DBs)
-
Foreign keys (for JOINs)
-
High selectivity columns (many unique values)
Poor Candidates:
-
Columns rarely used in queries
-
Low selectivity (e.g., boolean fields)
-
Frequently updated columns
-
Small tables (< 1000 rows)