Index Fundamentals
What is an Index?
An index is like a table of contents for your data. You look up a "topic" (e.g, search key) in the index and jump directly to the right page.
Key Concepts: What Indexes Really Store
Data Pages: 64MB pages containing the actual rows
-
Store the real data: artist, genre, song titles, lyrics, etc.
-
Each row can be 100s or 1000s of bytes
-
Example: Songs table with lyrics → 10KB per row
Index Pages: The Secret to Speed
- Store only
(search_key, page#)pairs - Much smaller: typically 8-20 bytes per entry
- Like a book's index: "Beatles → Page 47"
- ALL index types (e.g., Hash, B+Tree, LSM) work this way!
Pointer: A disk page number that tells you where to find data
-
In memory: pointer = memory address (e.g., 0x7fff5694a8c0)
-
On disk: pointer = page number (e.g., Page 2, Page 47)
-
Same concept: both tell you where to look to find your data
Why Indexes Are So Much Smaller
Songs Table: 1M songs × 10KB each = 10GB = 160,000 pages
Artist Index: 1M entries × 20 bytes = 20MB = 1 page!
Query: WHERE artist = 'Beatles'
• Without index: Scan all 160,000 pages
• With index: Read 1 index page + 1 data page = 2 IOs
• Speedup: 80,000× faster!
The Cost of Indexes
When to Use Indexes
Good Candidates:
-
Primary keys (automatic in most DBs)
-
Foreign keys (for JOINs)
-
High selectivity columns (many unique values)
Poor Candidates:
-
Columns rarely used in queries
-
Low selectivity (e.g., boolean fields)
-
Frequently updated columns
-
Small tables (< 1000 rows)