Problem Solving: Saving $550,000 on Queries
Concept. A single unoptimized query running 100 times a day against a multi-TB table can cost a company $550,000/year in cloud-warehouse charges — the fully optimized version of the same query (index + columnar storage + compression) costs about $400/year.
Intuition. When the optimizer executes the trending-genres join as a Block Nested Loop Join against the 10-billion-row Listens table, every run performs 320,400 page IOs. A Hash Partition Join drops that to 49,200 IOs — 6.5× cheaper per run — and an index on the date filter cuts it to roughly 554, which is where the real savings begin.
The Bill
Goal: Understand how Indexing, Columnar Storage, and Compression tackle a costly "Query Disaster."
An unoptimized "Trending Genres" aggregation query — computing trends over the most recent week of listens — executes 100 times daily at ~9 hours per run, incurring $1,500/day ($1,500 × 365 ≈ $550,000/year) in AWS compute costs.
System Archeology: The Schema
To optimize this workload, we must analyze the physical schema layout.
| Component | Stats | Disk Footprint (Standard Pages) |
|---|---|---|
| Songs Table | ~50M songs | 400 pages (25 GB) |
| Listens Table | ~10B total listens | 16,000 pages (1 TB) |
| Weekly Scope | Oct 21-27 Filter | 150 pages (~1% of data) |
| Memory (B) | Buffer Pool | 20 pages (1.28 GB) |
We use a standard page size of 64MB for all calculations in this study.
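A quick sanity check (a Python sketch; the page counts come from the table above) confirms these footprints:

```python
PAGE_MB = 64  # standard page size used for every calculation in this study

# pages -> size in decimal GB (1000 MB per GB, matching the table's figures)
def footprint_gb(pages):
    return pages * PAGE_MB / 1000

print(footprint_gb(400))     # Songs:  25.6 GB  (~25 GB)
print(footprint_gb(16_000))  # Listens: 1024 GB (~1 TB)
print(footprint_gb(20))      # Buffer pool: 1.28 GB
```

The Songs figure rounds to the table's 25 GB; the other two match exactly.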
Problem 0: The Baseline Investigation
Problem: The weekly trending report is taking 9 hours. Before optimizing, we need to calculate the current "Baseline" cost. What is the IO cost of a Hash Partition Join (HPJ) on these two tables? Should the optimizer choose a Block Nested Loop Join (BNLJ) instead?
Solution
Disaster Option 1: Block Nested Loop Join (BNLJ)
If the query optimizer picked BNLJ, it would be a catastrophe.
- Formula: M + (M / B) * N
- Math: 400 + (400 / 20) * 16,000 = 400 + 20 * 16,000 = 320,400 IOs
- The Why: BNLJ rescans the entire 1 TB Listens table once per buffer-sized block of the 25 GB Songs table. It's the "Disaster" scenario for big data.
Disaster Option 2: Hash Partition Join (HPJ)
HPJ is better, but still flawed for this use case.
- Formula: 3 * (M + N)
- Math: 3 * (400 + 16,000) = 49,200 IOs
- The Why: HPJ is efficient for joining two large tables, but here we only need ~1% of the data (150 of 16,000 Listens pages). Scanning the entire 1 TB to find that 1% is massive waste.
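Both formulas are simple arithmetic over the schema stats, so the two disaster scenarios can be verified directly (a minimal sketch using the M, N, B values from the table above):

```python
M, N, B = 400, 16_000, 20  # Songs pages, Listens pages, buffer-pool pages

bnlj = M + (M // B) * N  # Block Nested Loop Join: M + (M / B) * N
hpj = 3 * (M + N)        # Hash Partition Join: 3 * (M + N)

print(f"BNLJ: {bnlj:,} IOs")                # BNLJ: 320,400 IOs
print(f"HPJ:  {hpj:,} IOs")                 # HPJ:  49,200 IOs
print(f"HPJ is {bnlj / hpj:.1f}x cheaper")  # HPJ is 6.5x cheaper
```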
Problem 1: The Index Win
Problem: We know that only ~150 pages out of the 16,000 contain the data for Oct 21-27. How do we skip the other pages, and what is the new IO cost?
Solution
- Logic: Create a B+ Tree index on Listens(listen_time).
- Design: The index supports a targeted range scan over Oct 21-27 rather than a full join or table scan.
- IO Math:
  - Scan Songs: 400 IOs
  - Index lookup (root-to-leaf traversal): ~4 IOs
  - Fetch the filtered Listens pages: 150 IOs
  - Total: ~554 IOs
- The Why: Indexes prune the search space. By jumping directly to the start date and reading only the 1% target data, we achieve a ~90x improvement over plain HPJ (49,200 / 554 ≈ 89).
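The index-path arithmetic can be checked the same way (a sketch; the ~4-IO tree traversal is the height assumed in the solution above):

```python
songs_scan = 400          # full scan of Songs
index_lookup = 4          # B+ tree root-to-leaf traversal (assumed height)
listens_fetch = 150       # pages holding the Oct 21-27 listens
hpj = 3 * (400 + 16_000)  # the 49,200-IO baseline from Problem 0

total = songs_scan + index_lookup + listens_fetch
print(total)               # 554
print(round(hpj / total))  # 89 -> the "~90x" win
```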
Problem 2: The Column Store Win
Problem: Even with an index, row-based storage loads 150 bytes per listen just to read the 24 bytes we need (song_id, user_id, time, duration). How do we eliminate this "column waste," and what is the estimated IO win?
Solution
- Logic: Switch to a Columnar Store (e.g., BigQuery, Parquet).
- Design: Physically store each field in its own set of pages, so the query can skip the IP addresses and device IDs entirely.
- IO Math:
  - We need only 24 of each row's 150 bytes — roughly 1/6th of the row width.
  - Listens: 150 / 6 = 25 IOs; Songs: ~35 IOs; Total: ~60 IOs
- The Why: Columnar storage is optimized for aggregation. By reading only the columns named in the SELECT and WHERE clauses, we cut IO by another ~9x (554 / 60 ≈ 9).
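The columnar estimate follows directly from the row-width ratio (a sketch; the 35-IO Songs figure is taken as given from the solution above):

```python
row_bytes, needed_bytes = 150, 24  # full row vs. the 4 columns we actually read
print(needed_bytes / row_bytes)    # 0.16 -> roughly 1/6th of the row width

listens_io = 150 // 6  # the index-filtered Listens pages shrink ~6x -> 25 IOs
songs_io = 35          # column-pruned Songs scan (given)
print(listens_io + songs_io)  # 60 IOs total, ~9x below the indexed plan's 554
```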
Problem 3: The Final Compression
Problem: How can we reduce the footprint of the genre and rating columns without losing data?
Solution
- Technique 1 (Dictionary Encoding): Map the 200 unique genres to 1-byte IDs (a byte holds 256 distinct values, enough for 200): 16 pages -> 1 page.
- Technique 2 (Bit Packing): Ratings are 1-5, so store each in 4 bits instead of a 32-bit integer (8x smaller): 7 pages -> 1 page.
- Technique 3 (Run-Length Encoding): Since the data is sorted, collapse runs of repeating user IDs: 13 pages -> 5 pages.
- Final Victory: ~35 IOs | ~20 seconds | ~$400/year
- The Why: Domain-specific compression squeezes out the final ~2x (60 / 35 ≈ 1.7) without losing a single bit of information.
Summary: The Compound ROI
We optimized the $550,000 disaster down to roughly $400 by stacking three discrete LEGO-block optimizations whose win factors multiply:
| Stage | IO Cost | Annual Cost | Win Factor |
|---|---|---|---|
| Baseline (HPJ) | 49,200 IOs | $550,000 | 1x |
| + Index | ~554 IOs | $6,500 | ~90x |
| + Columnar | ~60 IOs | $730 | ~9x |
| + Compression | ~35 IOs | ~$400 | ~2x |
Total Impact: ~1,400x improvement (49,200 / 35 ≈ 1,406).
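The compound claim checks out numerically (a sketch multiplying the exact stage-by-stage IO costs derived in Problems 0-3):

```python
ios = [49_200, 554, 60, 35]  # Baseline, +Index, +Columnar, +Compression

for prev, cur in zip(ios, ios[1:]):
    print(f"{prev / cur:.1f}x")          # 88.8x, 9.2x, 1.7x per stage
print(f"total {ios[0] / ios[-1]:.0f}x")  # total 1406x -> the ~1,400x headline
```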