Problem Solving: Saving $550,000 on Queries

Putting It All Together

Reading Time: 15 mins

The Bill

Goal: Understand how Indexing, Columnar Storage, and Compression multiply to solve a "Query Disaster."

You review your AWS bill. One query (a weekly report on "Trending Genres") costs Spotify $1,500 every single day. It takes 9 hours to run, and runs 100 times daily.

Total cost? $550,000 per year.
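
The annual figure is just that daily spend carried across a year; a quick Python check using the numbers above:

```python
# Annualize the daily spend on the "Trending Genres" report.
daily_cost = 1_500                 # USD per day, across all runs of the query
annual_cost = daily_cost * 365
print(f"${annual_cost:,} per year")   # $547,500, i.e. roughly $550,000
```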


System Archeology: The Schema

To fix the disaster, let's understand the data's physical footprint.

| Component | Stats | Disk Footprint (Standard Pages) |
| --- | --- | --- |
| Songs Table | ~50M songs | 400 pages (25 GB) |
| Listens Table | ~10B total listens | 16,000 pages (1 TB) |
| Weekly Scope | Oct 21-27 filter | 150 pages (~1% of data) |
| Memory (B) | Buffer pool | 20 pages (1.28 GB) |

We use a standard page size of 64 MB for all calculations in this study.
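
As a sanity check on these footprints, here is a minimal Python sketch converting the page counts above into raw sizes, assuming that 64 MB standard page:

```python
# Sanity-check the disk footprints above, assuming 64 MB standard pages.
PAGE_SIZE_MB = 64

tables = {
    "Songs":        400,      # pages
    "Listens":      16_000,   # pages
    "Weekly scope": 150,      # pages (~1% of Listens)
    "Buffer pool":  20,       # pages held in memory (B)
}

for name, pages in tables.items():
    size_gb = pages * PAGE_SIZE_MB / 1000   # decimal GB, matching the table
    print(f"{name:13s}: {pages:>6,} pages ≈ {size_gb:,.2f} GB")
# Songs ≈ 25.6 GB, Listens ≈ 1,024 GB (~1 TB), Buffer pool = 1.28 GB
```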


Problem 0: The Baseline Investigation

Problem: The weekly trending report is taking 9 hours. Before optimizing, we need to calculate the current "Baseline" cost. What is the IO cost of a Hash Partition Join (HPJ) on these two tables? Should the optimizer choose a Block Nested Loop Join (BNLJ) instead?

πŸ” Show Solution

Disaster Option 1: Block Nested Loop Join (BNLJ)

If the query optimizer chose BNLJ, results would be catastrophic.

  • Formula: M + (M / B) * N

  • Math: 400 + (400 / 20) * 16,000 = 320,400 IOs

  • The Why: BNLJ forces the database to scan the 1TB table for every block of the 25GB table. It is the absolute "Disaster" scenario for big data.

Disaster Option 2: Hash Partition Join (HPJ)

HPJ is much better but still flawed for this use case.

  • Formula: 3 * (M + N)

  • Math: 3 * (400 + 16,000) = 49,200 IOs

  • The Why: HPJ is efficient for joining two large tables, but here we only need 1% of the data. Scanning the entire 1 TB table to find that 1% is a massive waste. (Both estimates are reproduced in the sketch below.)
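
A minimal Python sketch of both baseline estimates, with M, N, and B taken from the schema table and the formulas stated above:

```python
# IO cost estimates for the two baseline join strategies.
M = 400      # pages in Songs (outer table)
N = 16_000   # pages in Listens (inner table)
B = 20       # buffer pages available

def bnlj_io(M, N, B):
    """Block Nested Loop Join: M + (M / B) * N."""
    return M + (M // B) * N

def hpj_io(M, N):
    """Hash Partition Join: 3 * (M + N) -- partition read, partition write, probe read."""
    return 3 * (M + N)

print(f"BNLJ: {bnlj_io(M, N, B):,} IOs")   # 320,400 IOs
print(f"HPJ : {hpj_io(M, N):,} IOs")       # 49,200 IOs
```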


Problem 1: The Index Win

Problem: We know that only ~150 pages out of the 16,000 contain the data for Oct 21-27. How do we skip the other pages, and what is the new IO cost?

πŸ” Show Solution
  • Logic: Create a B+ Tree Index on Listens(listen_time).

  • Design: The index lets us seek straight to the first Oct 21 entry and read only the matching pages, rather than joining or scanning the full Listens table.

  • IO Math:

    • Scan Songs: 400 IOs
    • Index Lookup: ~4 IOs
    • Fetch filtered Listens: 150 IOs
    • Total: ~554 IOs
  • The Why: Indexes prune the search space. By jumping directly to the start date and reading only the 1% target data, we achieve a 90x improvement vs plain HPJ.
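
A small sketch of the indexed-plan arithmetic, assuming a B+ tree descent of roughly 4 IOs as stated above:

```python
# IO cost with a B+ Tree index on Listens(listen_time).
songs_scan      = 400   # still scan Songs in full
index_traversal = 4     # root-to-leaf descent of the B+ tree (~height 4)
matching_pages  = 150   # Listens pages covering Oct 21-27 (~1% of 16,000)

total = songs_scan + index_traversal + matching_pages
print(f"Indexed plan: ~{total} IOs")               # ~554 IOs
print(f"Speedup vs HPJ: ~{49_200 / total:.0f}x")   # ~89x, i.e. the roughly 90x above
```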


Problem 2: The Column Store Win

Problem: Even with an index, row-based storage loads 150 bytes per listen just to read the 24 bytes we need (song_id, user_id, time, duration). How do we eliminate this "column waste," and what is the estimated IO win?

πŸ” Show Solution
  • Logic: Switch to a Columnar Store (e.g., BigQuery, Parquet).

  • Design: Physically store each field in its own set of pages. The query can then skip the pages holding IP addresses, device IDs, and every other column it never touches.

  • IO Math:

    • We effectively read only 1/6th of the row width, so the 150 filtered Listens pages shrink to ~25.
    • Songs shrinks the same way once only its needed columns are read.
    • Total: ~60 IOs (Listens: 25 IOs + Songs: 35 IOs)
  • The Why: Columnar storage is optimized for aggregation. By only reading the columns mentioned in the SELECT and WHERE clauses, we reduce IO by another 9x.
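
A minimal sketch of the column-pruning arithmetic; the ~1/6 width fraction comes from needing 24 of the 150 bytes per row, and the 35-IO Songs figure assumes its unused columns are skipped the same way:

```python
# IO cost after switching to columnar storage.
bytes_needed  = 24    # song_id, user_id, time, duration
bytes_per_row = 150   # full row width in the row store

listens_pages_row = 150                    # indexed row-store pages for Oct 21-27
fraction_read     = 1 / 6                  # ≈ 24 needed bytes / 150-byte rows
listens_pages_col = round(listens_pages_row * fraction_read)   # 25 pages

songs_pages_col = 35   # Songs, reduced to just the columns the report needs

total = listens_pages_col + songs_pages_col
print(f"Columnar plan: ~{total} IOs")   # ~60 IOs, another ~9x over the indexed plan
```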


Problem 3: The Final Compression

Problem: How can we reduce the footprint of the genre and rating columns without losing data?

πŸ” Show Solution
  • Technique 1 (Dictionary): Map the 200 unique Genres to 1-byte IDs (16 pages -> 1 page).

  • Technique 2 (Bit Packing): Since ratings are 1-5, store them in 4 bits instead of a 32-bit integer (7 pages -> 1 page).

  • Technique 3 (RLE): Since data is sorted, compress repeating User IDs (13 pages -> 5 pages).

  • Final Victory: ~35 IOs | 20 Seconds | $400 / Year

  • The Why: Domain-specific compression exploits the narrow value ranges and heavy repetition in each column, shrinking the remaining pages without losing any data (see the toy sketch below).
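
None of these encodings is exotic. A toy Python sketch of all three, applied to made-up sample values rather than the real columns:

```python
# 1) Dictionary encoding: ~200 distinct genre strings -> 1-byte IDs.
genres = ["rock", "pop", "rock", "jazz", "pop", "rock"]
dictionary = {g: i for i, g in enumerate(sorted(set(genres)))}   # genre -> small int
encoded_genres = bytes(dictionary[g] for g in genres)            # 1 byte per value

# 2) Bit packing: ratings 1-5 fit in 4 bits, so two ratings share one byte.
ratings = [5, 3, 4, 1, 2, 5]
packed = bytearray()
for i in range(0, len(ratings), 2):
    hi = ratings[i] << 4
    lo = ratings[i + 1] if i + 1 < len(ratings) else 0
    packed.append(hi | lo)

# 3) Run-length encoding: sorted user_ids repeat, so store (value, run_length).
user_ids = [7, 7, 7, 9, 9, 12, 12, 12, 12]
rle = []
for uid in user_ids:
    if rle and rle[-1][0] == uid:
        rle[-1][1] += 1
    else:
        rle.append([uid, 1])

print(encoded_genres)   # b'\x02\x01\x02\x00\x01\x02' (jazz=0, pop=1, rock=2)
print(bytes(packed))    # three bytes: 0x53, 0x41, 0x25
print(rle)              # [[7, 3], [9, 2], [12, 4]]
```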


Summary: The Compound ROI

We optimized the $550,000 disaster to $400 by multiplying three discrete LEGO blocks:

| Stage | IO Cost | Annual Cost | Win Factor |
| --- | --- | --- | --- |
| Baseline (HPJ) | 50,000 IOs | $550,000 | 1x |
| + Index | 550 IOs | $6,500 | 90x |
| + Columnar | 60 IOs | $730 | 9x |
| + Compression | 35 IOs | $365 | 2x |

Total Impact: 1,400x improvement.
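
The compounding is plain multiplication; a quick sketch using the rounded IO figures from the table:

```python
# Compound improvement, using the rounded IO figures from the summary table.
io_by_stage = [
    ("Baseline (HPJ)", 50_000),
    ("+ Index",        550),
    ("+ Columnar",     60),
    ("+ Compression",  35),
]

for (_, prev_io), (name, io) in zip(io_by_stage, io_by_stage[1:]):
    print(f"{name:15s}: {prev_io / io:>5.1f}x fewer IOs than the previous stage")

baseline, final = io_by_stage[0][1], io_by_stage[-1][1]
print(f"End to end     : ~{baseline / final:.0f}x fewer IOs")   # ~1,429x, the "1,400x" above
```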