Spotify End-to-End: How Everything Connects
Concept. A single click on Spotify's play button triggers a coordinated cascade across multiple subsystems — auth, recommendation, search, playback — each backed by a different storage and indexing strategy chosen for its specific workload.
Intuition. When Mickey hits play, Postgres authenticates him, an LSM-backed events store records the click, a vector search returns recommended next songs, a B+Tree-indexed catalog fetches the audio file, and a CDN streams it — all within 2 seconds.
One Play Button, Every Algorithm
When you hit play on "Anti-Hero," that single click sets off a chain reaction through every algorithm we've covered.
- 1 click: user action
- 3 systems: OLTP → ETL → OLAP
- 8 algorithms: from this course
- 2 seconds: to the CEO dashboard
The Journey
Act 1: The Click (OLTP)
You tap play. In 8 milliseconds, this happens:
INSERT INTO listens (user_id, song_id, timestamp)
VALUES (123456, 789, NOW());
Algorithm: LSM Tree (lsm-indexes.md)
- Write goes to the MemTable (an in-memory sorted structure, typically a skip list or balanced tree)
- No disk I/O on the hot path = instant response
- Background flush to SSTables when the MemTable fills
Why LSM, not B+Tree? Spotify handles 100K writes/second per node. B+Trees require random disk writes on every insert; LSM trees batch writes in memory and flush them sequentially.
- < 8 ms: response time
- 100K/sec: writes per node
- 8.6B: events per day
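The write path above can be sketched in a few lines. This `TinyLSM` class is an invented toy, not Spotify's code: puts land in an in-memory dict, and when it fills, the contents are sorted and appended as an immutable run (an "SSTable"), mimicking one sequential disk write.

```python
class TinyLSM:
    """Toy LSM tree: in-memory MemTable plus sorted on-'disk' runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}        # hot writes, no disk I/O
        self.sstables = []        # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value           # instant: memory only
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        run = sorted(self.memtable.items())  # one sequential write
        self.sstables.append(run)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:             # newest data wins
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run to oldest
            for k, v in run:
                if k == key:
                    return v
        return None
```

A real engine adds a write-ahead log for durability, binary search within runs, and background compaction that merges runs, but the core trade (sequential flushes instead of random B+Tree page writes) is the same.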
Act 2: The Transform (ETL)
At 2 AM, Spark kicks in. 8.6 billion events from yesterday need processing.
Algorithms: Every One We Learned
from pyspark.sql.functions import broadcast

(listens
    .repartition(1000, "user_id")        # 1. Hash Partition (hash-partitioning.md): 1000 partitions
    .sortWithinPartitions("timestamp")   # 2. BigSort (big-sort.md): external merge sort
    .join(broadcast(songs), "song_id")   # 3. Broadcast Join (distributed-query.md): small table to all nodes
    .write.parquet("s3://warehouse/"))   # 4. Columnar write (storage-layout.md): analytics-optimized
Why These Algorithms?
- Hash Partition: spread 100TB across 1000 machines
- BigSort: each partition sorts its ~100GB chunk
- Broadcast Join: the 100MB song table is copied to every node
- Columnar: compress 100TB → 10TB for analytics
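Steps 1 and 2 of the pipeline can be shown in miniature. This is a hypothetical helper, not Spark: events are routed to partitions by hashing `user_id`, then each partition is sorted locally, which is exactly what `repartition` + `sortWithinPartitions` do across 1000 machines.

```python
def hash_partition(events, num_partitions):
    """Route each event to a partition by hashing its user_id."""
    parts = [[] for _ in range(num_partitions)]
    for ev in events:
        parts[hash(ev["user_id"]) % num_partitions].append(ev)
    return parts

events = [
    {"user_id": 1, "ts": 9}, {"user_id": 2, "ts": 3},
    {"user_id": 1, "ts": 4}, {"user_id": 2, "ts": 7},
]

partitions = hash_partition(events, num_partitions=2)
for p in partitions:
    p.sort(key=lambda ev: ev["ts"])  # local sort, no cross-node shuffle
```

Two properties fall out: all events for a given user land in the same partition (so per-user aggregations never cross machines), and each partition is independently ordered by timestamp.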
- 3 hours: total pipeline time
- 1000 nodes: Spark cluster size
- 100TB: daily data processed
Act 3: The Dashboard (OLAP)
The CEO opens the mobile app and sees a real-time "Top Artists" dashboard.
Algorithm: Broadcast Join + Columnar Scan (storage-layout.md, distributed-query.md)
-- Executive Query: "Top Artists by Country This Month"
SELECT artist, country, COUNT(*) AS plays
FROM fact_plays f
JOIN dim_artists a ON f.artist_id = a.artist_id -- Broadcast
WHERE f.date >= '2024-01-01'
GROUP BY artist, country
ORDER BY plays DESC LIMIT 100;
Why Broadcast Join? The artist table is 10MB, so it fits in memory on every one of the 1000 nodes. No shuffle needed!
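In miniature, a broadcast hash join looks like this (an illustrative sketch with made-up rows, not the warehouse engine): the small dimension table becomes a hash map shipped to every node, and each node probes it for its local fact rows.

```python
# Tiny dimension table, "broadcast" (copied) to every node.
dim_artists = {7: "Taylor Swift", 9: "Bad Bunny"}

# One node's local slice of the fact table.
fact_plays = [
    {"artist_id": 7, "country": "US"},
    {"artist_id": 9, "country": "MX"},
    {"artist_id": 7, "country": "US"},
]

# Each node joins locally by probing the hash map: O(1) per row,
# and no fact data ever moves between nodes.
joined = [
    {**row, "artist": dim_artists[row["artist_id"]]}
    for row in fact_plays
    if row["artist_id"] in dim_artists
]
```

Contrast this with a shuffle join, which would repartition both tables by `artist_id` across the network before matching rows.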
Why Columnar? Only scan 3 columns (artist_id, country, date) instead of all 20 columns.
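The columnar advantage is easy to see with a toy layout (hypothetical column arrays, not the real schema): each column is stored as its own array, so the query reads only the columns it names and skips the rest entirely.

```python
from collections import Counter

# Column-oriented storage: one contiguous array per column.
columns = {
    "artist_id": [7, 9, 7],
    "country":   ["US", "MX", "US"],
    "date":      ["2024-01-05", "2024-01-06", "2024-01-07"],
    # ...17 more columns the query never touches
}

# GROUP BY artist_id, country reads exactly the columns it needs;
# in a row store, every full 20-column row would be fetched instead.
plays = Counter(zip(columns["artist_id"], columns["country"]))
```

Columnar layout also compresses far better, since each array holds values of one type, often with long runs of repeats.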
- 2 seconds: query time
- 3/20 columns: columnar efficiency
- 1PB: warehouse size