Case Study 3.4: The Engineering of "Spotify Wrapped"

Reading Time: 7 mins

How does Spotify survive its largest annual traffic spike?

Goal: Optimize data access using Hash Indexes, B+ Trees, and Physical Clustering.

Spotify Wrapped isn't just a feature; it's a logistical operation. It splits into two engineering challenges. First, the Offline Phase aggregates trillions of listens. Then, the Online Phase delivers the results to 700 million users in one synchronized moment.


Phase 1: The Offline Aggregation

Before December, Spotify sifts through the Listens Table to tally each user's top tracks. Say you logged 1,000 listens this year.

Option 1: Unclustered

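The table is stored chronologically, in the order songs were played. Your 1,000 listens are interleaved with everyone else's, so they can land on up to 1,000 different pages, and reading them can cost up to 1,000 random I/Os.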

Option 2: Clustered

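Spotify reorders the trillion-row table by (user_id, listen_time), creating a Clustered Index on these columns for the Wrapped query. Your 1,000 listens now sit next to each other on a few adjacent pages, so one short sequential scan reads them all.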
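To make the difference between the two options concrete, here is a minimal Python sketch that counts how many pages hold one user's listens under each layout. The page capacity, the toy table size, and the helper name pages_touched are assumptions chosen for illustration; nothing here reflects Spotify's actual storage engine.

```python
import random

ROWS_PER_PAGE = 100      # assumed page capacity (illustrative)
NUM_USERS = 1_000        # hypothetical toy scale, not Spotify's
LISTENS_PER_USER = 100

def pages_touched(table, user_id):
    """Count the distinct pages that hold at least one row for user_id."""
    return len({pos // ROWS_PER_PAGE
                for pos, (uid, _ts) in enumerate(table)
                if uid == user_id})

rng = random.Random(42)

# Option 1 (unclustered): rows sit in arrival order, so users are interleaved.
user_ids = [uid for uid in range(NUM_USERS) for _ in range(LISTENS_PER_USER)]
rng.shuffle(user_ids)
unclustered = [(uid, ts) for ts, uid in enumerate(user_ids)]

# Option 2 (clustered): the same rows, physically sorted by (user_id, listen_time).
clustered = sorted(unclustered)

unclustered_pages = pages_touched(unclustered, 7)
clustered_pages = pages_touched(clustered, 7)
print("unclustered:", unclustered_pages, "pages; clustered:", clustered_pages, "page(s)")
```

In this toy setup the clustered layout places all of user 7's rows on a single page, while the chronological layout scatters them across many pages.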


Phase 2: The Online Delivery

Once summaries are computed, they're stored in a Wrapped table.

1. The Wrapped Card: Hash Index

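When you tap your personal Wrapped card, the system retrieves your precomputed summary. This is an exact-match lookup on a single key (your user ID), which is what a hash index is built for: it hashes the key, jumps straight to the matching bucket, and returns the row in roughly constant time.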
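The lookup can be sketched as a small bucketed hash index in Python. The bucket count, the function names, and the shape of the summary record are all hypothetical; this is a sketch of the general technique, not Spotify's serving path.

```python
NUM_BUCKETS = 8  # assumed, tiny on purpose

# Each bucket holds (user_id, summary) pairs whose key hashes to that bucket.
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(user_id):
    """Map a key to its bucket number."""
    return hash(user_id) % NUM_BUCKETS

def put_summary(user_id, summary):
    """Insert or overwrite the precomputed summary for one user."""
    bucket = buckets[bucket_for(user_id)]
    for i, (uid, _old) in enumerate(bucket):
        if uid == user_id:
            bucket[i] = (user_id, summary)
            return
    bucket.append((user_id, summary))

def get_summary(user_id):
    """Exact-match lookup: scan only one bucket, never the whole table."""
    for uid, summary in buckets[bucket_for(user_id)]:
        if uid == user_id:
            return summary
    return None

put_summary(42, {"top_track": "Song A", "minutes": 31_000})
put_summary(50, {"top_track": "Song B", "minutes": 12_500})
print(get_summary(42))
```

Note that the index says nothing about ordering: it can answer "give me user 42" quickly, but not "give me all users between 40 and 50" without checking every bucket.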

2. The "Internals": B+ Tree Index

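Some users want to dig deeper and see their full "Recently Played" history from the raw Listens table. This is a range query over time rather than a single-key lookup. A B+ tree keeps its keys in sorted order, so it can find the start of the range and then read forward through neighboring entries, which a hash index cannot do.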
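The range-scan idea can be illustrated with a sorted list and binary search standing in for the leaf level of a B+ tree. This is a simplification, not a real B+ tree (there are no internal nodes or page splits), and the key (user_id, listen_time), the sample rows, and the function name recently_played are assumptions for the sake of the example.

```python
from bisect import bisect_left, bisect_right

# Sorted (user_id, listen_time, track) entries stand in for the B+ tree leaves.
entries = sorted([
    (1, 10, "a"),
    (1, 20, "b"),
    (1, 30, "c"),
    (2, 15, "d"),
    (2, 25, "e"),
])
keys = [(user_id, listen_time) for user_id, listen_time, _track in entries]

def recently_played(user_id, start, end):
    """Return one user's tracks with start <= listen_time <= end, in time order."""
    lo = bisect_left(keys, (user_id, start))   # seek to the first key in range
    hi = bisect_right(keys, (user_id, end))    # stop just past the last key in range
    return [track for _uid, _ts, track in entries[lo:hi]]

print(recently_played(1, 15, 30))
```

The search finds the starting position in logarithmic time, and everything after that is a sequential read of adjacent entries.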


The Tradeoff: The One-Dimension Rule

A table can only be clustered (sorted) by one dimension at a time. Clustering is a physical sort, so data can't sit in two different orders simultaneously.

The Solution: At Spotify's scale, engineers often maintain multiple copies of the same table, each clustered differently to support various features. The price is extra storage, plus the work of keeping every copy up to date.
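As a rough sketch of the multiple-copies idea, the snippet below keeps two physical orderings of the same rows. The second ordering, by (track_id, listen_time), is a hypothetical example of a feature-specific layout; the source does not say which orderings Spotify actually maintains.

```python
from operator import itemgetter

# A handful of illustrative rows; field names are assumptions.
listens = [
    {"user_id": 2, "track_id": "t9", "listen_time": 1},
    {"user_id": 1, "track_id": "t3", "listen_time": 2},
    {"user_id": 2, "track_id": "t3", "listen_time": 3},
    {"user_id": 1, "track_id": "t9", "listen_time": 4},
]

# Copy 1: clustered for per-user queries such as Wrapped.
by_user = sorted(listens, key=itemgetter("user_id", "listen_time"))

# Copy 2 (hypothetical): clustered for per-track queries.
by_track = sorted(listens, key=itemgetter("track_id", "listen_time"))

print([(r["user_id"], r["listen_time"]) for r in by_user])
print([(r["track_id"], r["listen_time"]) for r in by_track])
```

Both copies contain identical rows; only the physical order differs, so each query can pick the copy whose order matches its access pattern.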


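Final Takeaway: "Clustering" handles the heavy lifting of batch processing by keeping related rows physically together. "Hashing" is for instant exact-match lookups. "B+ Trees" handle range queries and ordered scans, such as a user's listens within a time window.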