Case Study 3.4: The Engineering of "Spotify Wrapped"

Case Study 3.4 Reading Time: 7 mins

How does Spotify survive its largest annual traffic spike?

Goal: Optimize data access using Hash Indexes, B+ Trees, and Physical Clustering.

Spotify Wrapped isn't one feature. It's two different engineering problems: first, an Offline Phase that aggregates trillions of listens, and second, an Online Phase that delivers the results to 700M users at once.


Phase 1: The Offline Aggregation (Building the Magic)

Before December, Spotify must scan the Listens table to calculate every user's top songs. Suppose you have 1,000 listens in the year. How many disk pages must be read to collect them depends on how the table is physically ordered.

Option 1: Unclustered

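The table is physically stored in the order the songs were played (chronological). Your 1,000 listens are interleaved with everyone else's, so collecting them can take up to 1,000 separate page reads.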

Option 2: Clustered

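Spotify re-sorts the trillion-row table by (user_id, listen_time), which makes it a Clustered Index on these two columns for the Wrapped query. Your 1,000 listens now sit side by side, so they can be read sequentially from a small number of pages.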
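
To make the difference between the two options concrete, here is a minimal simulation sketch. It is a toy model, not Spotify's storage engine: the page capacity (ROWS_PER_PAGE), the table size, and the helper pages_touched are all assumptions chosen for illustration. It counts how many distinct pages hold one user's 1,000 listens under each physical ordering.

```python
import random

ROWS_PER_PAGE = 100  # assumed page capacity, for illustration only


def pages_touched(table, user_id):
    """Count the distinct pages holding at least one row for user_id."""
    return len({pos // ROWS_PER_PAGE
                for pos, (uid, _) in enumerate(table)
                if uid == user_id})


random.seed(0)
# Toy Listens table: 200 users x 1,000 listens each, stored as (user_id, listen_time).
listens = [(uid, t)
           for uid in range(200)
           for t in random.sample(range(10**7), 1000)]

# Option 1: rows stored in the order the songs were played.
unclustered = sorted(listens, key=lambda row: row[1])
# Option 2: rows stored sorted by (user_id, listen_time).
clustered = sorted(listens)

unclustered_pages = pages_touched(unclustered, user_id=42)
clustered_pages = pages_touched(clustered, user_id=42)

print("unclustered pages:", unclustered_pages)
print("clustered pages:  ", clustered_pages)  # 10 with these toy parameters
```

In the clustered layout the user's 1,000 rows are contiguous, so they fill exactly 1,000 / ROWS_PER_PAGE = 10 pages. In the chronological layout the same rows are spread across hundreds of pages, and the gap grows as more users share the table.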


Phase 2: The Online Delivery (The Launch)

Once the summaries are computed, they are stored in a Wrapped table.

1. The Wrapped Card: Hash Index

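When you tap your personal Wrapped card, the system finds your precomputed summary. This is an exact-match lookup on your user ID, which a hash index answers in roughly constant time no matter how many users there are.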
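
The lookup can be sketched with a Python dict, which is itself a hash table. Everything named here (WrappedSummary, its fields, the sample rows, get_wrapped) is hypothetical and only stands in for whatever the real Wrapped table contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WrappedSummary:
    # Hypothetical shape of one precomputed row in the Wrapped table.
    user_id: int
    top_songs: tuple
    minutes_listened: int


# Made-up sample rows, for illustration only.
wrapped_rows = [
    WrappedSummary(101, ("Song A", "Song B"), 41_000),
    WrappedSummary(202, ("Song C", "Song A"), 12_500),
]

# The "hash index": user_id is hashed to locate its row directly.
hash_index = {row.user_id: row for row in wrapped_rows}


def get_wrapped(user_id):
    """Exact-match lookup; returns None if the user has no summary."""
    return hash_index.get(user_id)
```

Calling get_wrapped(202) goes straight to that user's row without scanning the others. The same structure gives no help for a question like "all users with IDs between 100 and 200", because hashing does not preserve key order.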

2. The "Deep Dive": B+ Tree Index

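Some users want to go deeper and see their full "Recently Played" history from the raw Listens table. This is a range query over time. A B+ Tree keeps its keys in sorted order, so it can locate the starting point and then read the matching entries in sequence. A hash index cannot do this, because hashing scatters neighboring keys.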
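
The range-scan behavior can be sketched without building a full tree. The example below is a simplification: a sorted Python list stands in for the sorted leaf level of a B+ Tree, and binary search (the bisect module) stands in for the descent from the root. The keys, song titles, and the function listens_between are made up for illustration.

```python
import bisect

# Index entries sorted by (user_id, listen_time), like B+ Tree leaf entries.
index_keys = [(7, 100), (7, 250), (7, 900), (8, 50), (8, 300), (9, 10)]
# Hypothetical song titles, aligned position by position with index_keys.
songs = ["Intro", "Bridge", "Outro", "Alpha", "Beta", "Gamma"]


def listens_between(user_id, start, end):
    """Return songs user_id played with start <= listen_time <= end."""
    # Find the first qualifying entry (the "descent" step)...
    lo = bisect.bisect_left(index_keys, (user_id, start))
    # ...and the position just past the last qualifying entry.
    hi = bisect.bisect_right(index_keys, (user_id, end))
    # Everything in between is contiguous, so it is read in order.
    return songs[lo:hi]


print(listens_between(7, 200, 1000))  # ['Bridge', 'Outro']
```

The key property is that one search finds the start of the range and the rest is a sequential read, which is what makes a sorted index a good fit for "show me everything from this period".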


The Tradeoff: The One-Dimension Rule

A table can only be clustered (sorted) by one dimension at a time. Because clustering is a physical sort, you cannot have the same data sit in two different orders simultaneously.

The Solution: At Spotify's scale, systems often maintain multiple copies of the same table, each clustered differently to support different features! The price is extra storage and extra work on every write.
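
A minimal sketch of the multiple-copies idea, assuming a second feature that needs listens grouped by song. The song_id column, the sample rows, and record_listen are hypothetical. Each copy stores the same facts, but its tuples are arranged so that its own sort key comes first.

```python
import bisect

by_user = []  # copy 1, kept sorted by (user_id, listen_time, song_id)
by_song = []  # copy 2, kept sorted by (song_id, listen_time, user_id)


def record_listen(user_id, listen_time, song_id):
    """Every new listen must be written to both copies, each in its own order."""
    bisect.insort(by_user, (user_id, listen_time, song_id))
    bisect.insort(by_song, (song_id, listen_time, user_id))


# Made-up sample data, for illustration only.
for row in [(1, 300, "song_x"), (2, 100, "song_y"),
            (1, 200, "song_y"), (2, 400, "song_x")]:
    record_listen(*row)

print(by_user)  # each user's listens are adjacent
print(by_song)  # each song's listens are adjacent
```

Both lists hold the same four listens, so storage doubles and each write happens twice. In exchange, a per-user query and a per-song query each get a copy in which their rows are physically adjacent.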


Final Takeaway: "Clustering" is for the heavy lifting of batch processing. "Hashing" is for instant exact-match lookups. "B+ Trees" are for navigating ordered ranges, such as time or cities.