Case Study 1.4: Why does Spotify store songs and lyrics in a key-value database?

Case Study 1.4 Reading Time: 5 mins

Why does Spotify store songs and lyrics in a key-value database?

Goal: Learn when to use key-value databases

To understand why a massive app like Spotify uses more than just SQL, we first need to understand the different flavors of data.


Understanding the Data Spectrum

1. Structured Data (Tabular Data) Refers to tables of organized, formatted data that is easily stored and managed in fixed fields, such as rows and columns.

2. Unstructured Data Includes complex types such as media files (audio/video), textual content, or user-generated data.

3. Semi-structured Data Lies between structured and unstructured. It doesn't have a strict table format but has some organization (like a loose schema).


The Spotify Strategy: Key-Value Stores

For unstructured and semi-structured data, Spotify uses Key-Value databases like Google’s Bigtable or AWS’ DynamoDB.

How it works: Imagine a massive locker room. Each locker has a unique number (the Key) and inside the locker is the item (the Value).

Alternatively, they might store the actual file in a distributed file system (like Amazon S3) and use the key-value database as an index (Key = song_id, Value = S3-location). This allows Spotify to scale billions of files while maintaining lightning-fast performance.

SQL vs. Key-Value: The Trade-offs

Feature SQL Database (Conventional) Key-Value Store
Querying Rich capability: Joins, Aggregations, Complex filters. Simple: Given a key, it returns the value.
Schema Rigid: Needs a predefined structure. Flexible: No predefined schema required.
Integrity High: Enforced constraints & transactions. Low: App has to handle integrity.
Best For Structured data with complex relationships. Massive unstructured/semi-structured data (like lyrics or media).

Takeaway: We wouldn't want to store each line of music lyrics in separate SQL columns. Instead, we store it as one semi-structured blob in a Key-Value store: {key: song_id, value: lyrics_blob}. (Or in JSONB in modern SQL databases.)

The 2025 Perspective: Converging Worlds

Recall from πŸ“˜ Case Study 1: UberEats that modern SQL is often a "Swiss Army Knife" with a variety of dataytype-specific features (including JSONB, vector, etc.).

Modern Hybridization: These are classical definitions. Remember that many modern SQL databases (like PostgreSQL, BigQuery, Spanner, Snowflake) now support key-value features. So these distinctions are less relevant in 2026.