Case Study 1.4: Why does Spotify store songs and lyrics in a key-value database?
Goal: Learn when to use key-value databases
To understand why a massive app like Spotify uses more than just SQL, we first need to understand the different flavors of data.
Understanding the Data Spectrum
1. Structured Data (Tabular Data): Refers to tables of organized, formatted data that is easily stored and managed in fixed fields, such as rows and columns.
- Motivation: Efficient storage and manipulation. This is perfect for Spotify's tables of integers, strings, and simple types (like User IDs and Song IDs).
2. Unstructured Data: Includes complex types such as media files (audio/video), textual content, or user-generated data.
- Motivation: Allows databases to support complex types for richer information. Spotify uses this to store actual audio files and social media data to enhance its recommendation systems.
3. Semi-structured Data: Lies between structured and unstructured data. It doesn't have a strict table format but has some organization (like a loose schema).
- Motivation: JSON (JavaScript Object Notation) is the industry standard here. It offers a balance between the rigid organization of SQL and the total flexibility of unstructured data, making it easy to integrate changing data requirements.
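To make the "loose schema" idea concrete, here is a minimal sketch in Python. The records and field names (`song_id`, `title`, `artists`, `lyrics`) are hypothetical and chosen only for illustration; they are not Spotify's actual data model. The two records share a general shape but do not carry identical fields, which a fixed-column table would not allow without schema changes.

```python
import json

# Hypothetical track records: same general shape, but not identical fields.
raw_records = [
    '{"song_id": "song_123", "title": "Example Track", "artists": ["Artist A"]}',
    '{"song_id": "song_456", "title": "Another Track", '
    '"lyrics": {"language": "en", "lines": ["first line", "second line"]}}',
]

# Parse each JSON string into a Python dict.
tracks = [json.loads(record) for record in raw_records]


def has_lyrics(track: dict) -> bool:
    """Fields are optional, so the reader checks for them at access time."""
    return "lyrics" in track


lyrics_flags = [has_lyrics(track) for track in tracks]
print(lyrics_flags)
```

Because nothing enforces which fields exist, the code that reads the data has to check for them itself. That is the flexibility, and the cost, of semi-structured data.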
The Spotify Strategy: Key-Value Stores
For unstructured and semi-structured data, Spotify uses Key-Value databases like Google's Bigtable or AWS's DynamoDB.
How it works: Imagine a massive locker room. Each locker has a unique number (the Key) and inside the locker is the item (the Value).
- Example: Key = song_123, Value = audio_file.mp3.
- Retrieval: When you hit play, Spotify uses the song's ID to quickly pull the audio from the key-value store and stream it to you.
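The locker-room picture can be sketched as a toy in-memory store in Python. The class `KeyValueStore` and its `put`/`get` methods are hypothetical names for illustration; this is not the API of Bigtable or DynamoDB, and a real store would be distributed across many machines rather than held in a single dict.

```python
from typing import Dict, Optional


class KeyValueStore:
    """A toy in-memory key-value store: each key is a locker number."""

    def __init__(self) -> None:
        self._lockers: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Place (or overwrite) the item in the locker with this number.
        self._lockers[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Open the locker; return None if nothing is stored under this key.
        return self._lockers.get(key)


store = KeyValueStore()
store.put("song_123", b"<bytes of audio_file.mp3>")  # placeholder bytes

audio = store.get("song_123")    # "hit play" on a known song
missing = store.get("song_999")  # unknown key
```

The store never looks inside the value. It only maps a key to an opaque blob, which is why a lookup stays simple regardless of what the blob contains.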
Alternatively, they might store the actual file in a distributed object store (like Amazon S3) and use the key-value database as an index (Key = song_id, Value = S3-location). This allows Spotify to scale to billions of files while maintaining lightning-fast performance.
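The index pattern can be sketched as a two-step lookup. In this Python sketch both stores are plain dicts standing in for the real systems, and the location string and bucket name are hypothetical. No actual S3 calls are made.

```python
from typing import Dict

# Stand-in for the key-value database: song_id -> storage location.
location_index: Dict[str, str] = {
    "song_123": "s3://example-bucket/audio/song_123.mp3",  # hypothetical location
}

# Stand-in for the file storage layer: location -> file contents.
blob_storage: Dict[str, bytes] = {
    "s3://example-bucket/audio/song_123.mp3": b"<audio bytes>",  # placeholder
}


def fetch_audio(song_id: str) -> bytes:
    """Resolve the song's location in the index, then fetch the file itself."""
    location = location_index[song_id]  # step 1: small, fast key lookup
    return blob_storage[location]       # step 2: read the large file


audio = fetch_audio("song_123")
```

With this split, the key-value database holds only short location strings, while the bulky audio lives in storage designed for large files.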
SQL vs. Key-Value: The Trade-offs
| Feature | SQL Database (Conventional) | Key-Value Store |
|---|---|---|
| Querying | Rich capability: Joins, Aggregations, Complex filters. | Simple: Given a key, it returns the value. |
| Schema | Rigid: Needs a predefined structure. | Flexible: No predefined schema required. |
| Integrity | High: Enforced constraints & transactions. | Low: App has to handle integrity. |
| Best For | Structured data with complex relationships. | Massive unstructured/semi-structured data (like lyrics or media). |
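The "Querying" row of the table can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a conventional SQL database and a plain dict as a stand-in for a key-value store. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

# SQL side: two related tables, queried with a join and an aggregation.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE plays (
        user_id TEXT NOT NULL,
        song_id TEXT NOT NULL REFERENCES songs(song_id)
    );
    INSERT INTO songs VALUES ('song_123', 'Example Track'), ('song_456', 'Another Track');
    INSERT INTO plays VALUES ('user_1', 'song_123'), ('user_2', 'song_123'), ('user_1', 'song_456');
    """
)

# "How many plays does each song have?" is one declarative query.
play_counts = conn.execute(
    """
    SELECT s.title, COUNT(*) AS n_plays
    FROM plays AS p
    JOIN songs AS s ON s.song_id = p.song_id
    GROUP BY s.song_id, s.title
    ORDER BY n_plays DESC
    """
).fetchall()

# Key-value side: the only built-in operation is "given a key, return the value".
titles_by_id = {"song_123": "Example Track", "song_456": "Another Track"}
title = titles_by_id["song_123"]
```

Answering the play-count question against the key-value side would mean writing the scan and the counting in application code, which is the trade-off the table summarizes.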
Takeaway: We wouldn't want to store each line of music lyrics in separate SQL columns. Instead, we store it as one semi-structured blob in a Key-Value store: {key: song_id, value: lyrics_blob}. (Or in JSONB in modern SQL databases.)
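As a rough sketch of the `{key: song_id, value: lyrics_blob}` idea, the Python below serializes a whole set of lyrics into one JSON string and stores it under the song's ID. The dict `lyrics_store`, the helper names, and the blob's fields (`language`, `lines`) are hypothetical choices for illustration.

```python
import json
from typing import Dict, List

# Stand-in for the key-value store: song_id -> JSON-encoded lyrics blob.
lyrics_store: Dict[str, str] = {}


def put_lyrics(song_id: str, lines: List[str], language: str) -> None:
    """Serialize all lines into a single blob stored under one key."""
    blob = json.dumps({"language": language, "lines": lines})
    lyrics_store[song_id] = blob


def get_lyrics(song_id: str) -> dict:
    """Fetch the blob by key and decode it back into a dict."""
    return json.loads(lyrics_store[song_id])


put_lyrics("song_123", ["first line", "second line", "third line"], "en")
lyrics = get_lyrics("song_123")
```

A song with three lines and a song with three hundred both occupy exactly one key, so no table column has to anticipate the length of the lyrics.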
The 2025 Perspective: Converging Worlds
Recall from Case Study 1: UberEats that modern SQL is often a "Swiss Army Knife" with a variety of datatype-specific features (including JSONB, vector, etc.).
Modern Hybridization: The definitions above are the classical ones. Many modern SQL databases (like PostgreSQL, BigQuery, Spanner, Snowflake) now support key-value features, so these distinctions are less sharp today.
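One way to picture the convergence is a SQL table used purely as a key-value store. The sketch below uses Python's built-in `sqlite3` as a stand-in, storing JSON as plain text. In PostgreSQL the value column would more likely be a JSONB column, but that is not shown here. The table name, helper functions, and the SQLite-specific `INSERT OR REPLACE` are assumptions made for this illustration.

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# A two-column table: the primary key is the "key", a JSON text column is the "value".
conn.execute(
    "CREATE TABLE lyrics_kv (song_id TEXT PRIMARY KEY, lyrics_json TEXT NOT NULL)"
)


def put(song_id: str, value: dict) -> None:
    # INSERT OR REPLACE is SQLite syntax for "write, overwriting any existing key".
    conn.execute(
        "INSERT OR REPLACE INTO lyrics_kv (song_id, lyrics_json) VALUES (?, ?)",
        (song_id, json.dumps(value)),
    )


def get(song_id: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT lyrics_json FROM lyrics_kv WHERE song_id = ?", (song_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


put("song_123", {"language": "en", "lines": ["first line", "second line"]})
found = get("song_123")
not_found = get("song_999")
```

The access pattern is the same get-by-key lookup as in a dedicated key-value store, while the data remains available to ordinary SQL queries.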