DB Design for Big Tech Scale
The Three-Tier Architecture That Powers Netflix, Spotify, and Uber
OLTP for users, OLAP for analytics, ETL to connect them - each optimized for its specific workload
The Three Parts of Big Data Systems
OLTP: User-Facing Speed
Every click, update, "add to cart" - that's OLTP. Here's Spotify handling a play:
-- User plays "Anti-Hero" by Taylor Swift
BEGIN TRANSACTION;
INSERT INTO listens (user_id, song_id, timestamp)
VALUES (123, 789, NOW());
UPDATE users SET last_active = NOW() WHERE user_id = 123;
UPDATE songs SET play_count = play_count + 1 WHERE song_id = 789;
COMMIT;
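From the application side, that transaction is just a few statements issued over a database connection. A minimal sketch with psycopg2 - the connection string and the record_play helper are illustrative assumptions, not Spotify's actual code:

import psycopg2

# Hypothetical connection settings - point this at your OLTP primary.
conn = psycopg2.connect("dbname=spotify user=app host=oltp-primary")

def record_play(user_id: int, song_id: int) -> None:
    # One short transaction: insert the listen, touch two rows, commit.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO listens (user_id, song_id, timestamp) VALUES (%s, %s, NOW())",
            (user_id, song_id),
        )
        cur.execute("UPDATE users SET last_active = NOW() WHERE user_id = %s", (user_id,))
        cur.execute("UPDATE songs SET play_count = play_count + 1 WHERE song_id = %s", (song_id,))
    # `with conn:` commits on success and rolls back on any exception.

record_play(123, 789)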
The OLTP Formula: < 100ms latency, 1000s queries/sec, single-row updates
Scale Limits:
- Single PostgreSQL: 5K transactions/sec ($500/month)
- Need more? → Aurora (200K/sec) or Spanner (50K/sec per region)
OLAP: Analytics at Scale
"What's trending? How are we doing?" - that's OLAP. Here's Spotify finding hot genres:
-- Which genres are growing fastest this quarter?
SELECT
    s.genre,
    DATE_TRUNC('week', l.timestamp) AS week,
    COUNT(DISTINCT l.user_id) AS unique_listeners,
    COUNT(*) AS total_plays
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE l.timestamp > '2024-01-01'
GROUP BY s.genre, week
ORDER BY week, unique_listeners DESC;
Reality Check: This scans 5TB (50B rows) in 15 seconds on BigQuery. PostgreSQL? 2+ hours.
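If you want to run that query yourself, here is a minimal sketch using the google-cloud-bigquery client. The dataset and table names are assumptions, and the SQL is adjusted to BigQuery's TIMESTAMP_TRUNC syntax:

from google.cloud import bigquery

client = bigquery.Client()   # uses your default GCP project and credentials

genre_trend_sql = """
    SELECT s.genre,
           TIMESTAMP_TRUNC(l.timestamp, WEEK) AS week,
           COUNT(DISTINCT l.user_id) AS unique_listeners,
           COUNT(*) AS total_plays
    FROM `spotify_warehouse.listens` AS l
    JOIN `spotify_warehouse.songs` AS s USING (song_id)
    WHERE l.timestamp > TIMESTAMP('2024-01-01')
    GROUP BY s.genre, week
    ORDER BY week, unique_listeners DESC
"""

job = client.query(genre_trend_sql)      # kicks off the distributed scan
for row in job.result(max_results=10):   # blocks until the job finishes
    print(row["genre"], row["week"], row["unique_listeners"])
print(f"Bytes scanned: {job.total_bytes_processed:,}")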
Why OLAP Needs Distribution:
1 machine scanning 1TB: 200 seconds (SSD @ 5GB/s)
100 machines parallel: 2 seconds (each scans 10GB)
Result: 100x faster, same cost
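The arithmetic is worth sanity-checking; a few lines of Python reproduce the numbers above:

# Back-of-envelope scan time: data volume divided by aggregate read bandwidth.
DATA_GB = 1_000          # 1 TB to scan
SSD_GB_PER_SEC = 5       # sequential read throughput of one machine

def scan_seconds(machines: int) -> float:
    per_machine_gb = DATA_GB / machines
    return per_machine_gb / SSD_GB_PER_SEC

print(scan_seconds(1))    # 200.0 s on a single machine
print(scan_seconds(100))  # 2.0 s when 100 machines each scan 10 GB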
ETL/ELT: Moving Data Between Worlds
The nightly job that feeds analytics. Here's Spotify's ETL:
# Simplified Apache Spark ETL job
# Assumes an existing SparkSession (`spark`) and that the dimension
# DataFrames `songs_dim` and `users_dim` have already been loaded.
from pyspark.sql import functions as F

def spotify_daily_etl():
    # 1. EXTRACT from OLTP (a read-only PostgreSQL replica)
    raw_listens = spark.read.format("jdbc") \
        .option("url", "jdbc:postgresql://replica.spotify.com/listens") \
        .option("query", "SELECT * FROM listens WHERE date = CURRENT_DATE - 1") \
        .load()

    # 2. TRANSFORM - enrich with dimension tables, then aggregate
    daily_stats = raw_listens \
        .join(songs_dim, "song_id") \
        .join(users_dim, "user_id") \
        .groupBy("date", "genre", "country", "age_group") \
        .agg(
            F.count("*").alias("play_count"),
            F.countDistinct("user_id").alias("unique_users"),
            F.avg("listen_duration").alias("avg_duration")
        )

    # 3. LOAD into OLAP (BigQuery, via the spark-bigquery connector)
    daily_stats.write \
        .format("bigquery") \
        .mode("append") \
        .option("partitionField", "date") \
        .save("spotify_warehouse.daily_stats")
ETL Reality: Extract-Transform-Load runs as scheduled batches (usually daily), moves gigabytes to terabytes per run, and takes minutes to hours. Tools: Spark + Airflow.
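For the scheduling side, here is a minimal Airflow sketch that runs the job above once a day. The DAG id, module path, and task name are illustrative, assuming Airflow 2.x:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

from etl_jobs import spotify_daily_etl   # hypothetical module holding the job above

# Hypothetical DAG wiring: one run per day, no backfill of old dates.
with DAG(
    dag_id="spotify_daily_etl",
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_spark_etl",
        python_callable=spotify_daily_etl,
    )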
ELT Alternative: Extract-Load-Transform: load raw data into a data lake (or straight into the warehouse) and transform it into tables later. Tools: Spark + Airflow + dbt. The idea is to remove the ETL bottleneck and let analytics teams transform the data as they need it.
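A sketch of the ELT variant of the same pipeline: land the raw rows first, then transform them inside the warehouse with plain SQL (the step dbt usually owns). It reuses the same SparkSession as above; the dataset and table names are assumptions:

# ELT variant: load raw data first, transform inside the warehouse later.

# 1. EXTRACT + LOAD: copy yesterday's raw listens straight into BigQuery.
raw_listens = spark.read.format("jdbc") \
    .option("url", "jdbc:postgresql://replica.spotify.com/listens") \
    .option("query", "SELECT * FROM listens WHERE date = CURRENT_DATE - 1") \
    .load()

raw_listens.write.format("bigquery").mode("append").save("spotify_lake.raw_listens")

# 2. TRANSFORM later, in SQL, inside the warehouse:
from google.cloud import bigquery
bigquery.Client().query("""
    CREATE OR REPLACE TABLE spotify_warehouse.daily_genre_stats AS
    SELECT l.date, s.genre,
           COUNT(*) AS play_count,
           COUNT(DISTINCT l.user_id) AS unique_users
    FROM spotify_lake.raw_listens AS l
    JOIN spotify_warehouse.songs AS s USING (song_id)
    GROUP BY l.date, s.genre
""").result()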
The Complete Picture
Users → API → PostgreSQL (OLTP: Handle clicks)
↓
Kafka (Queue events reliably)
↓
Spark (ETL: Transform daily)
↓
BigQuery (OLAP: Answer "what's trending?")
↓
Dashboards (Executives see pretty charts)
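The Kafka hop in the middle is just the service publishing each play event to a topic so nothing downstream can lose it. A minimal sketch with kafka-python - broker addresses and the topic name are assumptions:

import json
from kafka import KafkaProducer

# Hypothetical brokers/topic: the OLTP write path only publishes the event;
# Spark and the warehouse loaders consume it at their own pace.
producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092"],
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
    acks="all",   # don't consider the event sent until replicas have it
)

producer.send("listen-events", {"user_id": 123, "song_id": 789, "ts": "2024-05-01T12:00:00Z"})
producer.flush()  # block until the broker acknowledges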
Key Takeaways
- Split into Three: OLTP for users, OLAP for analytics, ETL/ELT to connect them
- Kafka Ships the Data: Reliable message queues (or publish-subscribe) ensure nothing gets lost between systems
- SQL Runs Everything: PostgreSQL (OLTP), Spark SQL (ETL or ELT), BigQuery (OLAP) - it's all SQL
- The Big Tech (Not So) Secret: Don't build one database to rule them all - decompose problems and use the right data system for each job