DB Design for Big Tech: When You Actually Need Scale
The Three-Tier Architecture That Powers Netflix, Spotify, and Uber
OLTP for users, OLAP for analytics, ETL to connect them - each optimized for its specific workload
The Three Worlds of Big Data Systems
OLTP: User-Facing Speed
Every click, update, "add to cart" - that's OLTP. Here's Spotify handling a play:
-- User plays "Anti-Hero" by Taylor Swift
BEGIN TRANSACTION;
INSERT INTO listens (user_id, song_id, timestamp)
VALUES (123, 789, NOW());
UPDATE users SET last_active = NOW() WHERE user_id = 123;
UPDATE songs SET play_count = play_count + 1 WHERE song_id = 789;
COMMIT;
The OLTP Formula: < 100ms latency, thousands of queries/sec, single-row updates
Scale Limits:
- Single PostgreSQL: 5K transactions/sec ($500/month)
- Need more? → Aurora (200K/sec) or Spanner (50K/sec per region)
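From the application side, that play event is a single short transaction. Here's a minimal sketch using psycopg2 - the connection string and timing code are illustrative, not Spotify's actual service:
# Sketch of the application-side OLTP write, following the SQL above.
# The connection string is hypothetical.
import time
import psycopg2

conn = psycopg2.connect("postgresql://app@oltp.spotify.internal/spotify")

def record_play(user_id: int, song_id: int) -> float:
    """Run the play-event transaction and return its latency in milliseconds."""
    start = time.perf_counter()
    with conn:  # commits on success, rolls back on exception
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO listens (user_id, song_id, timestamp) VALUES (%s, %s, NOW())",
                (user_id, song_id),
            )
            cur.execute("UPDATE users SET last_active = NOW() WHERE user_id = %s", (user_id,))
            cur.execute("UPDATE songs SET play_count = play_count + 1 WHERE song_id = %s", (song_id,))
    return (time.perf_counter() - start) * 1000

print(f"Play recorded in {record_play(123, 789):.1f} ms")  # should stay well under 100 ms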
OLAP: Analytics at Scale
"What's trending? How are we doing?" - that's OLAP. Here's Spotify finding hot genres:
-- Which genres are growing fastest this quarter?
SELECT
s.genre,
DATE_TRUNC('week', l.timestamp) AS week,
COUNT(DISTINCT l.user_id) AS unique_listeners,
COUNT(*) AS total_plays
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE l.timestamp > '2024-01-01'
GROUP BY s.genre, week
ORDER BY week, unique_listeners DESC;
Reality Check: This scans 5TB (50B rows) in 15 seconds on BigQuery. PostgreSQL? 2+ hours.
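For a sense of how that runs in practice, here's a minimal sketch using the official google-cloud-bigquery client - the project, dataset, and table names are assumptions, and the SQL is rewritten in BigQuery's dialect:
# Sketch: run the genre-trend scan on BigQuery from Python.
# Project, dataset, and table names are illustrative, not Spotify's real warehouse.
from google.cloud import bigquery

client = bigquery.Client(project="spotify-analytics")  # hypothetical project

sql = """
    SELECT s.genre,
           TIMESTAMP_TRUNC(l.timestamp, WEEK) AS week,
           COUNT(DISTINCT l.user_id)          AS unique_listeners,
           COUNT(*)                           AS total_plays
    FROM `warehouse.listens` AS l
    JOIN `warehouse.songs`   AS s USING (song_id)
    WHERE l.timestamp > TIMESTAMP '2024-01-01'
    GROUP BY s.genre, week
    ORDER BY week, unique_listeners DESC
"""

job = client.query(sql)    # BigQuery fans the scan out across many workers
rows = list(job.result())  # blocks until the distributed scan finishes
print(f"Scanned {job.total_bytes_processed / 1e12:.1f} TB for {len(rows)} rows")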
Why OLAP Needs Distribution:
- 1 machine scanning 1TB: 200 seconds (SSD @ 5GB/s)
- 100 machines in parallel: 2 seconds (each scans only 10GB)
- Result: 100x faster for roughly the same cost - the total machine-seconds are identical (see the quick check below)
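The back-of-the-envelope behind those numbers, assuming 5GB/s of sequential SSD read per machine and a perfectly even data split:
# Back-of-the-envelope: parallel scans cut latency, not total work.
# Assumes 5 GB/s sequential SSD read per machine and an even split of the data.
DATA_GB = 1000        # 1 TB to scan
SSD_GB_PER_SEC = 5

def scan_seconds(machines: int) -> float:
    return (DATA_GB / machines) / SSD_GB_PER_SEC

print(scan_seconds(1))    # 200.0 -> one machine takes 200 seconds
print(scan_seconds(100))  # 2.0   -> 100 machines take 2 seconds each
# Either way you burn ~200 machine-seconds of scanning, hence "same cost".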
ETL: Moving Data Between Worlds
The nightly job that feeds analytics. Here's Spotify's ETL:
# Simplified Apache Spark ETL job
# (`spark`, `songs_dim`, and `users_dim` are assumed to be defined elsewhere in the job)
from pyspark.sql import functions as F

def spotify_daily_etl():
    # 1. EXTRACT from OLTP (a PostgreSQL read replica, not the primary)
    raw_listens = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://replica.spotify.com/listens")
        .option("query", "SELECT * FROM listens WHERE date = CURRENT_DATE - 1")
        .load()
    )

    # 2. TRANSFORM - enrich with dimension tables, then aggregate
    daily_stats = (
        raw_listens
        .join(songs_dim, "song_id")
        .join(users_dim, "user_id")
        .groupBy("date", "genre", "country", "age_group")
        .agg(
            F.count("*").alias("play_count"),
            F.countDistinct("user_id").alias("unique_users"),
            F.avg("listen_duration").alias("avg_duration"),
        )
    )

    # 3. LOAD into OLAP (BigQuery, via the spark-bigquery connector)
    (
        daily_stats.write
        .format("bigquery")
        .option("table", "spotify_warehouse.daily_stats")
        .option("partitionField", "date")
        .mode("append")
        .save()
    )
ETL Reality: Daily batches, GB to TB, minutes to hours. Tools: Spark + Airflow + dbt.
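To make the "nightly job" concrete, a minimal Airflow DAG that triggers the function above once a day might look like this - the DAG id, module path, and start date are assumptions, not Spotify's real pipeline:
# Minimal Airflow sketch: schedule spotify_daily_etl() once per day.
# DAG id, import path, and start date are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from etl.spotify import spotify_daily_etl  # hypothetical module holding the job above

with DAG(
    dag_id="spotify_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # run shortly after midnight, every day
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_spark_etl",
        python_callable=spotify_daily_etl,
    )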
Kafka: The Reliable Messenger
Nothing gets lost. Kafka (or Pub/Sub) sits between systems, guaranteeing delivery:
- OLTP writes event → Kafka stores it → ETL reads when ready (sketched below)
- Handles spikes, outages, backpressure
- Spotify: Billions of events/day through Kafka
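Here's roughly what the "OLTP writes event" step looks like with the kafka-python client - the topic name and broker address are assumptions:
# Sketch of the OLTP -> Kafka hop, assuming the kafka-python client.
# Topic name and broker address are hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka.spotify.internal:9092",  # illustrative broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for all in-sync replicas - this is the "nothing gets lost" part
)

# Emitted right after the OLTP transaction commits
producer.send("listens", {"user_id": 123, "song_id": 789, "ts": "2024-05-01T12:00:00Z"})
producer.flush()  # block until the broker has acknowledged the event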
The Complete Picture
Users → API → PostgreSQL (OLTP: Handle clicks)
↓
Kafka (Queue events reliably)
↓
Spark (ETL: Transform daily)
↓
BigQuery (OLAP: Answer "what's trending?")
↓
Dashboards (Executives see pretty charts)
Key Takeaways
- Split into Three: OLTP for users, OLAP for analytics, ETL to connect them
- Kafka Ships the Data: Message queues (or publish-subscribe) ensure nothing gets lost between systems
- SQL Runs Everything: PostgreSQL (OLTP), Spark SQL (ETL), BigQuery (OLAP) - it's all SQL
- The Big Tech (Not So) Secret: Don't build one database to rule them all - decompose problems and use the right system for each job