DB Design for Big Tech Scale

The Three-Tier Architecture That Powers Netflix, Spotify, and Uber

OLTP for users, OLAP for analytics, ETL to connect them - each optimized for its specific workload

The Three Parts of Big Data Systems

OLTP: User-Facing Speed

Every click, update, "add to cart" - that's OLTP. Here's Spotify handling a play:

-- User plays "Anti-Hero" by Taylor Swift
BEGIN TRANSACTION;
INSERT INTO listens (user_id, song_id, timestamp) 
VALUES (123, 789, NOW());
UPDATE users SET last_active = NOW() WHERE user_id = 123;
UPDATE songs SET play_count = play_count + 1 WHERE song_id = 789;
COMMIT;

The OLTP Formula: < 100ms latency, 1000s queries/sec, single-row updates

Scale Limits:

Single PostgreSQL: 5K transactions/sec ($500/month)
Need more? → Aurora (200K/sec) or Spanner (50K/sec per region)

OLAP: Analytics at Scale

"What's trending? How are we doing?" - that's OLAP. Here's Spotify finding hot genres:

-- Which genres are growing fastest this quarter?
SELECT 
    s.genre,
    DATE_TRUNC('week', l.timestamp) AS week,
    COUNT(DISTINCT l.user_id) AS unique_listeners,
    COUNT(*) AS total_plays
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE l.timestamp > '2024-01-01'
GROUP BY s.genre, week
ORDER BY week, unique_listeners DESC;

Reality Check: This scans 5TB (50B rows) in 15 seconds on BigQuery. PostgreSQL? 2+ hours.

Why OLAP Needs Distribution:

1 machine scanning 1TB: 200 seconds (SSD @ 5GB/s)
100 machines parallel: 2 seconds (each scans 10GB)
Result: 100x faster, same cost

ETL/ELT: Moving Data Between Worlds

The nightly job that feeds analytics. Here's Spotify's ETL:

# Simplified Apache Spark ETL job
def spotify_daily_etl():
    # 1. EXTRACT FROM OLTP (PostgreSQL replicas)
    raw_listens = spark.read.jdbc(
        "postgresql://replica.spotify.com/listens",
        query="SELECT * FROM listens WHERE date = YESTERDAY"
    )

    # 2. TRANSFORM - Aggregate AND enrich
    daily_stats = raw_listens \
        .JOIN(songs_dim, "song_id") \
        .JOIN(users_dim, "user_id") \
        .groupBy("genre", "country", "age_group") \
        .agg(
            COUNT("*").alias("play_count"),
            countDistinct("user_id").alias("unique_users"),
            AVG("listen_duration").alias("avg_duration")
        )

    # 3. LOAD to OLAP (BigQuery)
    daily_stats.write \
        .mode("append") \
        .partitionBy("date") \
        .bigquery("spotify_warehouse.daily_stats")

ETL Reality: Extract-Transform-Load: Daily batches, GB to TB, minutes to hours. Tools: Spark + Airflow.

ELT Alternative: Extract-Load-Transform: Load raw data into "big datalake", transform later into tables. Tools: Spark + Airflow + dbt. Idea: remove ETL bottleneck. Let analytics teams transform data as needed.

The Complete Picture

Users → API → PostgreSQL (OLTP: Handle clicks)
         ↓
      Kafka (Queue events reliably)  
         ↓
      Spark (ETL: Transform daily)
         ↓
    BigQuery (OLAP: Answer "what's trending?")
         ↓
    Dashboards (Executives see pretty charts)

Key Takeaways

Split into Three: OLTP for users, OLAP for analytics, ETL/ELT to connect them
Kafka Ships the Data: Reliable message queues (or publish-subscribe) ensure nothing gets lost between systems
SQL Runs Everything: PostgreSQL (OLTP), Spark SQL (ETL or ELT), BigQuery (OLAP) - it's all SQL
The Big Tech (Not So) Secret: Don't build one database to rule them all - decompose problems and use the right data system for each job