Database Systems: How to Deploy
80% of database internals share the same core algorithms
OLTP leans on B-Trees, OLAP on columnar storage, and ETL on hash partitions - yet all three rely on the same fundamental patterns
Start Simple: The Reality of Database Scale
Companies that scaled on simple architectures:
- Stack Overflow: 4 SQL Servers handle 1.3B pageviews/month
- Basecamp: 3M+ accounts on simple Rails + MySQL architecture
- Notion: PostgreSQL with read replicas at $10B valuation
- Discord: Started simple, moved to Cassandra only after 1M+ concurrent users

Consider distributing only when:
- Your managed database vendor says you've hit hard limits
- You have a dedicated database team (5+ engineers)
- Global latency requirements (< 50ms everywhere)
- Handling 100K+ requests per second sustained
OLTP, OLAP, and ETL: Split Responsibilities for Scale
OLTP: The Speed Demon
What is OLTP?
Online Transaction Processing - The workhorse of user-facing applications. Every click, every update, every "add to cart" is an OLTP operation.
Spotify OLTP in Action
-- User plays "Anti-Hero" by Taylor Swift
BEGIN TRANSACTION;
INSERT INTO listens (user_id, song_id, timestamp)
VALUES (123, 789, NOW());
UPDATE users SET last_active = NOW() WHERE user_id = 123;
UPDATE songs SET play_count = play_count + 1 WHERE song_id = 789;
COMMIT;
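The same atomic pattern can be sketched in application code. This is a minimal illustration using Python's built-in SQLite (a stand-in for the OLTP store; the schema and IDs mirror the SQL above and are otherwise made up):

```python
import sqlite3

# In-memory SQLite stands in for the OLTP database (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listens (user_id INT, song_id INT, ts TEXT);
    CREATE TABLE users (user_id INT PRIMARY KEY, last_active TEXT);
    CREATE TABLE songs (song_id INT PRIMARY KEY, play_count INT);
    INSERT INTO users VALUES (123, NULL);
    INSERT INTO songs VALUES (789, 0);
""")

def record_play(user_id: int, song_id: int) -> None:
    # All three writes commit atomically -- the defining OLTP pattern.
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("INSERT INTO listens VALUES (?, ?, datetime('now'))",
                     (user_id, song_id))
        conn.execute("UPDATE users SET last_active = datetime('now') "
                     "WHERE user_id = ?", (user_id,))
        conn.execute("UPDATE songs SET play_count = play_count + 1 "
                     "WHERE song_id = ?", (song_id,))

record_play(123, 789)
play_count = conn.execute(
    "SELECT play_count FROM songs WHERE song_id = 789").fetchone()[0]
```

If any statement fails, the `with conn:` block rolls back all three writes, so the listen count and user activity never drift out of sync.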
Characteristics:
- Latency: < 100ms per query
- Volume: 1000s of queries/second
- Pattern: Single-row or small-set operations
- Storage: Row-based (PostgreSQL, MySQL)
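The row-based vs. columnar distinction in the last bullet can be shown with a toy sketch (plain Python dictionaries and lists, not any real storage engine):

```python
# Toy illustration of row vs. columnar layout (not a real engine).
rows = [
    {"user_id": 1, "song_id": 789, "duration": 210},
    {"user_id": 2, "song_id": 789, "duration": 185},
    {"user_id": 3, "song_id": 42,  "duration": 240},
]

# Row storage: each record is stored together, so one lookup fetches the
# whole record -- ideal for OLTP point reads and single-row updates.
record = rows[1]

# Columnar storage: each column is stored contiguously, so an aggregate
# touches only the bytes it needs -- ideal for OLAP scans.
columns = {key: [r[key] for r in rows] for key in rows[0]}
avg_duration = sum(columns["duration"]) / len(columns["duration"])
```

An OLTP engine optimizes for the first access pattern; an OLAP engine for the second, where scanning one column of a billion rows skips the other columns entirely.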
Single Machine vs Distributed OLTP
When to distribute OLTP:
- More than 10K transactions/second
- Global users requiring < 50ms latency
- Need 99.99% availability
- Examples: Google Spanner (50K+ TPS), Aurora (200K writes/sec)
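One common way to distribute OLTP is hash-based sharding: route each user's rows to one shard so their transactions stay single-node. A minimal sketch (the shard names and count are illustrative, not from any real deployment):

```python
import hashlib

# Hypothetical 4-shard layout; names and count are made up for illustration.
SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]

def shard_for(user_id: int) -> str:
    # A stable hash keeps a given user's rows on one shard, so their
    # transactions remain single-node and fast.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

The catch is that transactions spanning two users may now span two shards, which is why systems like Spanner add distributed commit protocols on top.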
OLAP: The Data Cruncher
What is OLAP?
Online Analytical Processing - Built for business intelligence, data science, and executive dashboards. Answers questions like "What's trending?" and "How are we doing?"
Spotify Analytics Query
-- Which genres are growing fastest this quarter?
SELECT
s.genre,
DATE_TRUNC('week', l.timestamp) AS week,
COUNT(DISTINCT l.user_id) AS unique_listeners,
COUNT(*) AS total_plays
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE l.timestamp > '2024-01-01'
GROUP BY s.genre, week
ORDER BY week, unique_listeners DESC;
This query:
- Scans 5TB of listening data
- Processes 50 billion rows
- Takes 15 seconds on BigQuery
- Would take 2 hours on PostgreSQL
Why OLAP is Almost Always Distributed
- Data Size: 1TB+ common (won't fit in RAM)
- Scan Speed: SSD reads at 5GB/s → 200 seconds for 1TB
- CPU Bound: Aggregations need parallel processing
- Cost: One huge machine vs 100 small machines
100 machines × 10GB each = 1TB total
Parallel scan: 1TB / 100 = 10GB per machine
Time: 10GB / 5GB/s = 2 seconds (vs 200s)
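The arithmetic above generalizes to any cluster size; a few lines of Python make the scan-time model explicit (using the 5GB/s SSD figure from the text):

```python
SCAN_SPEED_GBPS = 5   # SSD sequential read speed, from the text
DATA_GB = 1000        # a 1TB full scan

def scan_seconds(data_gb: float, machines: int) -> float:
    # Each machine scans its slice in parallel, so total time is
    # just the per-machine share divided by scan speed.
    return (data_gb / machines) / SCAN_SPEED_GBPS

single = scan_seconds(DATA_GB, 1)     # one machine
cluster = scan_seconds(DATA_GB, 100)  # 100-machine cluster
```

The model is idealized: it ignores coordination overhead and the final merge of partial aggregates, which is why real speedups are sublinear.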
ETL: The Data Highway
Extract, Transform, Load
The bridge between OLTP and OLAP worlds. Takes operational data and prepares it for analysis.
Spotify's Nightly ETL Pipeline
# Simplified Apache Spark ETL job (hosts and table names are illustrative;
# assumes `spark`, `songs_dim`, and `users_dim` are defined elsewhere)
from pyspark.sql import functions as F

def spotify_daily_etl():
    # 1. EXTRACT from OLTP (PostgreSQL read replica)
    raw_listens = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://replica.spotify.com/listens")
        .option("query", "SELECT * FROM listens WHERE date = current_date - 1")
        .load()
    )

    # 2. TRANSFORM - aggregate and enrich
    daily_stats = (
        raw_listens
        .join(songs_dim, "song_id")
        .join(users_dim, "user_id")
        .groupBy("date", "genre", "country", "age_group")
        .agg(
            F.count("*").alias("play_count"),
            F.countDistinct("user_id").alias("unique_users"),
            F.avg("listen_duration").alias("avg_duration"),
        )
    )

    # 3. LOAD to OLAP (BigQuery, via the spark-bigquery connector;
    # partitioning options simplified)
    (
        daily_stats.write
        .format("bigquery")
        .mode("append")
        .option("table", "spotify_warehouse.daily_stats")
        .save()
    )
ETL Characteristics
- Schedule: Daily, hourly, or streaming
- Duration: Minutes to hours
- Volume: GB to TB per run
- Tools: Apache Spark, Airflow, dbt
- Pattern: Batch processing
The Decision Framework
When to Use What?
| Question | OLTP | OLAP | ETL |
|---|---|---|---|
| Is this user-facing? | ✅ Yes | ❌ No | ❌ No |
| < 100ms response needed? | ✅ Yes | ❌ No | ❌ No |
| Scanning > 1M rows? | ❌ No | ✅ Yes | ✅ Yes |
| Complex aggregations? | ❌ No | ✅ Yes | ✅ Yes |
| Real-time updates? | ✅ Yes | ❌ No | ❌ No |
| Historical analysis? | ❌ No | ✅ Yes | ❌ No |
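The table above can be collapsed into a first-cut decision helper. This is a rough heuristic in Python (the function and parameter names are mine, not from any standard):

```python
def recommend_system(user_facing: bool,
                     scans_millions_of_rows: bool,
                     moves_data_between_stores: bool = False) -> str:
    # Rough heuristic encoding the decision table (illustrative only).
    if moves_data_between_stores:
        return "ETL"     # bridging operational and analytical stores
    if user_facing and not scans_millions_of_rows:
        return "OLTP"    # low-latency, single-row operations
    if scans_millions_of_rows:
        return "OLAP"    # large scans and complex aggregations
    return "OLTP"        # default: start simple
```

Real workloads often need all three; the point is to route each query class to the system built for it rather than forcing one database to do everything.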
Real-World Architecture: Spotify Scale
The Numbers
- OLTP: 500M users × 10 requests/day = 5B transactions/day
- OLAP: 1PB data warehouse, 10K queries/day
- ETL: 100TB/day processed
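To turn the daily OLTP total into a capacity target, convert it to transactions per second (the 3× peak factor below is my assumption, not a Spotify figure):

```python
USERS = 500_000_000
REQUESTS_PER_DAY = 10
SECONDS_PER_DAY = 86_400

daily_txns = USERS * REQUESTS_PER_DAY        # 5B transactions/day, as above
avg_tps = daily_txns / SECONDS_PER_DAY       # average load, ~58K TPS
peak_tps = avg_tps * 3                       # assumed 3x peak-to-average ratio
```

At roughly 58K TPS average, this load is well past the ~10K TPS single-machine threshold given earlier, which is why the stack below shards its OLTP tier.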
The Stack
Users → Load Balancers → API Servers
↓
PostgreSQL Shards (OLTP)
↓
Kafka (Event Stream)
↓
Spark (ETL Processing)
↓
BigQuery (OLAP Warehouse)
↓
Tableau (Dashboards)
Key Takeaways
- Three Different Worlds: OLTP, OLAP, and ETL solve different problems
- Storage Matters: Row storage for OLTP, columnar for OLAP
- Start Simple: Single-machine PostgreSQL handles more than you think
- Distribute When Necessary: Let performance metrics drive the decision
- ETL Connects Everything: The critical bridge between operational and analytical systems
Next Up: Now that we understand WHEN to use different systems, let's dive into HOW they work, starting with Storage & Paging on a single machine.