CS145 Course Content
CS145: Your Data Journey
Everyone masters SQL + Systems → Choose your specialization → Reach your destination
Core Curriculum (1)
• Intermediate SQL + AI: subqueries, CTEs, window functions
• Systems: how DBs work internally, query optimizers, performance & cost

Project 1: SQL Mastery
• BigQuery on 10s of GBs

Core Curriculum (2)
• Basic transactions: ACID properties
• Distributed systems: scaling & the CAP theorem
• How to use DBs, and how to cost them

Project 2 (choose your specialization):
• SQL Path: advanced SQL on BigQuery
• Columnar DB Path: platform design; build database systems
Analyst/Scientist Path
• Python, Pandas, R
• Dashboards (Tableau)
• ML: scikit-learn
• Snowflake, Redshift
• Statistical Analysis
Engineer Path
• C++/Rust/Go
• Kubernetes, Docker
• Database Internals
• Consensus Protocols
• Distributed Systems
Career Destinations
• Data Analyst: Spotify, Netflix
• Data Scientist: Meta, Google
• Founder/CTO: [Your Startup]
• Backend Engineer: Stripe, Uber
• Platform Engineer: Google, Amazon
💡 Key Insight: Core Curriculum (1) → Project 1 → Core Curriculum (2) → Project 2. Your Project 2 choice determines your specialization → skills beyond CS145 → career destination. The entire CS145 journey prepares you for your chosen path.
Sample FAANG Interview Questions
Common questions you'll be ready to tackle after CS145:
SQL & Analytics Questions
1. Amazon - Top Products Query
Write a query to find products that generated >$1M revenue in Q4 but <$100K in Q1
Follow-up: Now optimize it for a 10TB orders table. What indexes would you create?
Hints: Tables: orders(order_id, product_id, amount, date), products(product_id, name) • Date filtering with EXTRACT/QUARTER • GROUP BY with HAVING
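A sketch of one answer, using the hinted schema and assuming Q1 and Q4 of the same calendar year:

WITH product_quarters AS (
    SELECT
        o.product_id,
        SUM(CASE WHEN EXTRACT(QUARTER FROM o.date) = 4 THEN o.amount ELSE 0 END) AS q4_revenue,
        SUM(CASE WHEN EXTRACT(QUARTER FROM o.date) = 1 THEN o.amount ELSE 0 END) AS q1_revenue
    FROM orders o
    WHERE EXTRACT(YEAR FROM o.date) = 2024   -- pick the year of interest
    GROUP BY o.product_id
)
SELECT p.name, pq.q4_revenue, pq.q1_revenue
FROM product_quarters pq
JOIN products p ON pq.product_id = p.product_id
WHERE pq.q4_revenue > 1000000
  AND pq.q1_revenue < 100000;

For the 10TB follow-up, partitioning orders by date (or a composite index on (date, product_id)) lets the scan skip everything outside Q1 and Q4.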
2. Meta - Friend Recommendations
Find users who are friends-of-friends but not direct friends, ranked by mutual friend count
Follow-up: The query takes 30 seconds on 1M users. How do you optimize for 500M users?
Hints: friendships(user1_id, user2_id) bidirectional • Self-joins • COUNT DISTINCT • LEFT JOIN with NULL check
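One possible shape, shown for a single user (id 123, illustrative) and assuming every friendship is stored in both directions:

SELECT
    fof.user2_id               AS suggested_friend,
    COUNT(DISTINCT f.user2_id) AS mutual_friends
FROM friendships f                                    -- me -> my friend
JOIN friendships fof ON fof.user1_id = f.user2_id     -- my friend -> their friend
LEFT JOIN friendships direct
       ON direct.user1_id = f.user1_id
      AND direct.user2_id = fof.user2_id              -- existing friendship?
WHERE f.user1_id = 123
  AND fof.user2_id <> f.user1_id                      -- not yourself
  AND direct.user1_id IS NULL                         -- not already friends
GROUP BY fof.user2_id
ORDER BY mutual_friends DESC;

At 500M users you would not run this as a global self-join; sharding by user_id and precomputing candidate lists is the usual direction.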
3. Netflix - Binge Watching Detection
Identify users who watched >3 episodes of the same show within any 24-hour window
Follow-up: How would you handle episodes watched across midnight? What about different time zones?
Hints: views(user_id, show_id, episode_id, timestamp) • Window functions (LAG/LEAD) • Timestamp differences
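A sketch using LAG (interval syntax varies by dialect): if the view three rows back in the same (user, show) sequence is within 24 hours, the current view is the 4th in that window, i.e., more than 3 episodes:

WITH ordered AS (
    SELECT
        user_id,
        show_id,
        timestamp,
        LAG(timestamp, 3) OVER (
            PARTITION BY user_id, show_id ORDER BY timestamp
        ) AS third_prior
    FROM views
)
SELECT DISTINCT user_id, show_id
FROM ordered
WHERE timestamp - third_prior <= INTERVAL '24' HOUR;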
Systems Design Questions
4. Google - URL Deduplication
Design a system to check if a URL has been crawled before. Handle 100B URLs.
Follow-up: How do you handle URL normalization? (www vs non-www, http vs https, trailing slash)
Hints: 100B URLs × 100 bytes = 10TB (won't fit in memory) • Bloom filters (1% false positive ok) • Distributed hash tables
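The hints pin down the sizing. With the standard Bloom filter formula m = -n·ln(p) / (ln 2)², n = 100B and p = 1% give about 9.6 bits per URL: roughly 120 GB of bits (versus 10TB of raw URLs), using k ≈ 7 hash functions. That fits across a handful of machines, or in one large one.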
5. Uber - Distributed Event Counter
Design a real-time counter for rides across 5 data centers. Requirements: <1s lag, 99.99% uptime
Follow-up: Would you choose accuracy or availability during a network partition? Why?
Hints: Network partitions between DCs • Eventually consistent counters • CRDTs • Write-through cache
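One sketch of the trade-off (an assumption, not the only answer): give each data center its own slot in a G-counter CRDT. Each DC increments only its own slot, the global total is the sum of slots, and replicas merge by taking the per-slot maximum, so counts survive partitions without double-counting. Example: {us: 40, eu: 25, asia: 12} totals 77; merging with a stale copy {us: 38, eu: 25, asia: 12} takes the max per slot and still yields {us: 40, eu: 25, asia: 12}. You give up exactness during the partition (choosing availability) in exchange for convergence afterwards.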
You'll read, debug, and write queries like...
Example: Finding Power Users
📋 Complex Production Query
WITH user_stats AS (
    -- Per-user listening totals, joined to songs for genre info
    SELECT
        user_id,
        COUNT(DISTINCT l.song_id) AS unique_songs,
        COUNT(DISTINCT genre)     AS unique_genres,
        SUM(play_duration)        AS total_minutes
    FROM listens l
    JOIN songs s ON l.song_id = s.song_id
    GROUP BY user_id
),
power_users AS (
    -- Rank multi-genre listeners into deciles by total listening time
    SELECT
        user_id,
        unique_songs,
        unique_genres,
        total_minutes,
        NTILE(10) OVER (ORDER BY total_minutes DESC) AS decile
    FROM user_stats
    WHERE unique_genres >= 3
)
SELECT
    u.name,
    p.unique_songs,
    p.unique_genres,
    ROUND(p.total_minutes / 60.0, 1) AS total_hours
FROM power_users p
JOIN users u ON p.user_id = u.user_id
WHERE p.decile = 1   -- Top 10% only
ORDER BY p.total_minutes DESC;
Where This Journey Leads: Real-World Architectures
Enterprise Scale: Spotify's Data Pipeline
See how one play event travels through OLTP → ETL → OLAP using the algorithms you'll learn in this course:
Spotify: From Click to Dashboard
OLTP (Real-time)
User App (Spotify Mobile): user clicks play on "Anti-Hero"
→ API Server: 100K req/sec, serves reads like GET /recent
→ MemTable: fast in-memory writes
→ L0 SSTables: sorted, 64MB each
→ L1 (compacted): sorted, 640MB
→ L2 (compacted): 6.4GB files
OLTP Write Path (LSM Tree)
INSERT INTO listens (user_id, song_id, timestamp)
VALUES (123, 789, '2024-01-15 09:30:00');
1. Write to MemTable (in-memory B-tree)
2. Flush to SSTable when full (64MB)
3. Background compaction merges levels
OLTP Metrics
• Writes: 100K/sec
• Latency: < 10ms
• Storage: Row format
Why LSM?
• Write-optimized
• Sequential writes only
• Handles bursts well
Daily Volume: 8.6B events
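(Sanity check on that number: 100K writes/sec × 86,400 sec/day ≈ 8.6B events/day.)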
ETL (Nightly Batch)
Extract (read SSTables) → Hash Partition (by user_id) → BigSort (by timestamp) → Hash Join (with dimensions) → Load (write Parquet)
Spark ETL Pipeline
import org.apache.spark.sql.functions.broadcast
import spark.implicits._

val listens = spark.read.parquet("s3://oltp-export/listens")
val songs   = spark.read.parquet("s3://dims/songs")

listens.repartition($"user_id")               // Hash partition
  .sortWithinPartitions($"timestamp")         // External sort (BigSort)
  .join(broadcast(songs), "song_id")          // Broadcast hash join
  .write.partitionBy("date").parquet("s3://warehouse")
ETL Metrics
• Duration: 3 hours
• Cluster: 100 nodes
• Process: 100TB/day
Algorithms Used
• Hash Partitioning
• External Sort (BigSort)
• Broadcast Hash Join
OLAP (Analytics)
Star Schema
• Fact: Plays (10B rows): user_id, song_id, timestamp
• Dim: Songs (100K rows)
• Dim: Users (500M rows)
• Dim: Artists (10K rows)
Query: Top Artists by Country
1. Broadcast small dimensions (Artists, Countries)
2. Scan fact table partitions (columnar, compressed)
3. Local aggregation per node
4. Global merge and sort
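As a sketch, the query behind that plan might look like this (table and column names assumed, including a small countries dimension keyed from the fact table):

SELECT a.artist_name, c.country_name, COUNT(*) AS plays
FROM plays f
JOIN songs s     ON f.song_id = s.song_id
JOIN artists a   ON s.artist_id = a.artist_id
JOIN countries c ON f.country_id = c.country_id
GROUP BY a.artist_name, c.country_name
ORDER BY plays DESC;

The engine broadcasts the tiny artists and countries tables to every node, scans only the fact columns it needs from columnar storage, aggregates locally, then merges globally.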
Executive Dashboard
Drake: 2.1B plays ████████████
Taylor: 1.8B plays ██████████
Bad Bunny: 1.5B plays ████████
OLAP Metrics
• Query time: 2-30s
• Storage: Columnar
• Compression: 10:1
Why Columnar?
• Scan only needed cols
• Better compression
• SIMD operations
Warehouse: 1PB total (real-time writes on the OLTP side; nightly batch load into OLAP)
OLTP: Transactional, row-store, LSM for writes
ETL: Batch processing with Spark, our algorithms at scale
OLAP: Analytics, columnar store, star schema
Startup Journey: From MVP to Scale
See how a food delivery startup grows from PostgreSQL → Distributed Systems using the concepts you'll master:
FoodRush: Your Food Delivery Startup
Real architecture decisions at each growth stage
Phase 1: MVP (0-10K users)
"Just Ship It": $25/month
• Postgres on Supabase/Vercel free tier: 500MB, 2 CPUs, pooled connections included
• 500 orders/day
Simple Schema
CREATE TABLE orders (
    id            SERIAL PRIMARY KEY,
    user_id       INT,
    restaurant_id INT,
    items         JSONB,
    status        VARCHAR,
    created_at    TIMESTAMP
);
B-tree Indexes
• user_id (point queries)
• restaurant_id
• created_at (ranges)
Simple Queries
• INSERT order
• UPDATE status
• SELECT by user
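A minimal sketch of those indexes and queries in PostgreSQL (index names and values illustrative):

CREATE INDEX orders_user_id_idx       ON orders (user_id);       -- point queries
CREATE INDEX orders_restaurant_id_idx ON orders (restaurant_id);
CREATE INDEX orders_created_at_idx    ON orders (created_at);    -- range scans

INSERT INTO orders (user_id, restaurant_id, items, status)
VALUES (42, 7, '[{"item": "pad thai", "qty": 2}]', 'placed');

UPDATE orders SET status = 'delivered' WHERE id = 1001;

SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC;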
MVP Metrics
Queries: 100/sec | Latency: 50ms | Storage: 500MB | Uptime: 99%
Cost: Supabase $25/month (or Vercel/Neon free tier)
Why This Works
✓ Dead simple - one database, ACID guarantees
✓ B-trees handle both point lookups and range scans
✓ PostgreSQL can handle 10K TPS on good hardware
Phase 2: Growth (10K-100K users)
"We need to scale!" - $500/month
Load Balancer → API Server 1 / API Server 2
• Redis Cache: hot data (2GB)
• Postgres Primary: r5.xlarge (32GB), with Read Replica 1 and Read Replica 2
• MongoDB: menus, reviews
What's New
• Read replicas: Distribute read load (5000 orders/day)
• Redis: Cache restaurant data (menus change rarely)
• MongoDB: Flexible schema for restaurant content
• Connection pooling: PgBouncer for 1000+ connections
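The PgBouncer piece might look roughly like this pgbouncer.ini fragment (host and limits illustrative):

[databases]
foodrush = host=10.0.0.5 port=5432 dbname=foodrush

[pgbouncer]
pool_mode = transaction    ; reuse server connections between transactions
max_client_conn = 2000     ; client connections accepted from the app tier
default_pool_size = 50     ; actual Postgres connections per database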
Growth Metrics
Queries: 1000/sec | Latency: 15ms | Storage: 50GB | Uptime: 99.9%
Cost: $200 RDS + $100 Replicas + $50 Redis + $100 MongoDB = $450/month
Algorithms From Our Course
B-trees (Postgres) | LRU eviction (Redis) | Hash indexes (MongoDB) | Read-after-write consistency
Phase 3: Scale (100K-1M users)
"Global Scale" - $5,000/month
• Google Spanner: globally distributed SQL; auto-sharding, 99.999% SLA; 1M users across 3 regions
• BigQuery: analytics warehouse; serverless, pay per query
• Dataflow (optional): streaming ETL from Spanner → BigQuery
Why Spanner?
• No manual sharding needed
• Strong consistency globally
Spanner Handles For You
✓ Automatic sharding and rebalancing
✓ Synchronous replication across regions
✓ ACID transactions globally
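A taste of Spanner's DDL (schema assumed): interleaving stores each user's orders physically next to the user row, so "orders for this user" stays local even as the table is sharded:

CREATE TABLE users (
    user_id INT64 NOT NULL,
    name    STRING(MAX)
) PRIMARY KEY (user_id);

CREATE TABLE orders (
    user_id  INT64 NOT NULL,
    order_id INT64 NOT NULL,
    status   STRING(64)
) PRIMARY KEY (user_id, order_id),
  INTERLEAVE IN PARENT users ON DELETE CASCADE;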
Scale Metrics
Orders: 50K/day | Peak: 5000/sec | Storage: 1TB | Uptime: 99.999%
Cost: $3K Spanner + $1K BigQuery + $500 Dataflow = $4.5K/month
Team: 5 engineers (no DBAs needed!)
Complexity grows with scale: 10K users (simple) → 100K users (moderate) → 1M users (complex, distributed)
Optional Reading: Real-World Data Architecture Decisions
Learn from engineering teams at scale: how they chose and evolved their data systems.
From SQL to Distributed Systems
How Uber Serves Over 40 Million Reads Per Second
Uber's journey from MySQL → BigQuery, Spanner, and PrestoSQL. See how they handle massive scale with integrated caching.
Spotify's Data Platform Explained
Deep dive into Spotify's MySQL and BigQuery infrastructure. Learn how they process billions of events daily.
Building and Scaling Notion's Data Lake
How Notion scaled PostgreSQL to handle their collaborative workspace data. Real-world PostgreSQL optimization at scale.
The NoSQL Reality Check
HBase Deprecation at Pinterest
Why Pinterest moved away from NoSQL back to SQL systems. A case study in choosing the right tool for the job.
Why NoSQL Deployments Are Failing at Scale
Industry analysis on NoSQL limitations and why companies are returning to SQL-based systems.