CS145: Your SQL Career Journey

Master the core → Pick a technical path → Build your career.

A left-to-right map: two blue boxes are CS145 content (core curriculum with Project 1, then a Project 2 choice of Science or the Systems path NanoMem), a grey box is what you keep learning beyond the course, a green box is the careers it leads to.

Figure 1. The CS145 journey as one map. Master the core database skills, then a project, then a chosen technical path, then where that path goes in industry. The line is the course; the branches are the careers it leads to.

After CS145: Interview Questions You Can Answer

These are the kinds of questions FAANG-tier screens ask. By the end of CS145 you can work each one start to finish, including the scale follow-up.

SQL & Analytics

Amazon · Top products query

Products with over $1M revenue in Q4 but under $100K in Q1.

Follow-up: optimize for a 10 TB orders table; which indexes?

Hints: orders(order_id, product_id, amount, date), products(product_id, name) · EXTRACT/QUARTER · GROUP BY with HAVING

Meta · Friend recommendations

Friends-of-friends but not direct friends, ranked by mutual friend count.

Follow-up: 30 s on 1M users; optimize for 500M.

Hints: friendships(user1_id, user2_id) bidirectional · self-joins · COUNT DISTINCT · LEFT JOIN with NULL check

Netflix · Binge-watching detection

Users who watched over 3 episodes of one show in any 24-hour window.

Follow-up: episodes across midnight, or across time zones?

Hints: views(user_id, show_id, episode_id, timestamp) · window functions (LAG / LEAD) · timestamp diffs

Systems Design

Google · URL deduplication

Check if a URL was crawled before. Handle 100B URLs.

Follow-up: normalization: www vs non-www, http vs https, trailing slash.

Hints: 100B × 100 bytes = 10 TB (no fit in memory) · Bloom filters (1% false positive ok) · distributed hash tables

Uber · Distributed event counter

Real-time ride counter across 5 data centers. Under 1 s lag, 99.99% uptime.

Follow-up: accuracy or availability during a network partition? Why?

Hints: partitions between DCs · eventually consistent counters · CRDTs · write-through cache

You'll read, debug, and write queries like this

Example: Finding Power Users

Complex production query

WITH user_stats AS (
SELECT 
  user_id,
  COUNT(DISTINCT song_id) as unique_songs,
  COUNT(DISTINCT genre) as unique_genres,
  SUM(play_duration) as total_minutes
FROM listens l
JOIN songs s ON l.song_id = s.song_id
GROUP BY user_id
),
power_users AS (
SELECT 
  user_id,
  unique_songs,
  unique_genres,
  total_minutes,
  NTILE(10) OVER (ORDER BY total_minutes DESC) as decile
FROM user_stats
WHERE unique_genres >= 3
)
SELECT 
u.name,
p.unique_songs,
p.unique_genres,
ROUND(p.total_minutes / 60.0, 1) as total_hours
FROM power_users p
JOIN users u ON p.user_id = u.user_id
WHERE p.decile = 1  -- Top 10% only
ORDER BY p.total_minutes DESC;

Where This Journey Leads: Real-World Architectures

Enterprise Scale: Spotify's Data Pipeline

One play event travels through OLTP → ELT → OLAP using the algorithms this course teaches:

One play event traced left to right across three panels: the blue OLTP real-time write path, the grey ELT batch hop, and the green OLAP analytics store that serves dashboards.

Figure 2. One Spotify play event traced across the stack: the transactional database (OLTP), the ELT hop, and the analytics database (OLAP). Each stage runs on algorithms this course teaches. One SQL backbone, three roles.