Prep For

Curated journeys through the existing course material. Pick the path that matches your goal — the exam, a SQL interview next week, a systems-design loop, or going deeper after the term ends. Steps are ordered: walk them top to bottom.

Path 1: Minimum Viable Exam Prep (≈ 10–14 hrs)

The straight line from "haven't studied" to "ready to sit the exam." Skim the glossary, take the quizzes to find your gaps, work the P-sets for the mechanics, walk through the CA section colabs to see solved examples, and finish by re-reading the learning outcomes.

Step 1 · Search

Skim the glossary on the Search page

make sure you are familiar with all the terms and concepts before you dive in

Step 2 · Quiz

Take the 11 self-test quizzes

M1 Basic SQL · M1 Intermediate · M2 Systems · M3 Storage / Indexing / QO · M4 CC · M5 Distributed

Step 3 · M1

PSET 1 — SQL fundamentals (15 questions)

SELECT · JOINs · GROUP BY · CTEs · window functions

Step 4 · M2

PSET 2 — Systems primer (15 questions)

IO cost model · RAM/SSD/HDD · caching · hashing · Bloom filters

Step 5 · M3

PSET 3 — nanoDB algorithms (30 questions)

BigSort · HashPartition · BNLJ · SMJ · HPJ · index selection

Step 6 · M4

PSET 4 — Transactions (18 questions)

macro/microschedules · 2PL · WAL · recovery

Step 7 · M5/M6

PSET 5 — Building Data Systems (on Gradescope)

distributed patterns + designing data systems · NoSQL · scaling stories — find it in the CS145 course on Gradescope

Step 8 · CA

Walk through the 8 CA section colabs (on Drive)

SQL · Systems · NanoDB Storage · Query Plans · Transactions · Distributed SQL · NoSQL · PySpark

Step 9 · M1

Re-read the learning outcomes

course goals · what you should be able to do at the end


Path 2: Data / SQL Interview Prep (≈ 3–5 hrs)

For SQL screens and analytics-engineer interviews. Start with the patterns interviewers actually test (Round 1 = single pattern, Round 2 = composition), then drill on the playground, then read how Postgres scales in the wild for the inevitable "how would you scale this?" follow-up.

Step 1 · M1B

How SQL interviews actually work

Round 1: single pattern · Round 2: pattern composition with CTEs

Step 2 · M1B

The 5 common SQL patterns

Ladder · Timeline · Exception finder · Funnel · Top-N

Step 3 · M1B

Practice problems — JOIN vs IN vs EXISTS, set ops, window functions

debug tables · manual traces · 6 ranking exercises

Step 4 · Lab

PostgreSQL Colab playground

Spotify schema · run real queries · check your answers

Step 5 · M1B

Case Study 1.2: OpenAI & Postgres

read replicas · indexing strategy · the "how would you scale this?" answer

Step 6 · M0

Case Study 0.2: Memory for UberEats

three-sided marketplace · transactional consistency · why SQL still wins


Path 3: Systems Design Interview Prep (≈ 6–10 hrs)

For data-infra, backend, and platform-engineering interviews. Start with the IO cost model (the substrate every systems answer rests on), then data structures and index selection, then query optimization, then a real-world transaction disaster, then distributed primitives. Each step is a concrete pattern you can name in an interview.

Step 1 · M2

IO cost model — RAM vs SSD vs HDD vs network

the substrate every systems-design answer rests on

Step 2 · M2

Pick the right data structure

hash table vs B-tree vs Bloom filter vs skiplist — when and why

Step 3 · M3B

Index selection patterns at 1M / 100M / 1B scale

User Lookup · Song Rating Lookup · range queries · hash vs B+Tree vs LSM

Step 4 · M3C

End-to-end query optimization

selectivity · join order · pushdowns · the optimizer's mental model

Step 5 · M4C

The Taylor Swift Eras Tour transaction disaster

lock contention · queue management · payment-flow blocking — a real case

Step 6 · M5

Sharding & replication primitives

partition keys · primary/replica · failure modes

Step 7 · M5

CAP theorem — Dynamo, Cassandra, consistent hashing

the tradeoff every interviewer asks about

Step 8 · M5

Consensus & leader election: Paxos, Raft, ZooKeeper

how distributed systems agree on anything


Path 4: Going Deeper (Post-Class Reading List)

For when class is over and you want to keep going. Three columns: the foundational papers that built the field, the in-course case studies grouped by topic, and the modern systems shaping where things go next.

Foundational papers & systems

Case Study: GFS

sharding · replication · the paper that started "big data"

MapReduce — Google (2004)

hash partitioning · distributed sort · fault tolerance

Apache Spark — RDDs & in-memory compute

MapReduce evolved · why disk I/O was the bottleneck

Apache Kafka — real-time event streaming

partitioning · producers/consumers · delivery semantics

Distributed file systems

HDFS · Colossus · Google Cloud Storage architecture

Consensus: Paxos, Raft, ZooKeeper