Stanford · CS 145 · Fall 2026

CS 145

Intro to Big Data Systems
Tue · Thu
4:30 – 5:50 pm
Modern data systems for the AI era

Build real modern data systems. Learn it your way.

Modern data projects range from a single SQL query to a database spread across a thousand machines. You build one, scale it, and keep it correct and cheap. Read it, watch it, do it, quiz yourself.

Sign in with a Google account to save your progress. That is the only sign-up.

The whole course in a couple of minutes.

Why it's different

Three reasons the content is different.

Built for the AI era

Read, write, and debug SQL, and check the correctness and meaning of what an LLM writes for you. Agentic data stacks, and where precision actually lives.

Real systems, torn apart

Case-study teardowns of UberEats, Claude and OpenAI agent memory, and OpenAI on Postgres. Real systems, not toy schemas.

The full stack, every layer

From a SQL query down to the disk: storage, indexes, transactions, distribution. You own the whole stack, not one piece of it.

Flexible by design

How you learn it, and what it's for.

Figure: Learn it any of three ways. Aim it at any goal. A bipartite map in which every way of learning on the left connects to every goal on the right, because any way of learning serves any goal.
How you learn:
Watch or read · A five-minute video or a page
Practice · Problem sets and Colab notebooks
Test yourself · Quizzes you can retake any time
What it's for:
The CS 145 exam · Minimum viable exam prep
Data & SQL interviews · Joins, NULLs, window functions
Systems-design interviews · Design at scale
Projects · Build a full systems or science project

Watch, practice, or test yourself, and aim any of it at the exam, an interview, or a full course project. Sign in with Google and the site tracks your progress.

What you'll build

Own the full stack, every layer.

Six modules, from the query language down to the disk and out across machines. You build each layer, and you see how they stack.

Figure: Own the full stack: the six modules. From the query language, down to the disk, and out across machines.
M1 · SQL: Query the data. SELECT, joins, NULLs.
M2 · Systems: Storage and IO cost. Paging, hashing, Bloom.
M3 · nanoDB: A storage engine. Indexes, query plans.
M4 · Transactions: Concurrency, recovery. Locking, WAL, crashes.
M5 · Distributed: Across many machines. Sharding, consensus.
M6 · Modern: Spark and BigQuery. OLAP, OLTP, NoSQL.

The throughline

Five questions every data project answers.

Strip away the titles and every data project answers the same five questions. The whole course is how to answer them.

Figure: Five questions every data project answers. Design lenses, not modules. The same five, from a notebook to a planet-scale service.
1 · Correctness: Is the answer right, and can you prove it? NULLs, duplicates, concurrent writers. A query that looks right drops rows.
2 · Efficiency: Fast and cheap at scale? The same answer can cost a cent or a thousand dollars, by layout and plan.
3 · Live app: Data in, through, and out? Thousands of writes and reads per second, nothing lost or double-counted.
4 · Scale: Past one machine? Spread one database over many machines, keep it consistent and fast.
5 · Agents: Build it with agents, trust it? Use AI to build and operate the system, then verify the result.
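
To make the first question concrete, here is a minimal sketch of a query that looks right but drops rows because of NULLs. It uses Python's built-in sqlite3 module. The `orders` table, its `status` column, and the sample rows are hypothetical and are not taken from the course materials.

```python
import sqlite3

# Hypothetical example data: one order has an unknown (NULL) status.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(1, "delivered"), (2, "cancelled"), (3, None), (4, "delivered")],
)

total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Reads like "every order that is not cancelled", but NULL <> 'cancelled'
# evaluates to NULL rather than true, so the row with NULL status is filtered out.
naive = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status <> 'cancelled'"
).fetchone()[0]

# Handling NULL explicitly keeps the row with the unknown status.
null_safe = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status <> 'cancelled' OR status IS NULL"
).fetchone()[0]

print(total, naive, null_safe)
```

Of the four rows, one is cancelled, so three are not. The naive filter counts only two of them, while the NULL-aware filter counts all three. Whether the NULL row belongs in the answer depends on what the query is meant to ask, which is the point of checking meaning as well as syntax.
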
Where to start

Two doors.

Taking CS 145 at Stanford?

Start with how the class runs, then work through the modules in order. The weekly sections and problem sets keep you in step.

How the class works

Just exploring?

Watch a video, take a quiz, or search for a concept. Start anywhere.

Browse the course