CS 145

Fall 2025 • Intro to Big Data Systems

Problem Sets

Problem Sets (PSETs) will help you reinforce core concepts from class. They bridge the gap between lecture theory and project implementation.

PSET 0

Pre-Req Check and Logistics

Ensure you're ready for the course. We work on big data in this class, at 100x the scale of the data in CS 161 and CS 111. CS 145 builds on the basics of data structures, algorithms, and operating systems so that we can scale those ideas 100x.

Also, familiarize yourself with the course logistics and where to find key course notes, and share your goals and background so we know how best to support you.

Prereqs: pre-145 • Covers: Setup & Prerequisites • Submit PSET0 on Gradescope →

PSET 1

Introductory SQL

Familiarize yourself with the basic syntax and logic of SQL. This set focuses on SELECT-FROM-WHERE, basic filters, and understanding how data is queried from single tables.

Note: In the questions, None refers to NULL.
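To make the SELECT-FROM-WHERE pattern and the None/NULL note concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `songs` table and its columns are hypothetical and are not taken from the problem set; the course may use a different database engine.

```python
import sqlite3

# Hypothetical single-table example; table and column names are
# illustrative assumptions, not the PSET schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (title TEXT, artist TEXT, plays INTEGER)")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?)",
    [
        ("Alpha", "Ada", 120),
        ("Beta", "Bo", None),  # Python None is stored as SQL NULL
        ("Gamma", "Ada", 45),
    ],
)

# SELECT-FROM-WHERE with a basic filter on one table.
popular = conn.execute(
    "SELECT title FROM songs WHERE plays > 100 ORDER BY title"
).fetchall()

# NULL never satisfies = or >, so missing values need IS NULL.
missing = conn.execute(
    "SELECT title FROM songs WHERE plays IS NULL"
).fetchall()
equals_null = conn.execute(
    "SELECT title FROM songs WHERE plays = NULL"
).fetchall()

print(popular, missing, equals_null)
```

The last query returns no rows because comparing anything to NULL with `=` yields NULL rather than true, which is why filters on missing values use `IS NULL`.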

Prereqs: 1st lecture • Section 1: SQL Basics • Submit PSET1 on Gradescope →

PSET 2

Intermediate SQL

Dive deeper into SQL using the Spotify schema. Solve more complex problems involving JOINs, window functions, and common table expressions (CTEs).
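As a rough illustration of how these three features combine, the sketch below finds each artist's most-streamed track. The `artists` and `tracks` tables are hypothetical stand-ins, not the actual Spotify schema used in the PSET, and the example assumes a SQLite build with window-function support (version 3.25 or later).

```python
import sqlite3

# Hypothetical two-table schema; NOT the course's Spotify schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tracks (
        track_id INTEGER PRIMARY KEY,
        artist_id INTEGER,
        title TEXT,
        streams INTEGER
    );
    INSERT INTO artists VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO tracks VALUES
        (1, 1, 'Alpha', 300),
        (2, 1, 'Beta', 500),
        (3, 2, 'Gamma', 200),
        (4, 2, 'Delta', 100);
    """
)

query = """
WITH ranked AS (                      -- CTE: a named intermediate result
    SELECT a.name AS artist,
           t.title,
           t.streams,
           RANK() OVER (              -- window function: rank within artist
               PARTITION BY a.artist_id
               ORDER BY t.streams DESC
           ) AS rnk
    FROM tracks AS t
    JOIN artists AS a ON a.artist_id = t.artist_id   -- JOIN on the key
)
SELECT artist, title, streams
FROM ranked
WHERE rnk = 1
ORDER BY artist
"""
top_tracks = conn.execute(query).fetchall()
print(top_tracks)
```

The CTE is needed because a window function's result cannot be filtered in the same query's WHERE clause; the ranking is computed first and filtered in the outer query.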

Prereqs: 1st 4 lectures • Section 1: Advanced SQL • Submit PSET2 on Gradescope →

PSET 3

System Primer & IO

Analyze the impact of different machine configurations using NanoDB.

  • Implementations of IODevices
  • Calculating read/write costs
  • Comparing IO costs across different device types
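The kind of calculation listed above can be sketched with a simple cost model: each access pays a fixed latency, plus transfer time proportional to the bytes moved. The `Device` class below is a hypothetical stand-in and is not NanoDB's IODevice interface; the latency and throughput figures are round placeholder numbers chosen for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device model (not NanoDB's IODevice API)."""

    name: str
    access_latency_s: float        # fixed cost paid once per access
    throughput_bytes_per_s: float  # sustained transfer rate

    def sequential_read_time(self, num_bytes: int) -> float:
        # One access, then a continuous transfer.
        return self.access_latency_s + num_bytes / self.throughput_bytes_per_s

    def random_read_time(self, num_pages: int, page_bytes: int) -> float:
        # Every page pays the access latency again.
        per_page = self.access_latency_s + page_bytes / self.throughput_bytes_per_s
        return num_pages * per_page


# Placeholder parameters, assumed for illustration only.
devices = [
    Device("RAM", 100e-9, 10e9),
    Device("SSD", 100e-6, 500e6),
    Device("HDD", 10e-3, 100e6),
]

PAGE_BYTES = 64 * 1024
NUM_PAGES = 1000

for dev in devices:
    seq = dev.sequential_read_time(NUM_PAGES * PAGE_BYTES)
    rnd = dev.random_read_time(NUM_PAGES, PAGE_BYTES)
    print(f"{dev.name}: sequential {seq:.6f} s, random {rnd:.6f} s")
```

Under this model, the gap between sequential and random reads grows with the device's access latency, which is why the same workload can look very different across device types.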
Prereqs: Lecture 4 • Section 2: Systems Primer • Submit PSET3 on Gradescope →

PSET 4

JOIN Algorithms & IO

Focus on the IO costs of various JOIN algorithms under different machine configurations.

  • Block Nested Loop Join (BNLJ)
  • Sort Merge Join (SMJ)
  • Hash Partition Join (HPJ)
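To give a feel for how these costs compare, the sketch below uses common textbook approximations, counted in page IOs and ignoring the cost of writing the join output. These formulas are assumptions for illustration: the course's lectures may count passes or output differently, so treat the lecture formulas as authoritative. Here `m` and `n` are the page counts of the two relations and `b` is the number of buffer pages.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def bnlj_cost(m: int, n: int, b: int) -> int:
    """Block nested loop join: read the outer relation (m pages) once,
    and scan the inner relation (n pages) once per outer block of b - 2 pages."""
    return m + ceil_div(m, b - 2) * n


def external_sort_cost(pages: int, b: int) -> int:
    """External merge sort: each pass reads and writes every page."""
    runs = ceil_div(pages, b)  # pass 0 creates sorted runs of b pages
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, b - 1)  # each merge pass merges b - 1 runs
        passes += 1
    return 2 * pages * passes


def smj_cost(m: int, n: int, b: int) -> int:
    """Sort merge join: sort both inputs, then one merging scan of each.
    Assumes the merge phase never needs to rescan duplicates."""
    return external_sort_cost(m, b) + external_sort_cost(n, b) + m + n


def hpj_cost(m: int, n: int) -> int:
    """Hash partition join: read and write both inputs to partition them,
    then read them again to join. Assumes each partition fits in memory."""
    return 3 * (m + n)


m, n, b = 1000, 500, 102
print("BNLJ:", bnlj_cost(m, n, b))
print("SMJ: ", smj_cost(m, n, b))
print("HPJ: ", hpj_cost(m, n))
```

Changing `b` shows how sensitive each algorithm is to the machine configuration: BNLJ improves steadily with more buffer pages, while SMJ and HPJ change only when the number of passes changes.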
Prereqs: Weeks 3-4 • Section 3: JOINs & GROUPBY • Submit PSET4 on Gradescope →

PSET 5

Transactions

Analyze concurrency control, recovery, and the properties that ensure data integrity in the face of crashes and simultaneous users.
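One of these properties, atomicity, can be seen in a small sketch using Python's built-in sqlite3 module. The `accounts` table and the `transfer` helper are hypothetical and are not part of the problem set; the example covers only all-or-nothing updates and does not attempt to show concurrency control or crash recovery.

```python
import sqlite3

# Hypothetical schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction (illustrative helper)."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the earlier credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, rejected, balances)
```

The credit is deliberately applied before the debit, so the failed transfer shows that a partial update does not survive: both changes take effect together or neither does.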

Section 4: Transactions • Submit PSET5 on Gradescope →