Search
Key concepts and definitions across all sections.
M1 Module 1: SQL
How do I query relational data?
Joins:
INNER JOIN • LEFT JOIN • CROSS JOIN • Self Join
Common Table Expressions:
CTE (Common Table Expression) • WITH Clause
Window Functions:
Window Function • PARTITION BY • ROW_NUMBER • RANK • DENSE_RANK
Subqueries:
Uncorrelated Subquery • Correlated Subquery • EXISTS • IN • NULL in NOT IN
NULL Handling:
NULL • Three-Valued Logic • IS NULL
How do I know my queries are correct?
LLMs and SQL:
LLMs and Semantics
Query Equivalence:
Query Equivalence • Output Equivalence
Debug Tables:
Debug Tables
How do I solve intermediate-level SQL questions?
SQL Patterns:
The Funnel • The Ladder • The Timeline • The Comparison • The Exception Finder
M2 Module 2: Systems
How does physical hardware limit software?
IO:
Storage Hierarchy • IO Cost
Pages & Blocks:
Sequential vs Random IO
How do we manipulate data efficiently?
Bloom Filters:
Bloom Filter • False Positive Rate
Compression:
Run-Length Encoding (RLE) • Dictionary Encoding • Delta Encoding • Bit Packing
M3 Module 3: nanoDB
How is data physically organized on disk?
Storage:
Row Storage • Columnar Storage • Page
How do databases physically execute queries?
Algorithms:
Hash Partitioning • BigSort • Block Nested Loop Join (BNLJ) • Hash Partition Join (HPJ) • Sort-Merge Join (SMJ)
How do databases find data quickly?
Indexing:
B+ Tree • LSM Tree • SSTable • Compaction
How does the database choose the fastest path?
Query Optimization:
Query Optimizer • Query Plan • Selectivity • Cardinality • IO Cost Summary (Plans)
M4 Module 4: Transactions
What guarantees does a database make?
Motivation:
Transaction • Scale • Complexity
ACID Properties:
ACID • Atomicity • Consistency • Isolation • Durability
How do we handle concurrent users?
Building Transactions:
Serial Schedule • Concurrent Schedule • Transaction Manager
Transaction Scheduling:
Reordering Rules • Interleaved Schedule
Concurrency Primer:
Concurrency • Parallelism • Optimistic Concurrency Control (OCC) • Pessimistic Concurrency Control (PCC) • Multi-Version Concurrency Control (MVCC)
Lock Foundations:
Shared Lock (S-Lock) • Exclusive Lock (X-Lock) • Lock Compatibility Matrix
Two-Phase Locking (2PL):
Two-Phase Locking (2PL) • Strict Two-Phase Locking (S2PL) • Cascading Abort • Growing Phase • Shrinking Phase
Microschedules:
Microschedule • WaitsFor Graph • Deadlock • Cycle
Proving correctness.
Correctness:
Conflict • Read-Write Conflict • Write-Read Conflict • Write-Write Conflict • Conflict Serializability • Conflict Graph • Macroschedules • Topological Sort
Connecting Together:
2PL Theorem
What happens when the power goes out?
Write-Ahead Logging (WAL):
Write-Ahead Logging (WAL) • Trace Tables • WAL Protocol Rules
Post-Crash Recovery:
Analysis Phase • Redo Phase • Undo Phase • Checkpoint
M5 Module 5: Distributed
How do we scale to 1,000 machines?
Scalability:
Sharding Strategies • Replication Patterns
Algorithms:
Consistent Hashing
What happens when machines inevitably fail?
Compute Frameworks:
MapReduce
Spark:
Apache Spark • RDDs • Spark SQL • Lineage
How do we handle real-time data?
What are the physical limits of scaling?
Theory:
CAP Theorem
M6 Module 6: Data Systems
How does Big Tech scale data pipelines?
When is SQL the wrong tool?
How do we protect user data at scale?
Data Privacy:
Differential Privacy • Synthetic Data