Concurrency Full Stack
Concept. Concurrent transactions are independent multi-step operations running at the same time on shared data; the database is responsible for making each one act as if no other transactions exist. That illusion is built across four cooperating layers: your SQL, the database's lock manager, the OS's mutexes, and the hardware's atomic instructions.
Intuition. When Mickey and Minnie both like the same song at the same instant, both transactions hit the same Likes row. Four layers cooperate so both likes count without one overwriting the other: your SQL declares the transaction, the database's lock manager serializes access to the row, the OS schedules the threads that hold those locks, and the hardware guarantees the increment instruction is atomic.
Every time you type BEGIN TRANSACTION, your query embarks on a journey through four crucial layers:
| Layer | What lives there |
| --- | --- |
| Your Code | BEGIN TRANSACTION; UPDATE accounts...; COMMIT; |
| Database | Lock Manager • Conflict Detection • Deadlock Prevention |
| Operating System | Mutex • Semaphore • Spinlock • Thread Scheduling |
| Hardware | CAS Instructions • Memory Barriers • Cache Coherence |
Each layer stacks its own abstractions on the one below, all to tackle a single challenge: how do we let thousands of transactions run simultaneously without stepping on each other?
Layer 4: Your Code
The top of the stack. Where you write BEGIN TRANSACTION, your queries, and COMMIT. From this layer you don't see locks, you don't see threads, you don't see CPU cycles. You see one transaction. The rest of the stack exists to keep that promise.
Layer 3: The Database
The first thing the database does with your transaction is decide what it conflicts with. If Mickey's transaction wants to write a row that Minnie is currently reading, the database has to know that. It tracks who is touching what, by table, by page, by row, using its lock manager. Most of the rest of this module is about this layer.
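The conflict check the lock manager performs can be sketched as a compatibility test: shared (read) locks coexist, but an exclusive (write) lock conflicts with everything. This is a minimal illustrative sketch, not any real database's implementation; the class and method names are invented for the example.

```python
from collections import defaultdict

# S (shared/read) locks are compatible with each other;
# X (exclusive/write) locks conflict with every other lock.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockManager:
    def __init__(self):
        # row id -> list of (transaction id, mode) currently holding a lock
        self.holders = defaultdict(list)

    def try_lock(self, txn, row, mode):
        for other_txn, held_mode in self.holders[row]:
            if other_txn != txn and not COMPATIBLE[(held_mode, mode)]:
                return False  # conflict: caller must wait or abort
        self.holders[row].append((txn, mode))
        return True

lm = LockManager()
lm.try_lock("minnie", "likes:song42", "S")   # Minnie reads the row: granted
lm.try_lock("mickey", "likes:song42", "X")   # Mickey's write conflicts: denied
```

A real lock manager adds wait queues, lock upgrades, and deadlock detection on top of exactly this kind of compatibility check.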
Layer 2: The Operating System
Every transaction runs on a thread. The OS decides which thread gets the CPU at any moment, and provides the primitives (mutexes, semaphores, spinlocks, thread scheduling) that the database's lock manager sits on top of. The database knows what to lock; the OS knows how.
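A minimal sketch of the OS primitive underneath it all: a mutex serializing a read-modify-write. Without the lock, two threads can interleave the read and the write and lose updates; with it, only one thread is in the critical section at a time.

```python
import threading

counter = 0
lock = threading.Lock()  # the OS-level mutex primitive

def like_song(times):
    global counter
    for _ in range(times):
        with lock:       # only one thread inside at a time
            counter += 1 # a read-modify-write, now safe

threads = [threading.Thread(target=like_song, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: no lost updates
```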
Layer 1: The Hardware
At the very bottom, a single CPU instruction the hardware promises is atomic: compare-and-swap. Compare a memory location against an expected value and, only if they still match, write a new value, all in one indivisible step. Every higher layer eventually reduces to that one instruction.
Key Definitions
Concurrency Control Algorithms
Manage concurrent access to shared data; concurrency is worth the trouble because transactions spend much of their time waiting on (slow) I/O devices
Goal: Maximize parallelism while maintaining data consistency
Locks
Data structures these algorithms use to coordinate access
Purpose: Prevent conflicting operations on the same data
Three Philosophies of Concurrency
Pessimistic: "Lock Everything First"
Strategy: Acquire ALL needed locks before touching ANY data
+ Zero conflicts possible
− Limited parallelism
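The pessimistic strategy can be sketched in a few lines: acquire every needed lock before touching any data, in a fixed global order so two transactions can never deadlock each other. The function and row names are invented for the example.

```python
import threading

# One mutex per row (a toy stand-in for a lock manager's row locks).
row_locks = {row: threading.Lock() for row in ["accounts:1", "accounts:2"]}

def transfer(src, dst, do_work):
    needed = sorted([src, dst])      # fixed global order prevents deadlock
    for row in needed:               # lock EVERYTHING first...
        row_locks[row].acquire()
    try:
        do_work()                    # ...only then touch the data
    finally:
        for row in reversed(needed):
            row_locks[row].release()
```

The price is visible in the structure: nothing useful happens until every lock is held, which is exactly the "limited parallelism" downside.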
Optimistic: "Lock Nothing, Validate Later"
Strategy: Do all work freely, then check for conflicts at commit
+ Maximum parallelism
− Must restart on conflicts
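A minimal sketch of optimistic validation, using a version counter (the names are illustrative): the transaction reads freely, remembers the version it saw, and at commit checks that nobody has committed in between.

```python
class VersionedCell:
    def __init__(self, value):
        self.value, self.version = value, 0

def commit(cell, read_version, new_value):
    if cell.version != read_version:
        return False  # someone committed in between: restart the transaction
    cell.value, cell.version = new_value, cell.version + 1
    return True

cell = VersionedCell(100)
v = cell.version                     # our transaction reads 100 at version 0
cell.value, cell.version = 90, 1     # a rival transaction commits first
commit(cell, v, 150)                 # -> False: validation fails, must restart
```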
Multi-Version: "Everyone Gets Their Own View"
Strategy: Keep multiple versions of every piece of data
+ Readers never block writers
− Storage overhead & garbage collection
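The multi-version idea in sketch form (a toy store, not any real engine's design): every write appends a new version tagged with its commit timestamp, and a reader sees the newest version no later than its own snapshot timestamp, so reads never block writes.

```python
class MVCCStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ts-ordered

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, read_ts):
        # Newest version visible at this reader's snapshot timestamp.
        visible = [v for ts, v in self.versions.get(key, []) if ts <= read_ts]
        return visible[-1] if visible else None

db = MVCCStore()
db.write("likes:song42", 10, commit_ts=1)
db.write("likes:song42", 11, commit_ts=5)   # writer never waits for readers
db.read("likes:song42", read_ts=3)          # -> 10: old snapshot still visible
db.read("likes:song42", read_ts=9)          # -> 11: newer snapshot sees the write
```

The downside is right there in the data structure: the version list only grows, which is why real systems need garbage collection (vacuuming) of versions no active reader can see.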
Modern Hybrid: "Best of All Worlds"
Strategy: Combine techniques based on workload patterns
+ Adaptive performance
− Complex implementation
Why OS vs Database Locks?
You might wonder: "Why can't databases just use regular OS locks?" Here's the key difference:
OS Locks (Mutex, Semaphore)
Purpose: Simple thread coordination
• Binary scope (locked/unlocked)
• No data awareness
• Manual deadlock detection
• Thread-level coordination
Database Lock Manager
Purpose: Sophisticated data coordination
• Rich lock modes, including intention locks ("I intend to read rows under this table"), which let transactions plan ahead
• Table→page→row awareness
• Automatic deadlock resolution
• Million-transaction scale
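The "rich set of locks" can be sketched as a compatibility matrix for hierarchical (intention) locking, the standard IS/IX/S/X scheme: before row-locking, a transaction takes an intention lock on the table, so a would-be table writer can detect row-level activity without scanning every row. This is a textbook sketch, not a specific engine's matrix.

```python
# IS = intend to read rows, IX = intend to write rows,
# S/X = actual shared/exclusive locks on the whole table.
COMPAT = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "X"): False,
}

# Minnie holds IS on the table (she is reading some rows under it).
# Mickey asks for X on the whole table: denied in one lookup,
# without inspecting a single row lock.
COMPAT[("IS", "X")]   # -> False
```

This single-lookup check at the table level is exactly what a flat OS mutex cannot give you: the mutex knows "locked or not", while the lock manager knows what kind of access is in flight at every level of the table→page→row hierarchy.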
Today's Challenge: Coordinate millions of AI agents and microservices, all hitting the same database simultaneously.