Concurrency Full Stack
Concept. Concurrent transactions are independent multi-step operations running at the same time on shared data; the database is responsible for making each one act as if no other transactions exist. That illusion is built across four cooperating layers: your SQL, the database's lock manager, the OS's mutexes, and the hardware's atomic instructions.
Intuition. When Mickey and Minnie both like the same song at the same instant, both transactions hit the same Likes row. Four layers cooperate so both likes count without one overwriting the other: your SQL declares the transaction, the database's lock manager serializes access to the row, the OS schedules the threads that hold those locks, and the hardware guarantees the increment instruction is atomic.
Every time you type BEGIN TRANSACTION, your query embarks on a journey through four crucial layers:
| Layer | What lives there |
| --- | --- |
| Your Code | BEGIN TRANSACTION; UPDATE accounts...; COMMIT; |
| Database | Lock Manager • Conflict Detection • Deadlock Prevention |
| Operating System | Mutex • Semaphore • Spinlock • Thread Scheduling |
| Hardware | CAS Instructions • Memory Barriers • Cache Coherence |
Each layer stacks its own abstractions on the one below, all to tackle a single challenge: how do we let thousands of transactions run simultaneously without stepping on each other?
Layer 4: Your Code
The top of the stack. Where you write BEGIN TRANSACTION, your queries, and COMMIT. From this layer you don't see locks, you don't see threads, you don't see CPU cycles. You see one transaction. The rest of the stack exists to keep that promise.
Layer 3: The Database
The first thing the database does with your transaction is decide what it conflicts with. If Mickey's transaction wants to write a row that Minnie is currently reading, the database has to know that. It tracks who is touching what, by table, by page, by row, using its lock manager. Most of the rest of this module is about this layer.
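The conflict check the lock manager performs can be sketched as a compatibility test: shared (read) locks coexist, but an exclusive (write) lock conflicts with everything. This is a minimal illustrative sketch, not any real database's implementation; the class and method names are invented for the example.

```python
from collections import defaultdict

# S (shared/read) locks are compatible with each other;
# X (exclusive/write) locks conflict with every other lock.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockManager:
    def __init__(self):
        # row id -> list of (transaction id, mode) currently holding a lock
        self.holders = defaultdict(list)

    def try_lock(self, txn, row, mode):
        for other_txn, held_mode in self.holders[row]:
            if other_txn != txn and not COMPATIBLE[(held_mode, mode)]:
                return False  # conflict: caller must wait or abort
        self.holders[row].append((txn, mode))
        return True

lm = LockManager()
lm.try_lock("minnie", "likes:song42", "S")   # Minnie reads the row: granted
lm.try_lock("mickey", "likes:song42", "X")   # Mickey's write conflicts: denied
```

A real lock manager adds wait queues, lock upgrades, and deadlock detection on top of exactly this kind of compatibility check.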
Layer 2: The Operating System
Every transaction runs on a thread. The OS decides which thread gets the CPU at any moment, and provides the primitives (mutexes, semaphores, spinlocks, thread scheduling) that the database's lock manager sits on top of. The database knows what to lock; the OS knows how.
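A minimal sketch of the OS primitive underneath it all: a mutex serializing a read-modify-write. Without the lock, two threads can interleave the read and the write and lose updates; with it, only one thread is in the critical section at a time.

```python
import threading

counter = 0
lock = threading.Lock()  # the OS-level mutex primitive

def like_song(times):
    global counter
    for _ in range(times):
        with lock:       # only one thread inside at a time
            counter += 1 # a read-modify-write, now safe

threads = [threading.Thread(target=like_song, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: no lost updates
```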
Layer 1: The Hardware
At the very bottom, a single CPU instruction the hardware promises is atomic: compare-and-swap. Compare a memory location against an expected value and, only if they still match, write a new value, all in one indivisible step. Every higher layer eventually reduces to that one instruction.
Key Definitions
Concurrency Control Algorithms
Manage concurrent access to shared data; concurrency is worth the trouble because transactions spend much of their time waiting on (slow) I/O devices
Goal: Maximize parallelism while maintaining data consistency
Locks
Data structures these algorithms use to coordinate access
Purpose: Prevent conflicting operations on the same data
Three Philosophies of Concurrency
Pessimistic: "Lock Everything First"
Strategy: Acquire ALL needed locks before touching ANY data
+ Zero conflicts possible
− Limited parallelism
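The pessimistic strategy can be sketched in a few lines: acquire every needed lock before touching any data, in a fixed global order so two transactions can never deadlock each other. The function and row names are invented for the example.

```python
import threading

# One mutex per row (a toy stand-in for a lock manager's row locks).
row_locks = {row: threading.Lock() for row in ["accounts:1", "accounts:2"]}

def transfer(src, dst, do_work):
    needed = sorted([src, dst])      # fixed global order prevents deadlock
    for row in needed:               # lock EVERYTHING first...
        row_locks[row].acquire()
    try:
        do_work()                    # ...only then touch the data
    finally:
        for row in reversed(needed):
            row_locks[row].release()
```

The price is visible in the structure: nothing useful happens until every lock is held, which is exactly the "limited parallelism" downside.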
Optimistic: "Lock Nothing, Validate Later"
Strategy: Do all work freely, then check for conflicts at commit
+ Maximum parallelism
− Must restart on conflicts
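A minimal sketch of optimistic validation, using a version counter (the names are illustrative): the transaction reads freely, remembers the version it saw, and at commit checks that nobody has committed in between.

```python
class VersionedCell:
    def __init__(self, value):
        self.value, self.version = value, 0

def commit(cell, read_version, new_value):
    if cell.version != read_version:
        return False  # someone committed in between: restart the transaction
    cell.value, cell.version = new_value, cell.version + 1
    return True

cell = VersionedCell(100)
v = cell.version                     # our transaction reads 100 at version 0
cell.value, cell.version = 90, 1     # a rival transaction commits first
commit(cell, v, 150)                 # -> False: validation fails, must restart
```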
Multi-Version: "Everyone Gets Their Own View"
Strategy: Keep multiple versions of every piece of data
+ Readers never block writers
− Storage overhead & garbage collection
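The multi-version idea in sketch form (a toy store, not any real engine's design): every write appends a new version tagged with its commit timestamp, and a reader sees the newest version no later than its own snapshot timestamp, so reads never block writes.

```python
class MVCCStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ts-ordered

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, read_ts):
        # Newest version visible at this reader's snapshot timestamp.
        visible = [v for ts, v in self.versions.get(key, []) if ts <= read_ts]
        return visible[-1] if visible else None

db = MVCCStore()
db.write("likes:song42", 10, commit_ts=1)
db.write("likes:song42", 11, commit_ts=5)   # writer never waits for readers
db.read("likes:song42", read_ts=3)          # -> 10: old snapshot still visible
db.read("likes:song42", read_ts=9)          # -> 11: newer snapshot sees the write
```

The downside is right there in the data structure: the version list only grows, which is why real systems need garbage collection (vacuuming) of versions no active reader can see.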
Modern Hybrid: "Best of All Worlds"
Strategy: Combine techniques based on workload patterns
+ Adaptive performance
− Complex implementation
Why OS vs Database Locks?
You might wonder: "Why can't databases just use regular OS locks?" Here's the key difference:
OS Locks (Mutex, Semaphore)
Purpose: Simple thread coordination
• Binary scope (locked/unlocked)
• No data awareness
• Manual deadlock detection
• Thread-level coordination
Database Lock Manager
Purpose: Sophisticated data coordination
• Rich lock modes, including intention locks ("I intend to read rows under this table"), which let transactions plan ahead
• Table→page→row awareness
• Automatic deadlock resolution
• Million-transaction scale
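The "rich set of locks" can be sketched as a compatibility matrix for hierarchical (intention) locking, the standard IS/IX/S/X scheme: before row-locking, a transaction takes an intention lock on the table, so a would-be table writer can detect row-level activity without scanning every row. This is a textbook sketch, not a specific engine's matrix.

```python
# IS = intend to read rows, IX = intend to write rows,
# S/X = actual shared/exclusive locks on the whole table.
COMPAT = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "X"): False,
}

# Minnie holds IS on the table (she is reading some rows under it).
# Mickey asks for X on the whole table: denied in one lookup,
# without inspecting a single row lock.
COMPAT[("IS", "X")]   # -> False
```

This single-lookup check at the table level is exactly what a flat OS mutex cannot give you: the mutex knows "locked or not", while the lock manager knows what kind of access is in flight at every level of the table→page→row hierarchy.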
Today's Challenge: Coordinate millions of AI agents and microservices, all hitting the same database simultaneously.