Concurrency Full Stack

Every time you type BEGIN TRANSACTION, your query embarks on a journey through four crucial layers:

Your CodeBEGIN TRANSACTION; UPDATE accounts...; COMMIT;
DatabaseLock Manager • Conflict Detection • Deadlock Prevention
Operating SystemMutex • Semaphore • Spinlock • Thread Scheduling
HardwareCAS Instructions • Memory Barriers • Cache Coherence

Each layer stacks its own abstractions on the one below, all to tackle a singular challenge: How do we let thousands of transactions run simultaneously without stepping on each other?


Key Definitions

Concurrent Transactions

A set of transactions executing in a (small) time window

Example: 100 fans buying concert tickets at exactly the same time

Concurrency Control Algorithms

Manage concurrent access to shared data when dealing with (slow) IO devices

Goal: Maximize parallelism while maintaining data consistency

Locks

Data structures these algorithms use to coordinate access

Purpose: Prevent conflicting operations on the same data

Kitchen Chaos: Three Ways to Share a Kitchen

The Problem: Multiple chefs, one kitchen. Everyone needs different ingredients. How to coordinate?

Pessimistic Chef

Strategy: Lock only what you need

Making pasta? Lock ONLY tomatoes and pasta station
Others can make sushi, grill fish, bake desserts
Trade-off: Good parallelism until sushi needs tomatoes

Optimistic Chef

Strategy: Grab freely, check at serving

Everyone uses ingredients without reservations
"Wait, we both used the last tomato?"
Trade-off: Max speed until collision

Multi-Version Chef

Strategy: Use version control at all prep stages

"Version of file (tomato?) from 5 minutes ago"
"Current file"
Trade-off: No waits, but storage overhead

The Insight: Choose the right strategy for your workload. High conflict (everyone wants tomatoes)? Go pessimistic. Rare conflicts? Go optimistic. Read-heavy? Multi-version.


Three Philosophies of Concurrency

Now that we understand the kitchen analogy, let's see how databases actually implement these approaches:

Pessimistic: "Lock Everything First"

Strategy: Acquire ALL needed locks before touching ANY data

✅ Zero conflicts possible
❌ Limited parallelism

Optimistic: "Lock Nothing, Validate Later"

Strategy: Do all work freely, then check for conflicts at commit

✅ Maximum parallelism
❌ Must restart on conflicts

Multi-Version: "Everyone Gets Their Own View"

Strategy: Keep multiple versions of every piece of data

✅ Readers never block writers
❌ Storage overhead & garbage collection

Modern Hybrid: "Best of All Worlds"

Strategy: Combine techniques based on workload patterns

✅ Adaptive performance
❌ Complex implementation

Why OS vs Database Locks?

You might wonder: "Why can't databases just use regular OS locks?" Here's the key difference:

OS Locks (Mutex, Semaphore)

Purpose: Simple thread coordination

• Binary scope (locked/unlocked)
• No data awareness
• Manual deadlock detection
• Thread-level coordination

Database Lock Manager

Purpose: Sophisticated data coordination

• Rich set of locks. E.g. I "intend" to read these rows. Plan ahead.
• Table→page→row awareness
• Automatic deadlock resolution
• Million-transaction scale

Today's Challenge: Coordinate millions of AI agents and microservices, all hitting the same database simultaneously.


What You'll Learn Next

This section will take you through the complete concurrency control journey:

  1. Lock Foundations - How database locks actually work

  2. Two-Phase Locking - The gold standard protocol

  3. Microschedules - Reasoning about concurrent execution

  4. Correctness - Proving your system won't corrupt data

  5. Connecting Together - Putting it all into practice

By the end, you'll understand how databases manage complex coordination: letting thousands of transactions run concurrently while maintaining perfect data consistency.


The Core Insight

Concurrency control isn't just about preventing conflicts—it's about enabling massive parallelism while maintaining the illusion that each transaction runs alone.

Every technique we'll explore is a different answer to: "How do we coordinate access to shared data at web scale?"