Concurrency Full Stack 🎭

Every time you write BEGIN TRANSACTION, your query travels through four critical layers:

Your Code: `BEGIN TRANSACTION; UPDATE accounts...; COMMIT;`
Database: Lock Manager • Conflict Detection • Deadlock Prevention
Operating System: Mutex • Semaphore • Spinlock • Thread Scheduling
Hardware: CAS Instructions • Memory Barriers • Cache Coherence

Each layer builds abstractions on the lower layer, all to answer one fundamental question: How do we let thousands of transactions run simultaneously without stepping on each other?
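To make the top of the stack concrete, here is a transfer expressed as a transaction in runnable form, using Python's built-in `sqlite3` module. The `accounts` table and amounts are made up for illustration; the point is that `BEGIN`/`COMMIT` delimit exactly the unit of work the lower layers must protect.

```python
import sqlite3

# Hypothetical accounts table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

# One logical unit of work: transfer 30 from account 1 to account 2.
with conn:  # opens a transaction; commits on success, rolls back on error
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

print(list(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
# [(1, 70), (2, 80)]
```

Everything below this line of code (lock manager, mutexes, CAS) exists so that two such transfers running at once cannot leave the total balance wrong.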


Key Definitions πŸ“š

🚦 Concurrent Transactions

A set of transactions whose executions overlap within a (small) time window

Example: 100 fans buying concert tickets at exactly the same time

βš™οΈ Concurrency Control Algorithms

Algorithms that manage concurrent access to shared data, keeping the system busy while (slow) IO devices stall individual transactions

Goal: Maximize parallelism while maintaining data consistency

πŸ”’ Locks

The most popular data structures these algorithms use to coordinate access

Purpose: Prevent conflicting operations on the same data
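A minimal sketch of what a lock buys you, using Python's `threading.Lock` to make a read-modify-write atomic. The counter and thread counts are arbitrary; the same lost-update hazard is exactly what database locks prevent on rows and pages.

```python
import threading

counter = 0
lock = threading.Lock()

def deposit(times):
    global counter
    for _ in range(times):
        with lock:           # only one thread may update at a time
            counter += 1     # the read-modify-write is now atomic

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000; without the lock, concurrent updates could be lost
```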

Kitchen Chaos: Three Ways to Share a Kitchen 🍳

The Problem: Multiple chefs, one kitchen. Everyone needs different ingredients. How to coordinate?

πŸ”’ Pessimistic Chef

Strategy: Lock only what you need

Making pasta? Lock ONLY tomatoes and pasta station
Others can make sushi, grill fish, bake desserts
Trade-off: Good parallelism until sushi needs tomatoes

🎯 Optimistic Chef

Strategy: Grab freely, check at serving

Everyone uses ingredients without reservations
"Wait, we both used the last tomato?"
Trade-off: Max speed until collision

πŸ“š Multi-Version Chef

Strategy: Uhm... let's version everything, Git-style, at every prep stage

"Version of file (tomato?) from 5 minutes ago"
"Current file"
Trade-off: No waits, but storage overhead

The Insight: The art is choosing the right strategy for your workload. High conflict (everyone wants tomatoes)? Go pessimistic. Rare conflicts? Go optimistic. Read-heavy? Multi-version.


Three Philosophies of Concurrency 🎭

Now that we understand the kitchen analogy, let's see how databases actually implement these approaches:

πŸ”’ Pessimistic: "Lock Everything First"

Strategy: Acquire ALL needed locks before touching ANY data

βœ… Zero conflicts possible
❌ Limited parallelism
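A toy sketch of the pessimistic approach, assuming a hypothetical in-memory lock table keyed by data item. Acquiring all locks up front, in a fixed global order, is one standard way to make "lock everything first" deadlock-free; the class and key names are invented for illustration.

```python
import threading

class PessimisticTxn:
    """Acquire every needed lock up front, in a fixed global order,
    before touching any data (a conservative-locking sketch)."""

    def __init__(self, lock_table):
        self.lock_table = lock_table  # key -> threading.Lock

    def run(self, keys, work):
        ordered = sorted(keys)  # a global order rules out deadlock
        for k in ordered:
            self.lock_table[k].acquire()
        try:
            return work()  # zero conflicts possible while we hold the locks
        finally:
            for k in reversed(ordered):
                self.lock_table[k].release()

locks = {"A": threading.Lock(), "B": threading.Lock()}
txn = PessimisticTxn(locks)
result = txn.run({"A", "B"}, lambda: "transferred")
print(result)  # transferred
```

The cost shows in the list above: while this transaction holds A and B, every other transaction that needs either key simply waits.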

🎯 Optimistic: "Lock Nothing, Validate Later"

Strategy: Do all work freely, then check for conflicts at commit

βœ… Maximum parallelism
❌ Must restart on conflicts
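A minimal sketch of the optimistic idea: read freely, remember which versions you saw, and validate at commit. The store layout (value, version) and the class are invented for illustration; real OCC validation is considerably more careful.

```python
class OptimisticTxn:
    """Do all work freely, then check for conflicts at commit time."""

    def __init__(self, store):
        self.store = store      # key -> (value, version)
        self.read_set = {}      # key -> version observed
        self.write_set = {}     # key -> new value

    def read(self, key):
        value, version = self.store[key]
        self.read_set[key] = version
        return value

    def write(self, key, value):
        self.write_set[key] = value  # buffered, not yet visible

    def commit(self):
        # Validation: abort if anything we read changed underneath us.
        for key, seen in self.read_set.items():
            if self.store[key][1] != seen:
                return False     # conflict: caller must restart the transaction
        for key, value in self.write_set.items():
            _, version = self.store.get(key, (None, 0))
            self.store[key] = (value, version + 1)
        return True

store = {"tomatoes": (5, 0)}
t = OptimisticTxn(store)
t.write("tomatoes", t.read("tomatoes") - 1)
print(t.commit())  # True: nothing changed during the transaction
```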

πŸ“š Multi-Version: "Everyone Gets Their Own View"

Strategy: Keep multiple versions of every piece of data

βœ… Readers never block writers
❌ Storage overhead & garbage collection
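A toy multi-version store, assuming a single global commit-timestamp counter (real MVCC engines are far more involved). A reader pinned to a snapshot keeps seeing the version that was current at its snapshot, so a later writer never blocks it, which is the "readers never block writers" property above.

```python
import itertools

class MVCCStore:
    """Keep every version; a reader sees the newest version <= its snapshot."""

    def __init__(self):
        self.versions = {}             # key -> [(commit_ts, value), ...]
        self.clock = itertools.count(1)

    def write(self, key, value):
        ts = next(self.clock)
        self.versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, key, snapshot_ts):
        # Scan backwards for the newest version visible at this snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

db = MVCCStore()
db.write("tomato", "fresh")   # commit_ts 1
snap = 1                      # a reader takes its snapshot here
db.write("tomato", "used")    # commit_ts 2: the writer never waits
print(db.read("tomato", snap))  # fresh
```

The storage-overhead trade-off is visible too: both versions of "tomato" stay in `versions` until a garbage collector decides no snapshot can still see the old one.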

πŸ”„ Modern Hybrid: "Best of All Worlds"

Strategy: Combine techniques based on workload patterns

βœ… Adaptive performance
❌ Complex implementation

Why OS vs Database Locks? πŸ€”

You might wonder: "Why can't databases just use regular OS locks?" Here's the key difference:

πŸ–₯️ OS Locks (Mutex, Semaphore)

Purpose: Simple thread coordination

• Binary state (locked/unlocked)
• No data awareness
• Manual deadlock detection
• Thread-level coordination

πŸ—„οΈ Database Lock Manager

Purpose: Sophisticated data coordination

• Rich lock modes, e.g. intention locks ("I intend to read rows in this table"), declared ahead of time
• Table → page → row granularity awareness
• Automatic deadlock detection and resolution
• Million-transaction scale
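To see what "rich lock modes" means in practice, here is a toy compatibility check over the classic multigranularity lock matrix (shared, exclusive, and intention modes; the SIX mode is omitted for brevity). The function name and data layout are invented for illustration; a mutex, by contrast, only knows locked/unlocked.

```python
# Classic multigranularity lock compatibility (SIX omitted for brevity).
# IS/IX = "I intend to read/write some rows below this table or page."
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "X"): False,
}

def can_grant(held_modes, requested):
    """Grant a lock only if the requested mode is compatible with every held mode."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant(["S"], "S"))    # True: two readers of the same table coexist
print(can_grant(["S"], "X"))    # False: a writer must wait for readers
print(can_grant(["IS"], "IX"))  # True: intention modes rarely conflict
```

This mode table is exactly what lets "plan ahead" work: a transaction takes IX on the table before X on a row, so a full-table scanner asking for S on the table detects the conflict without inspecting every row lock.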

Today's Challenge: Coordinate millions of AI agents and microservices, all hitting the same database simultaneously. πŸ€–


What You'll Learn Next 🎯

This section will take you through the complete concurrency control journey:

  1. Lock Foundations - How database locks actually work

  2. Two-Phase Locking - The gold standard protocol

  3. Microschedules - Reasoning about concurrent execution

  4. Correctness - Proving your system won't corrupt data

  5. Connecting Together - Putting it all into practice

By the end, you'll understand how databases achieve the seemingly impossible: letting thousands of transactions run concurrently while maintaining perfect data consistency.


🎭 The Core Insight

Concurrency control isn't just about preventing conflictsβ€”it's about enabling massive parallelism while maintaining the illusion that each transaction runs alone.

Every technique we'll explore is a different answer to: "How do we coordinate access to shared data at web scale?"