Query Optimizer: Building Execution from Lego Blocks

The Magic Behind Every SQL Query

Your SQL goes through a sophisticated optimizer that considers millions of possible execution strategies in milliseconds.

SQL Text
What you write
1000s of Plans
What optimizer considers
Best Plan
What executes
10-100x Faster
vs naive execution

The Journey: SQL to Execution

Query Optimizer: From SQL to Execution Plan SQL Query SELECT a.name, COUNT(*) FROM artists a JOIN songs s ON ... JOIN listens l ON ... WHERE l.timestamp > '2024' Parse Parse Tree SELECT JOIN GROUP orders customers Logical Plan Logical Plan π (PROJECT) γ (GROUP BY) ⋈ (JOIN) Optimize Physical Plan Options Option 1: Hash Join + Seq Scan Option 2: Sort-Merge + Index Option 3: Nested Loop + B+Tree Cost: 523 Cost: 412 Cost: 892 ✓ Cheapest! Algorithm Lego Blocks The optimizer combines these building blocks to create execution plans 1. Access Methods Sequential Scan Read all pages Good for >10% rows Index Scan B+Tree lookup Good for <1% rows Bitmap Scan Multiple indexes Good for 1-10% rows 2. Break Into Small Problems Hash Partition Big Sort 3. Aggregation Methods Hash Aggregate Hash table for groups Sort Aggregate Sort then group 4. Join Algorithms Hash Join Build hash table O(N+M) with memory Sort-Merge Join Sort both inputs O(N log N + M log M) Nested Loop Join For each row, scan O(N×M) but simple Cost Model: How the Optimizer Decides Cost Formula Cost = CPU_cost + IO_cost × 1000 IO is 1000x slower than CPU Random IO = 10ms, Sequential = 0.1ms Table Statistics • Rows: 1,000,000 • Pages: 10,000 • Distinct values: 50,000 • Histogram: value distribution Example: Join 1M × 10K rows • Hash Join: 1M + 10K = 1.01M IOs • Sort-Merge: 2M log 1M + 20K log 10K = 42M IOs • Nested Loop: 1M × 10K = 10B IOs → Optimizer picks Hash Join! Final Plan: Hash Join + Sequential Scan + Hash Aggregate Estimated: 412 cost units | Actual: 47ms | 10x faster than naive nested loops

Query Parsing: SQL → Tree

SELECT * FROM users WHERE age > 21

Becomes:

        SELECT
           |
        PROJECT(*)
           |
        FILTER(age > 21)
           |
        SCAN(users)

Logical Planning: Tree → Algebra

Transform parse tree into relational algebra operations:

Apply optimization rules:


For each logical operation, pick a physical algorithm: always check IO equations, but here's some rough breakdowns.

Logical Op Physical Options When to Use
Join Hash Join Check IO equations
Sort-Merge Join Check IO equations
Nested Loop Check IO equations
Scan Sequential >10% of rows
Index <1% of rows
Bitmap 1-10% of rows
Aggregate Hash Unsorted input
Sort Already sorted

The Search Space Problem

3 tables to join = 3! = 6 possible join orders
4 tables = 4! = 24 orders
5 tables = 5! = 120 orders
10 tables = 10! = 3,628,800 orders!

Solution: Dynamic programming to prune bad plans early


Key Takeaways

  1. SQL is declarative - You say WHAT, optimizer figures out HOW

  2. Algorithms are lego blocks - Mix and match for best performance

  3. Cost-based decisions - Statistics drive algorithm selection

  4. Exponential search space - Smart pruning essential

  5. 10-100x performance difference - Good vs bad plans

The query optimizer is why you can write simple SQL and get fast execution!