Query Optimizer: Building Execution from Lego Blocks
The Magic Behind Every SQL Query
Your SQL goes through a sophisticated optimizer that considers millions of possible execution strategies in milliseconds.
The Journey: SQL to Execution
Query Parsing: SQL → Tree
SELECT * FROM users WHERE age > 21
Becomes:
SELECT
|
PROJECT(*)
|
FILTER(age > 21)
|
SCAN(users)
Logical Planning: Tree → Algebra
Transform parse tree into relational algebra operations:
-
Selection (σ): Filter rows
-
Projection (π): Choose columns
-
Join (⋈): Combine tables
-
Aggregation (γ): GROUP BY
Apply optimization rules:
-
Push filters down (do them early)
-
Eliminate unnecessary projections
-
Simplify expression
For each logical operation, pick a physical algorithm: always check IO equations, but here's some rough breakdowns.
| Logical Op | Physical Options | When to Use |
|---|---|---|
| Join | Hash Join | Check IO equations |
| Sort-Merge Join | Check IO equations | |
| Nested Loop | Check IO equations | |
| Scan | Sequential | >10% of rows |
| Index | <1% of rows | |
| Bitmap | 1-10% of rows | |
| Aggregate | Hash | Unsorted input |
| Sort | Already sorted |
The Search Space Problem
3 tables to join = 3! = 6 possible join orders
4 tables = 4! = 24 orders
5 tables = 5! = 120 orders
10 tables = 10! = 3,628,800 orders!
Solution: Dynamic programming to prune bad plans early
Key Takeaways
-
SQL is declarative - You say WHAT, optimizer figures out HOW
-
Algorithms are lego blocks - Mix and match for best performance
-
Cost-based decisions - Statistics drive algorithm selection
-
Exponential search space - Smart pruning essential
-
10-100x performance difference - Good vs bad plans
The query optimizer is why you can write simple SQL and get fast execution!