Correlated Subqueries: Row-by-Row Comparisons
The Problem: Personalized Comparisons
- What if each row needs its OWN answer? (e.g., "Mickey's average is 4.2, Minnie's is 4.4")
Consider this: "Find users whose latest rating is above THEIR personal average" - Each user has a unique average!
Why Basic Subqueries Aren't Enough
What we really need: Each user compared to THEIR OWN average!
Solution: Correlated Subqueries
Here's the fix: Add one extra line inside a basic subquery to personalize (correlate).
Key Insight: The inner query can "see" the outer query's current row!
Example: Users Above Their Own Average
Step-by-Step Execution
| User | Their Ratings | Their AVG | Ratings > Their AVG | Result | |
|---|---|---|---|---|---|
| Mickey | 4.5, 4.2, 3.9 | 4.2 | 4.5 > 4.2 ✓ | Include | |
| Minnie | 4.7, 4.6, 3.9 | 4.4 | 4.7 > 4.4 ✓, 4.6 > 4.4 ✓ | Include | |
| Daffy | 2.9, 4.9, NULL | 3.9 | 4.9 > 3.9 ✓ | Include | |
| Pluto | (no ratings) | NULL | (no comparisons) | Exclude |
Output
| name | Notes |
|---|---|
| Mickey | Has rating 4.5 above their 4.2 average |
| Minnie | Has ratings 4.7, 4.6 above their 4.4 average |
| Daffy | Has rating 4.9 above their 3.9 average |
-
Each user gets their OWN personalized average
-
Different threshold for each row
-
This is impossible with basic subqueries!
Example: Users with High Ratings (EXISTS Pattern)
Execution: Like Nested Loops with Scoped Variables
Row-by-Row Execution
| Outer Row | user_id | name | Inner Query Check | EXISTS? | Result | |
|---|---|---|---|---|---|---|
| Row 1 | 1 | Mickey | Check ratings for user 1: {4.5, 4.2, 3.9} | ✗ 4.5 not > 4.5 | Exclude | |
| Row 2 | 2 | Minnie | Check ratings for user 2: {4.7, 4.6, 3.9} | ✓ 4.7 > 4.5 | Include | |
| Row 3 | 3 | Daffy | Check ratings for user 3: {2.9, 4.9, NULL} | ✓ 4.9 > 4.5 | Include | |
| Row 4 | 4 | Pluto | Check ratings for user 4: {} | ✗ No listens | Exclude |
Output
| name | |
|---|---|
| Minnie | |
| Daffy |
Example: Users with NO High Ratings (NOT EXISTS Pattern)
Row-by-Row Execution
| Outer Row | user_id | name | Inner Query Check | NOT EXISTS? | Result | |
|---|---|---|---|---|---|---|
| Row 1 | 1 | Mickey | Check ratings for user 1: {4.5, 4.2, 3.9} | ✓ None > 4.5 | Include | |
| Row 2 | 2 | Minnie | Check ratings for user 2: {4.7, 4.6, 3.9} | ✗ Has 4.7, 4.6 > 4.5 | Exclude | |
| Row 3 | 3 | Daffy | Check ratings for user 3: {2.9, 4.9, NULL} | ✗ Has 4.9 > 4.5 | Exclude | |
| Row 4 | 4 | Pluto | Check ratings for user 4: {} | ✓ No listens | Include |
Output
| name | Notes | |
|---|---|---|
| Mickey | Highest rating is 4.5 (not > 4.5) | |
| Pluto | Has no ratings at all |
Correlated vs Regular: Key Differences
| Aspect | Regular Subquery | Correlated Subquery |
|---|---|---|
| Execution | Runs ONCE total | Runs ONCE PER outer row |
| Independence | Standalone query | Depends on outer query |
| Use Case | Global comparisons | Row-specific comparisons |
| Performance | Usually faster | N+1 query problem |
| Example | "Above global average" | "Above YOUR average" |
When to Use Correlated Subqueries
✅ Good Use Cases:
-
Personal comparisons (above YOUR average)
-
Relative rankings (best in YOUR category)
-
Row-specific existence checks (do YOU have...)
-
Self-referential queries (employees earning more than their manager)
❌ Avoid When:
-
Simple global comparisons suffice
-
Performance is critical (consider JOIN alternatives)
-
Query complexity becomes unreadable
Common Patterns
-
Personal Aggregates: Compare to your own AVG/MAX/MIN
-
Existence per Entity: EXISTS with correlation
-
Ranking within Groups: Find top N per category
-
Self-Joins Alternative: Sometimes clearer than complex JOINs
Common Mistakes with Correlated Subqueries
Forgetting Correlation
Problem: Without correlation, EXISTS checks globally instead of per-row
Performance Anti-Pattern (N+1 Problem)
Problem: Subquery runs once per row - with 1000 users, that's 1001 queries total!
Note: Modern query optimizers may recognize this pattern and automatically rewrite it as a JOIN or use other optimization techniques like subquery caching. However, explicit JOINs are still preferred for clarity and guaranteed performance.