Correlated Subqueries: Row-by-Row Comparisons
Concept. A correlated subquery's inner SELECT references a column from the outer row, so it re-runs once per outer row.
Intuition. "Mickey's ratings above the GLOBAL average" uses one number, 4.2, computed once across all Listens. That's a basic scalar subquery (the previous page). "Mickey's ratings above HIS PERSONAL average" needs a different number for each user: 4.2 for Mickey, 4.4 for Minnie, 3.9 for Daffy. The threshold travels with the user. That's a correlated subquery.
Mental Model: A Double For-Loop
The correlation is just this: the inner query references a column from the outer row. The database runs that inner query naively, once per outer row. Conceptually:
for outer_row in outer_table: # outer loop
inner_result = run_inner( # inner runs PER outer_row
outer_value=outer_row.col # the correlation
)
if matches(outer_row, inner_result):
yield outer_row
When to reach for this: whenever the comparison threshold changes per row. The clue is the word their or its own: "above their own average," "top in their genre," "better than their manager's salary." Each row carries the context that defines its own comparison. If the threshold is a single global number, use a regular subquery; if it depends on the row, correlated.
Example: Users Above Their Own Average
-- Find users who have at least one rating above THEIR personal average.
SELECT DISTINCT u.name
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
WHERE l.rating > (
-- Inner query: this user's personal average.
SELECT AVG(l2.rating)
FROM Listens l2
-- CORRELATED: references outer u.user_id; runs once per outer row.
WHERE l2.user_id = u.user_id
);
Step-by-Step Execution
| User | Their Ratings | Their AVG | Ratings > Their AVG | Result | |
|---|---|---|---|---|---|
| Mickey | 4.5, 4.2, 3.9 | 4.2 | 4.5 > 4.2 ✓ | Include | |
| Minnie | 4.7, 4.6, 3.9 | 4.4 | 4.7 > 4.4 ✓, 4.6 > 4.4 ✓ | Include | |
| Daffy | 2.9, 4.9, NULL | 3.9 | 4.9 > 3.9 ✓ | Include | |
| Pluto | (no ratings) | NULL | (no comparisons) | Exclude |
Output
| name | Notes |
|---|---|
| Mickey | Has rating 4.5 above their 4.2 average |
| Minnie | Has ratings 4.7, 4.6 above their 4.4 average |
| Daffy | Has rating 4.9 above their 3.9 average |
-
Each user gets their OWN personalized average
-
Different threshold for each row
-
This is impossible with basic subqueries!
Example: Users with High Ratings (EXISTS Pattern)
-- Find users who have at least one rating above 4.5.
SELECT u.name
FROM Users u
WHERE EXISTS (
-- Inner query: does this user have any rating > 4.5?
SELECT 1
FROM Listens l
-- CORRELATED: references outer u.user_id; runs once per outer row.
WHERE l.user_id = u.user_id
AND l.rating > 4.5
);
Execution: Like Nested Loops with Scoped Variables
Row-by-Row Execution
| Outer Row | user_id | name | Inner Query Check | EXISTS? | Result | |
|---|---|---|---|---|---|---|
| Row 1 | 1 | Mickey | Check ratings for user 1: {4.5, 4.2, 3.9} | ✗ 4.5 not > 4.5 | Exclude | |
| Row 2 | 2 | Minnie | Check ratings for user 2: {4.7, 4.6, 3.9} | ✓ 4.7 > 4.5 | Include | |
| Row 3 | 3 | Daffy | Check ratings for user 3: {2.9, 4.9, NULL} | ✓ 4.9 > 4.5 | Include | |
| Row 4 | 4 | Pluto | Check ratings for user 4: {} | ✗ No listens | Exclude |
Output
| name | |
|---|---|
| Minnie | |
| Daffy |
Example: Users with NO High Ratings (NOT EXISTS Pattern)
-- Find users who have NO rating above 4.5 (never had a high rating).
SELECT u.name
FROM Users u
WHERE NOT EXISTS (
-- Inner query: does this user have any rating > 4.5?
SELECT 1
FROM Listens l
-- CORRELATED: references outer u.user_id.
WHERE l.user_id = u.user_id
AND l.rating > 4.5
);
Row-by-Row Execution
| Outer Row | user_id | name | Inner Query Check | NOT EXISTS? | Result | |
|---|---|---|---|---|---|---|
| Row 1 | 1 | Mickey | Check ratings for user 1: {4.5, 4.2, 3.9} | ✓ None > 4.5 | Include | |
| Row 2 | 2 | Minnie | Check ratings for user 2: {4.7, 4.6, 3.9} | ✗ Has 4.7, 4.6 > 4.5 | Exclude | |
| Row 3 | 3 | Daffy | Check ratings for user 3: {2.9, 4.9, NULL} | ✗ Has 4.9 > 4.5 | Exclude | |
| Row 4 | 4 | Pluto | Check ratings for user 4: {} | ✓ No listens | Include |
Output
| name | Notes | |
|---|---|---|
| Mickey | Highest rating is 4.5 (not > 4.5) | |
| Pluto | Has no ratings at all |
Correlated vs Regular: Key Differences
| Aspect | Regular Subquery | Correlated Subquery |
|---|---|---|
| Execution | Runs ONCE total | Runs ONCE PER outer row |
| Independence | Standalone query | Depends on outer query |
| Use Case | Global comparisons | Row-specific comparisons |
| Performance | Usually faster | N+1 query problem |
| Example | "Above global average" | "Above YOUR average" |
When to Use Correlated Subqueries
Good Use Cases:
-
Personal comparisons (above YOUR average)
-
Relative rankings (best in YOUR category)
-
Row-specific existence checks (do YOU have...)
-
Self-referential queries (employees earning more than their manager)
Avoid When:
-
Simple global comparisons suffice
-
Performance is critical (consider JOIN alternatives)
-
Query complexity becomes unreadable
Common Patterns
-
Personal Aggregates: Compare to your own AVG/MAX/MIN
-
Existence per Entity: EXISTS with correlation
-
Ranking within Groups: Find top N per category
-
Self-Joins Alternative: Sometimes clearer than complex JOINs
Common Mistakes with Correlated Subqueries
Forgetting Correlation
Problem: Without correlation, EXISTS checks globally instead of per-row
Performance Anti-Pattern (N+1 Problem)
Problem: Subquery runs once per row - with 1000 users, that's 1001 queries total!
Note: Modern query optimizers may recognize this pattern and automatically rewrite it as a JOIN or use other optimization techniques like subquery caching. However, explicit JOINs are still preferred for clarity and guaranteed performance.