Correlated Subqueries: Row-by-Row Comparisons

Concept. A correlated subquery's inner SELECT references a column from the outer row, so it re-runs once per outer row.

Intuition. "Mickey's ratings above the GLOBAL average" uses one number, 4.2, computed once across all Listens. That's a basic scalar subquery (the previous page). "Mickey's ratings above HIS PERSONAL average" needs a different number for each user: 4.2 for Mickey, 4.4 for Minnie, 3.9 for Daffy. The threshold travels with the user. That's a correlated subquery.

Mental Model: A Double For-Loop

The correlation is just this: the inner query references a column from the outer row. The database runs that inner query naively, once per outer row. Conceptually:

for outer_row in outer_table:        # outer loop
    inner_result = run_inner(        # inner runs PER outer_row
        outer_value=outer_row.col    # the correlation
    )
    if matches(outer_row, inner_result):
        yield outer_row

When to reach for this: whenever the comparison threshold changes per row. The clue is the word their or its own: "above their own average," "top in their genre," "better than their manager's salary." Each row carries the context that defines its own comparison. If the threshold is a single global number, use a regular subquery; if it depends on the row, correlated.

Example: Users Above Their Own Average

-- Find users who have at least one rating above THEIR personal average.
SELECT DISTINCT u.name
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
WHERE l.rating > (
    -- Inner query: this user's personal average.
    SELECT AVG(l2.rating)
    FROM Listens l2
    -- CORRELATED: references outer u.user_id; runs once per outer row.
    WHERE l2.user_id = u.user_id
);

Step-by-Step Execution

User	Their Ratings	Their AVG	Ratings > Their AVG	Result
Mickey	4.5, 4.2, 3.9	4.2	4.5 > 4.2 ✓	Include
Minnie	4.7, 4.6, 3.9	4.4	4.7 > 4.4 ✓, 4.6 > 4.4 ✓	Include
Daffy	2.9, 4.9, NULL	3.9	4.9 > 3.9 ✓	Include
Pluto	(no ratings)	NULL	(no comparisons)	Exclude

Output

name	Notes
Mickey	Has rating 4.5 above their 4.2 average
Minnie	Has ratings 4.7, 4.6 above their 4.4 average
Daffy	Has rating 4.9 above their 3.9 average

Each user gets their OWN personalized average
Different threshold for each row
This is impossible with basic subqueries!

Example: Users with High Ratings (EXISTS Pattern)

-- Find users who have at least one rating above 4.5.
SELECT u.name
FROM Users u
WHERE EXISTS (
    -- Inner query: does this user have any rating > 4.5?
    SELECT 1
    FROM Listens l
    -- CORRELATED: references outer u.user_id; runs once per outer row.
    WHERE l.user_id = u.user_id
      AND l.rating > 4.5
);

Execution: Like Nested Loops with Scoped Variables

Row-by-Row Execution

Outer Row	user_id	name	Inner Query Check	EXISTS?	Result
Row 1	1	Mickey	Check ratings for user 1: {4.5, 4.2, 3.9}	✗ 4.5 not > 4.5	Exclude
Row 2	2	Minnie	Check ratings for user 2: {4.7, 4.6, 3.9}	✓ 4.7 > 4.5	Include
Row 3	3	Daffy	Check ratings for user 3: {2.9, 4.9, NULL}	✓ 4.9 > 4.5	Include
Row 4	4	Pluto	Check ratings for user 4: {}	✗ No listens	Exclude

Output

	name
	Minnie
	Daffy

Example: Users with NO High Ratings (NOT EXISTS Pattern)

-- Find users who have NO rating above 4.5 (never had a high rating).
SELECT u.name
FROM Users u
WHERE NOT EXISTS (
    -- Inner query: does this user have any rating > 4.5?
    SELECT 1
    FROM Listens l
    -- CORRELATED: references outer u.user_id.
    WHERE l.user_id = u.user_id
      AND l.rating > 4.5
);

Row-by-Row Execution

Outer Row	user_id	name	Inner Query Check	NOT EXISTS?	Result
Row 1	1	Mickey	Check ratings for user 1: {4.5, 4.2, 3.9}	✓ None > 4.5	Include
Row 2	2	Minnie	Check ratings for user 2: {4.7, 4.6, 3.9}	✗ Has 4.7, 4.6 > 4.5	Exclude
Row 3	3	Daffy	Check ratings for user 3: {2.9, 4.9, NULL}	✗ Has 4.9 > 4.5	Exclude
Row 4	4	Pluto	Check ratings for user 4: {}	✓ No listens	Include

Output

	name	Notes
	Mickey	Highest rating is 4.5 (not > 4.5)
	Pluto	Has no ratings at all

Correlated vs Regular: Key Differences

Aspect	Regular Subquery	Correlated Subquery
Execution	Runs ONCE total	Runs ONCE PER outer row
Independence	Standalone query	Depends on outer query
Use Case	Global comparisons	Row-specific comparisons
Performance	Usually faster	N+1 query problem
Example	"Above global average"	"Above YOUR average"

When to Use Correlated Subqueries

Good Use Cases:

Personal comparisons (above YOUR average)
Relative rankings (best in YOUR category)
Row-specific existence checks (do YOU have...)
Self-referential queries (employees earning more than their manager)

Avoid When:

Simple global comparisons suffice
Performance is critical (consider JOIN alternatives)
Query complexity becomes unreadable

Common Patterns

Personal Aggregates: Compare to your own AVG/MAX/MIN
Existence per Entity: EXISTS with correlation
Ranking within Groups: Find top N per category
Self-Joins Alternative: Sometimes clearer than complex JOINs

Common Mistakes with Correlated Subqueries

Forgetting Correlation

Problem: Without correlation, EXISTS checks globally instead of per-row

Performance Anti-Pattern (N+1 Problem)

Problem: Subquery runs once per row - with 1000 users, that's 1001 queries total!

Note: Modern query optimizers may recognize this pattern and automatically rewrite it as a JOIN or use other optimization techniques like subquery caching. However, explicit JOINs are still preferred for clarity and guaranteed performance.