Correlated Subqueries: Row-by-Row Comparisons

Concept. A correlated subquery's inner SELECT references a column from the outer row, so it re-runs once per outer row.

Intuition. "Mickey's ratings above the GLOBAL average" uses one number, 4.2, computed once across all Listens. That's a basic scalar subquery (the previous page). "Mickey's ratings above HIS PERSONAL average" needs a different number for each user: 4.2 for Mickey, 4.4 for Minnie, 3.9 for Daffy. The threshold travels with the user. That's a correlated subquery.

Mental Model: A Double For-Loop

The correlation is just this: the inner query references a column from the outer row. The database runs that inner query naively, once per outer row. Conceptually:

for outer_row in outer_table:        # outer loop
    inner_result = run_inner(        # inner runs PER outer_row
        outer_value=outer_row.col    # the correlation
    )
    if matches(outer_row, inner_result):
        yield outer_row

When to reach for this: whenever the comparison threshold changes per row. The clue is the word their or its own: "above their own average," "top in their genre," "better than their manager's salary." Each row carries the context that defines its own comparison. If the threshold is a single global number, use a regular subquery; if it depends on the row, correlated.


Example: Users Above Their Own Average

-- Find users who have at least one rating above THEIR personal average.
SELECT DISTINCT u.name
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
WHERE l.rating > (
    -- Inner query: this user's personal average.
    SELECT AVG(l2.rating)
    FROM Listens l2
    -- CORRELATED: references outer u.user_id; runs once per outer row.
    WHERE l2.user_id = u.user_id
);

Step-by-Step Execution

UserTheir RatingsTheir AVGRatings > Their AVGResult
Mickey4.5, 4.2, 3.94.24.5 > 4.2 ✓Include
Minnie4.7, 4.6, 3.94.44.7 > 4.4 ✓, 4.6 > 4.4 ✓Include
Daffy2.9, 4.9, NULL3.94.9 > 3.9 ✓Include
Pluto(no ratings)NULL(no comparisons)Exclude

Output

nameNotes
MickeyHas rating 4.5 above their 4.2 average
MinnieHas ratings 4.7, 4.6 above their 4.4 average
DaffyHas rating 4.9 above their 3.9 average
  • Each user gets their OWN personalized average

  • Different threshold for each row

  • This is impossible with basic subqueries!


Example: Users with High Ratings (EXISTS Pattern)

-- Find users who have at least one rating above 4.5.
SELECT u.name
FROM Users u
WHERE EXISTS (
    -- Inner query: does this user have any rating > 4.5?
    SELECT 1
    FROM Listens l
    -- CORRELATED: references outer u.user_id; runs once per outer row.
    WHERE l.user_id = u.user_id
      AND l.rating > 4.5
);

Execution: Like Nested Loops with Scoped Variables

Row-by-Row Execution

Outer Rowuser_idnameInner Query CheckEXISTS?Result
Row 11MickeyCheck ratings for user 1: {4.5, 4.2, 3.9}✗ 4.5 not > 4.5Exclude
Row 22MinnieCheck ratings for user 2: {4.7, 4.6, 3.9}✓ 4.7 > 4.5Include
Row 33DaffyCheck ratings for user 3: {2.9, 4.9, NULL}✓ 4.9 > 4.5Include
Row 44PlutoCheck ratings for user 4: {}✗ No listensExclude

Output

name
Minnie
Daffy

Example: Users with NO High Ratings (NOT EXISTS Pattern)

-- Find users who have NO rating above 4.5 (never had a high rating).
SELECT u.name
FROM Users u
WHERE NOT EXISTS (
    -- Inner query: does this user have any rating > 4.5?
    SELECT 1
    FROM Listens l
    -- CORRELATED: references outer u.user_id.
    WHERE l.user_id = u.user_id
      AND l.rating > 4.5
);

Row-by-Row Execution

Outer Rowuser_idnameInner Query CheckNOT EXISTS?Result
Row 11MickeyCheck ratings for user 1: {4.5, 4.2, 3.9}✓ None > 4.5Include
Row 22MinnieCheck ratings for user 2: {4.7, 4.6, 3.9}✗ Has 4.7, 4.6 > 4.5Exclude
Row 33DaffyCheck ratings for user 3: {2.9, 4.9, NULL}✗ Has 4.9 > 4.5Exclude
Row 44PlutoCheck ratings for user 4: {}✓ No listensInclude

Output

nameNotes
MickeyHighest rating is 4.5 (not > 4.5)
PlutoHas no ratings at all

Correlated vs Regular: Key Differences

Aspect Regular Subquery Correlated Subquery
Execution Runs ONCE total Runs ONCE PER outer row
Independence Standalone query Depends on outer query
Use Case Global comparisons Row-specific comparisons
Performance Usually faster N+1 query problem
Example "Above global average" "Above YOUR average"

When to Use Correlated Subqueries

Good Use Cases:

  • Personal comparisons (above YOUR average)

  • Relative rankings (best in YOUR category)

  • Row-specific existence checks (do YOU have...)

  • Self-referential queries (employees earning more than their manager)

Avoid When:

  • Simple global comparisons suffice

  • Performance is critical (consider JOIN alternatives)

  • Query complexity becomes unreadable

Common Patterns

  1. Personal Aggregates: Compare to your own AVG/MAX/MIN

  2. Existence per Entity: EXISTS with correlation

  3. Ranking within Groups: Find top N per category

  4. Self-Joins Alternative: Sometimes clearer than complex JOINs

Common Mistakes with Correlated Subqueries

Forgetting Correlation

Problem: Without correlation, EXISTS checks globally instead of per-row

Performance Anti-Pattern (N+1 Problem)

Problem: Subquery runs once per row - with 1000 users, that's 1001 queries total!

Note: Modern query optimizers may recognize this pattern and automatically rewrite it as a JOIN or use other optimization techniques like subquery caching. However, explicit JOINs are still preferred for clarity and guaranteed performance.