Correlated Subqueries: Row-by-Row Comparisons

The Problem: Personalized Comparisons

Consider this: "Find users whose latest rating is above THEIR personal average" - Each user has a unique average!

Why Basic Subqueries Aren't Enough

What we really need: Each user compared to THEIR OWN average!


Solution: Correlated Subqueries

Here's the fix: Add one extra line inside a basic subquery to personalize (correlate).

Key Insight: The inner query can "see" the outer query's current row!


Example: Users Above Their Own Average

Step-by-Step Execution

UserTheir RatingsTheir AVGRatings > Their AVGResult
Mickey4.5, 4.2, 3.94.24.5 > 4.2 ✓Include
Minnie4.7, 4.6, 3.94.44.7 > 4.4 ✓, 4.6 > 4.4 ✓Include
Daffy2.9, 4.9, NULL3.94.9 > 3.9 ✓Include
Pluto(no ratings)NULL(no comparisons)Exclude

Output

nameNotes
MickeyHas rating 4.5 above their 4.2 average
MinnieHas ratings 4.7, 4.6 above their 4.4 average
DaffyHas rating 4.9 above their 3.9 average

Example: Users with High Ratings (EXISTS Pattern)

Execution: Like Nested Loops with Scoped Variables

Row-by-Row Execution

Outer Rowuser_idnameInner Query CheckEXISTS?Result
Row 11MickeyCheck ratings for user 1: {4.5, 4.2, 3.9}✗ 4.5 not > 4.5Exclude
Row 22MinnieCheck ratings for user 2: {4.7, 4.6, 3.9}✓ 4.7 > 4.5Include
Row 33DaffyCheck ratings for user 3: {2.9, 4.9, NULL}✓ 4.9 > 4.5Include
Row 44PlutoCheck ratings for user 4: {}✗ No listensExclude

Output

name
Minnie
Daffy

Example: Users with NO High Ratings (NOT EXISTS Pattern)

Row-by-Row Execution

Outer Rowuser_idnameInner Query CheckNOT EXISTS?Result
Row 11MickeyCheck ratings for user 1: {4.5, 4.2, 3.9}✓ None > 4.5Include
Row 22MinnieCheck ratings for user 2: {4.7, 4.6, 3.9}✗ Has 4.7, 4.6 > 4.5Exclude
Row 33DaffyCheck ratings for user 3: {2.9, 4.9, NULL}✗ Has 4.9 > 4.5Exclude
Row 44PlutoCheck ratings for user 4: {}✓ No listensInclude

Output

nameNotes
MickeyHighest rating is 4.5 (not > 4.5)
PlutoHas no ratings at all

Correlated vs Regular: Key Differences

Aspect Regular Subquery Correlated Subquery
Execution Runs ONCE total Runs ONCE PER outer row
Independence Standalone query Depends on outer query
Use Case Global comparisons Row-specific comparisons
Performance Usually faster N+1 query problem
Example "Above global average" "Above YOUR average"

When to Use Correlated Subqueries

Good Use Cases:

Avoid When:

Common Patterns

  1. Personal Aggregates: Compare to your own AVG/MAX/MIN

  2. Existence per Entity: EXISTS with correlation

  3. Ranking within Groups: Find top N per category

  4. Self-Joins Alternative: Sometimes clearer than complex JOINs

Common Mistakes with Correlated Subqueries

Forgetting Correlation

Problem: Without correlation, EXISTS checks globally instead of per-row

Performance Anti-Pattern (N+1 Problem)

Problem: Subquery runs once per row - with 1000 users, that's 1001 queries total!

Note: Modern query optimizers may recognize this pattern and automatically rewrite it as a JOIN or use other optimization techniques like subquery caching. However, explicit JOINs are still preferred for clarity and guaranteed performance.