Subqueries: Basic Patterns

Core Concept

Subqueries are SQL's nesting dolls - queries inside queries. They solve problems in layers. It's how SQL breaks complex questions into simple steps.

Basic Subqueries: Queries Inside Queries Queries Inside Queries Get user names Find users who listened to these songs Find Taylor Swift songs {1, 2, 9} {1, 2, 3} Mickey, Minnie, Daffy Runs ONCE Same result for ALL outer rows Three Main Types IN / NOT IN Check if value exists in result list [ ] EXISTS Check if ANY rows match (early stop) ? Scalar Returns exactly one value for comparison 1

Key Rules

  1. Inside-Out Execution: Inner queries complete before outer queries start

  2. Return Types: Single value (scalar), multiple values (IN), or existence check (EXISTS)

  3. Independence: Basic subqueries run once, independently of outer query

  4. NULL Handling: NULL in NOT IN can eliminate all results

  5. Performance: Often rewritten as JOINs by the optimizer


IN Subquery: Find Taylor Swift Listeners

-- Find users who listen to Taylor Swift songs
SELECT name                            -- Last: Select columns to output
FROM Users
WHERE user_id IN (                    -- IN: Check if user_id in the list
    -- Step 2: Get distinct users who listened to those songs
    SELECT DISTINCT user_id           -- DISTINCT: Each user once only
    FROM Listens
    WHERE song_id IN (                -- IN: Check if song_id in the list
        -- Step 1: Find all Taylor Swift song IDs (executes FIRST)
        SELECT song_id                -- Returns set: {1, 2, 9}
        FROM Songs
        WHERE artist = 'Taylor Swift'
    )
)

Step 1: Find Taylor Swift Songs

song_idNotes
1✓ Taylor Swift - Evermore
2✓ Taylor Swift - Willow
3✗ Ed Sheeran - Shape of You
4✗ Ed Sheeran - Photograph
5✗ Ed Sheeran - Shivers
6✗ Beatles - Yesterday
7✗ Beatles - Yellow Submarine
8✗ Beatles - Hey Jude
9✓ Taylor Swift - Bad Blood
10✗ DJ - DJ Mix

Result: {1, 2, 9}

Step 2: Find Users with song_ids 1, 2, 9

user_idsong_idNotes
11User 1 listened to Taylor Swift
12User 1 listened to Taylor Swift
22User 2 listened to Taylor Swift
31User 3 listened to Taylor Swift
32User 3 listened to Taylor Swift

Distinct users: {1, 2, 3}

Step 3: Get Names for {1, 2, 3}

nameNotes: user_id
Mickey1
Minnie2
Daffy3

NOT IN Subquery: Non-Taylor Swift Fans

-- Find users who never listened to any Taylor Swift song
SELECT name
FROM Users
WHERE user_id NOT IN (                -- NOT IN: Exclude these users
    -- Step 2: Get distinct users who listened to those songs
    SELECT DISTINCT user_id
    FROM Listens
    WHERE song_id IN (
        -- Step 1: Find all Taylor Swift song IDs  
        SELECT song_id
        FROM Songs
        WHERE artist = 'Taylor Swift'
    )
    AND user_id IS NOT NULL           -- CRITICAL: Filter NULLs to avoid empty result!
)

Using same inner queries, but inverting at the end:

Step 1 & 2: Same as above → Users {1, 2, 3} listened to Taylor Swift

Step 3: Get Names for Users NOT IN {1, 2, 3}

user_idnameNotes: Check
4Pluto✓ NOT IN {1,2,3} - never listened to Taylor Swift

Scalar Subquery: Above Average Ratings

-- Find listens with above-average ratings
SELECT u.name, l.rating               -- Last: Select columns to output
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
WHERE l.rating > (                    -- Compare to single value (scalar subquery)
    -- Step 1: Calculate average rating (returns ONE value: 4.2)
    SELECT AVG(rating)                -- AVG automatically ignores NULL values
    FROM Listens                      -- Executes ONCE, cached, reused for all rows
)

Step 1: Calculate Average

ratingNotes
4.5
4.2
3.9
4.7
4.6
3.9
2.9
4.9
NULL✗ Skip NULL

AVG = (4.5+4.2+3.9+4.7+4.6+3.9+2.9+4.9) / 8 = 4.2

Step 2: Filter Ratings > 4.2

nameratingNotes: CheckNotes: Result
Mickey4.5✓ TRUEInclude
Mickey4.2✗ FALSEExclude
Mickey3.9✗ FALSEExclude
Minnie4.7✓ TRUEInclude
Minnie4.6✓ TRUEInclude
Minnie3.9✗ FALSEExclude
Daffy2.9✗ FALSEExclude
Daffy4.9✓ TRUEInclude
DaffyNULL? UNKNOWNExclude

Output

name rating
Mickey 4.5
Minnie 4.7
Minnie 4.6
Daffy 4.9

EXISTS: Early Stop Optimization

Efficient: EXISTS with Early Stopping

-- Check if ANY high-value listeners exist  
SELECT 'High-value listeners found!' AS result
WHERE EXISTS (                        -- EXISTS: TRUE if ANY row found
    SELECT 1                          -- SELECT 1 is convention (value ignored)
    FROM Listens
    WHERE rating > 4.5                -- Stops at FIRST match (efficient!)
)

Execution Trace:

StepratingCheckAction
Row 14.54.5 > 4.5? ✗Continue
Row 24.24.2 > 4.5? ✗Continue
Row 33.93.9 > 4.5? ✗Continue
Row 44.74.7 > 4.5? ✓STOP!
Row 54.6-Skipped
Row 63.9-Skipped
Row 72.9-Skipped
Row 84.9-Skipped
Row 9NULL-Skipped

Rows checked: 4 of 9

✓ Stops at first match
✓ Returns TRUE immediately
✓ Saves 5 row checks

Inefficient: COUNT(*) Comparison

-- INEFFICIENT - must check ALL rows
SELECT 'High-value listeners found!' AS result
WHERE (
    SELECT COUNT(*)                   -- Counts ALL matching rows (slow!)
    FROM Listens 
    WHERE rating > 4.5                -- Must scan entire table
) > 0                                 -- Then compares count to 0

Execution Trace:

StepratingCheckAction
Row 14.54.5 > 4.5? ✗Count: 0
Row 24.24.2 > 4.5? ✗Count: 0
Row 33.93.9 > 4.5? ✗Count: 0
Row 44.74.7 > 4.5? ✓Count: 1
Row 54.64.6 > 4.5? ✓Count: 2
Row 63.93.9 > 4.5? ✗Count: 2
Row 72.92.9 > 4.5? ✗Count: 2
Row 84.94.9 > 4.5? ✓Count: 3
Row 9NULLNULL > 4.5? ✗Count: 3

Rows checked: 9 of 9

✗ Must check every row
✗ Builds complete count
✗ Wastes resources

Performance Comparison:


Subquery Types Comparison

Type Returns Example Use When
IN List of values WHERE id IN (SELECT...) Checking membership
NOT IN List to exclude WHERE id NOT IN (...) Finding non-members
EXISTS TRUE/FALSE WHERE EXISTS (...) Any match suffices
NOT EXISTS TRUE/FALSE WHERE NOT EXISTS (...) Ensuring absence
Scalar Single value WHERE x > (SELECT AVG...) Comparing to aggregate
Table Full result set FROM (SELECT...) AS t Derived tables

Common Patterns

Common Mistakes

⚠️ The NULL in NOT IN Death Trap

🚨 CRITICAL: NULL in NOT IN

If a NOT IN subquery returns ANY NULL value, the ENTIRE result becomes empty! This is the #1 most dangerous NULL trap in SQL.

-- WRONG: If subquery returns ANY NULL, entire result becomes empty!
WHERE user_id NOT IN (
    SELECT user_id FROM Listens       -- May contain NULL user_ids
)                                     -- NULL in NOT IN = empty result!

-- CORRECT: Filter out NULLs explicitly
WHERE user_id NOT IN (
    SELECT user_id FROM Listens 
    WHERE user_id IS NOT NULL         -- Prevent NULL trap
)

Why This Happens (Step by Step)

What You Think Happens:

-- Check: Is Pluto's ID (4) not in list?
4 NOT IN (1, 2, 3, NULL)
-- You expect: TRUE (Pluto not listed)

What Actually Happens:

-- SQL expands NOT IN to:
4 != 1 AND 4 != 2 AND 4 != 3 AND 4 != NULL

-- Evaluates to:
TRUE AND TRUE AND TRUE AND UNKNOWN

-- Final result:
UNKNOWN  -- Not TRUE, so filtered out!

Real Example with Our Data

-- Find songs never listened to
-- WRONG: Returns empty if any song_id could be NULL
SELECT title FROM Songs
WHERE song_id NOT IN (
    SELECT song_id FROM Listens  -- Has songs 1,2,6,7,8 + potential NULLs
)

-- CORRECT: Filter out NULLs explicitly
SELECT title FROM Songs  
WHERE song_id NOT IN (
    SELECT song_id FROM Listens 
    WHERE song_id IS NOT NULL  -- Critical filter!
)

Result Difference:

Safe Patterns to Use Instead

-- Option 1: Filter NULLs (shown above)
WHERE id NOT IN (SELECT id FROM TABLE WHERE id IS NOT NULL)

-- Option 2: Use NOT EXISTS (NULL-safe)
WHERE NOT EXISTS (
    SELECT 1 FROM Listens l 
    WHERE l.song_id = s.song_id
)

-- Option 3: Use LEFT JOIN with NULL check
FROM Songs s
LEFT JOIN Listens l ON s.song_id = l.song_id
WHERE l.song_id IS NULL

Remember: Always check if your NOT IN subquery can return NULL!

The Multiple Values Error

-- WRONG: Subquery returns multiple rows but = expects one
WHERE rating = (SELECT rating FROM Listens)       -- Error: Multiple rows!

-- CORRECT: Use IN for multiple values
WHERE rating IN (SELECT rating FROM Listens)      -- IN handles sets
-- OR: Use scalar aggregation to get single value
WHERE rating = (SELECT MAX(rating) FROM Listens)  -- MAX returns ONE value

Problem: = expects single value, not a list


Next: Correlated Subqueries

What we've learned: Basic subqueries that run ONCE and return the same result for all rows.

The limitation: What if each row needs a DIFFERENT comparison?

→ Continue to Correlated Subqueries for personalized row-by-row comparisons