Subqueries: Basic Patterns

Core Concept

Subqueries are SQL's nesting dollsβ€”queries within queries. They tackle problems step-by-step, breaking down complex questions into simpler tasks.

Basic Subqueries: Queries Inside Queries Queries Inside Queries Get user names Find users who listened to these songs Find Taylor Swift songs {1, 2, 9} {1, 2, 3} Mickey, Minnie, Daffy Runs ONCE Same result for ALL outer rows Three Main Types IN / NOT IN Check if value exists in result list [ ] EXISTS Check if ANY rows match (early stop) ? Scalar Returns exactly one value for comparison 1

IN Subquery: Find Taylor Swift Listeners

Key Rule: Inside-Out Execution: Inner queries finish before outer queries begin

-- Find users who listen to Taylor Swift songs
SELECT name                            -- Last: Select columns to output
FROM Users
WHERE user_id IN (                    -- IN: Check if user_id in the list
    -- Step 2: Get distinct users who listened to those songs
    SELECT DISTINCT user_id           -- DISTINCT: Each user once only
    FROM Listens
    WHERE song_id IN (                -- IN: Check if song_id in the list
        -- Step 1: Find all Taylor Swift song IDs (executes FIRST)
        SELECT song_id                -- Returns set: {1, 2, 9}
        FROM Songs
        WHERE artist = 'Taylor Swift'
    )
)

Step 1: Find Taylor Swift Songs

song_idNotes
1βœ“ Taylor Swift - Evermore
2βœ“ Taylor Swift - Willow
3βœ— Ed Sheeran - Shape of You
4βœ— Ed Sheeran - Photograph
5βœ— Ed Sheeran - Shivers
6βœ— Beatles - Yesterday
7βœ— Beatles - Yellow Submarine
8βœ— Beatles - Hey Jude
9βœ“ Taylor Swift - Bad Blood
10βœ— DJ - DJ Mix

Result: {1, 2, 9}

Step 2: Find Users with song_ids 1, 2, 9

user_idsong_idNotes
11User 1 listened to Taylor Swift
12User 1 listened to Taylor Swift
22User 2 listened to Taylor Swift
31User 3 listened to Taylor Swift
32User 3 listened to Taylor Swift

Distinct users: {1, 2, 3}

Step 3: Get Names for {1, 2, 3}

nameNotes: user_id
Mickey1
Minnie2
Daffy3

NOT IN Subquery: Non-Taylor Swift Fans

-- Find users who never listened to any Taylor Swift song
SELECT name
FROM Users
WHERE user_id NOT IN (                -- NOT IN: Exclude these users
    -- Step 2: Get distinct users who listened to those songs
    SELECT DISTINCT user_id
    FROM Listens
    WHERE song_id IN (
        -- Step 1: Find all Taylor Swift song IDs  
        SELECT song_id
        FROM Songs
        WHERE artist = 'Taylor Swift'
    )
    AND user_id IS NOT NULL           -- CRITICAL: Filter NULLs to avoid empty result!
)

Using the same inner queries, but inverting at the end:

Step 1 & 2: Same as above β†’ Users {1, 2, 3} listened to Taylor Swift

Step 3: Get Names for Users NOT IN {1, 2, 3}

user_idnameNotes: Check
4Plutoβœ“ NOT IN {1,2,3} - never listened to Taylor Swift

Scalar Subquery: Above Average Ratings

-- Find listens with above-average ratings
SELECT u.name, l.rating               -- Last: Select columns to output
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
WHERE l.rating > (                    -- Compare to single value (scalar subquery)
    -- Step 1: Calculate average rating (returns ONE value: 4.2)
    SELECT AVG(rating)                -- AVG automatically ignores NULL values
    FROM Listens                      -- Executes ONCE, cached, reused for all rows
)

Step 1: Calculate Average

ratingNotes
4.5βœ“
4.2βœ“
3.9βœ“
4.7βœ“
4.6βœ“
3.9βœ“
2.9βœ“
4.9βœ“
NULLβœ— Skip NULL

AVG = (4.5+4.2+3.9+4.7+4.6+3.9+2.9+4.9) / 8 = 4.2

Step 2: Filter Ratings > 4.2

nameratingNotes: CheckNotes: Result
Mickey4.5βœ“ TRUEInclude
Mickey4.2βœ— FALSEExclude
Mickey3.9βœ— FALSEExclude
Minnie4.7βœ“ TRUEInclude
Minnie4.6βœ“ TRUEInclude
Minnie3.9βœ— FALSEExclude
Daffy2.9βœ— FALSEExclude
Daffy4.9βœ“ TRUEInclude
DaffyNULL? UNKNOWNExclude

Output

name rating
Mickey 4.5
Minnie 4.7
Minnie 4.6
Daffy 4.9

EXISTS: Early Stop Optimization

Efficient: EXISTS with Early Stopping

-- Check if ANY high-value listeners exist  
SELECT 'High-value listeners found!' AS result
WHERE EXISTS (                        -- EXISTS: TRUE if ANY row found
    SELECT 1                          -- SELECT 1 is convention (value ignored)
    FROM Listens
    WHERE rating > 4.5                -- Stops at FIRST match (efficient!)
)

Execution Trace:

StepratingCheckAction
Row 14.54.5 > 4.5? βœ—Continue
Row 24.24.2 > 4.5? βœ—Continue
Row 33.93.9 > 4.5? βœ—Continue
Row 44.74.7 > 4.5? βœ“STOP!
Row 54.6-Skipped
Row 63.9-Skipped
Row 72.9-Skipped
Row 84.9-Skipped
Row 9NULL-Skipped

Rows checked: 4 of 9

βœ“ Stops at first match
βœ“ Returns TRUE immediately
βœ“ Saves 5 row checks

Inefficient: COUNT(*) Comparison

-- INEFFICIENT - must check ALL rows
SELECT 'High-value listeners found!' AS result
WHERE (
    SELECT COUNT(*)                   -- Counts ALL matching rows (slow!)
    FROM Listens 
    WHERE rating > 4.5                -- Must scan entire table
) > 0                                 -- Then compares count to 0

Execution Trace:

StepratingCheckAction
Row 14.54.5 > 4.5? βœ—Count: 0
Row 24.24.2 > 4.5? βœ—Count: 0
Row 33.93.9 > 4.5? βœ—Count: 0
Row 44.74.7 > 4.5? βœ“Count: 1
Row 54.64.6 > 4.5? βœ“Count: 2
Row 63.93.9 > 4.5? βœ—Count: 2
Row 72.92.9 > 4.5? βœ—Count: 2
Row 84.94.9 > 4.5? βœ“Count: 3
Row 9NULLNULL > 4.5? βœ—Count: 3

Rows checked: 9 of 9

βœ— Must check every row
βœ— Builds complete count
βœ— Wastes resources

Performance Comparison:


Subquery Types Comparison

Type Returns Example Use When
IN List of values WHERE id IN (SELECT...) Checking membership
NOT IN List to exclude WHERE id NOT IN (...) Finding non-members
EXISTS TRUE/FALSE WHERE EXISTS (...) Any match suffices
NOT EXISTS TRUE/FALSE WHERE NOT EXISTS (...) Ensuring absence
Scalar Single value WHERE x > (SELECT AVG...) Comparing to aggregate
Table Full result set FROM (SELECT...) AS t Derived tables

Key Rules

  1. Inside-Out Execution: Inner queries complete before outer queries start

  2. Return Types: Single value (scalar), multiple values (IN), or existence check (EXISTS)

  3. Independence: Basic subqueries run once, independently of the outer query

  4. NULL Handling: NULL in NOT IN can eliminate all results

  5. Performance: Often rewritten as JOINs by the optimizer


Common Patterns

Common Mistakes

⚠️ The NULL in NOT IN Death Trap

🚨 CRITICAL: NULL in NOT IN

If a NOT IN subquery returns ANY NULL value, the ENTIRE result becomes empty! This is the #1 most dangerous NULL trap in SQL.

-- WRONG: If subquery returns ANY NULL, entire result becomes empty!
WHERE user_id NOT IN (
    SELECT user_id FROM Listens       -- May contain NULL user_ids
)                                     -- NULL in NOT IN = empty result!

-- CORRECT: Filter out NULLs explicitly
WHERE user_id NOT IN (
    SELECT user_id FROM Listens 
    WHERE user_id IS NOT NULL         -- Prevent NULL trap
)

Why This Happens (Step by Step)

What You Think Happens:

-- Check: Is Pluto's ID (4) not in list?
4 NOT IN (1, 2, 3, NULL)
-- You expect: TRUE (Pluto not listed)

What Actually Happens:

-- SQL expands NOT IN to:
4 != 1 AND 4 != 2 AND 4 != 3 AND 4 != NULL

-- Evaluates to:
TRUE AND TRUE AND TRUE AND UNKNOWN

-- Final result:
UNKNOWN  -- Not TRUE, so filtered out!

Real Example with Our Data

-- Find songs never listened to
-- WRONG: Returns empty if any song_id could be NULL
SELECT title FROM Songs
WHERE song_id NOT IN (
    SELECT song_id FROM Listens  -- Has songs 1,2,6,7,8 + potential NULLs
)

-- CORRECT: Filter out NULLs explicitly
SELECT title FROM Songs  
WHERE song_id NOT IN (
    SELECT song_id FROM Listens 
    WHERE song_id IS NOT NULL  -- Critical filter!
)

Result Difference:

Safe Patterns to Use Instead

-- Option 1: Filter NULLs (shown above)
WHERE id NOT IN (SELECT id FROM TABLE WHERE id IS NOT NULL)

-- Option 2: Use NOT EXISTS (NULL-safe)
WHERE NOT EXISTS (
    SELECT 1 FROM Listens l 
    WHERE l.song_id = s.song_id
)

-- Option 3: Use LEFT JOIN with NULL check
FROM Songs s
LEFT JOIN Listens l ON s.song_id = l.song_id
WHERE l.song_id IS NULL

Remember: Always check if your NOT IN subquery can return NULL!

The Multiple Values Error

-- WRONG: Subquery returns multiple rows but = expects one
WHERE rating = (SELECT rating FROM Listens)       -- Error: Multiple rows!

-- CORRECT: Use IN for multiple values
WHERE rating IN (SELECT rating FROM Listens)      -- IN handles sets
-- OR: Use scalar aggregation to get single value
WHERE rating = (SELECT MAX(rating) FROM Listens)  -- MAX returns ONE value

Problem: = expects a single value, not a list