Subqueries: Basic Patterns

Concept. A subquery is a SELECT nested inside another query. With IN / NOT IN / EXISTS, the inner query computes a set and the outer query asks whether each row belongs to it.

Intuition. Subqueries are SQL's nesting dolls, queries within queries. Run the inner query first against Listens, "which song_ids were rated above 4.5?", to build a list. Then run the outer query against Songs and keep only the rows whose song_id appears in that list.

Example. Find users who listen to Taylor Swift (nest queries layer by layer, songs → listens → users).

Basic Subqueries: Queries Inside Queries Queries Inside Queries Get user names Find users who listened to these songs Find Taylor Swift songs {1, 2, 9} {1, 2, 3} Mickey, Minnie, Daffy Runs ONCE Same result for ALL outer rows Three Main Types IN / NOT IN Check if value exists in result list [ ] EXISTS Check if ANY rows match (early stop) ? Scalar Returns exactly one value for comparison 1

IN Subquery: Find Taylor Swift Listeners

Key Rule: Inside-Out Execution: Inner queries finish before outer queries begin

-- Find users who listen to Taylor Swift songs.
-- Reads inside-out: Step 1 first, Step 3 last.

-- Step 3 (LAST): pick names for the users found in Step 2.
SELECT name
FROM Users
WHERE user_id IN (
    -- Step 2: distinct users who listened to those songs.
    SELECT DISTINCT user_id
    FROM Listens
    WHERE song_id IN (
        -- Step 1 (FIRST): all Taylor Swift song IDs. Returns {1, 2, 9}.
        SELECT song_id
        FROM Songs
        WHERE artist = 'Taylor Swift'
    )
)

Step 1: Find Taylor Swift Songs

song_idNotes
1✓ Taylor Swift - Evermore
2✓ Taylor Swift - Willow
3✗ Ed Sheeran - Shape of You
4✗ Ed Sheeran - Photograph
5✗ Ed Sheeran - Shivers
6✗ Beatles - Yesterday
7✗ Beatles - Yellow Submarine
8✗ Beatles - Hey Jude
9✓ Taylor Swift - Bad Blood
10✗ DJ - DJ Mix

Result: {1, 2, 9}

Step 2: Find Users with song_ids 1, 2, 9

user_idsong_idNotes
11User 1 listened to Taylor Swift
12User 1 listened to Taylor Swift
22User 2 listened to Taylor Swift
31User 3 listened to Taylor Swift
32User 3 listened to Taylor Swift

Distinct users: {1, 2, 3}

Step 3: Get Names for {1, 2, 3}

nameNotes: user_id
Mickey1
Minnie2
Daffy3
  • Three layers of queries

  • Each feeds results outward

  • Set operations at each level


NOT IN Subquery: Non-Taylor Swift Fans

-- Find users who never listened to any Taylor Swift song.

-- Step 3 (LAST): pick names for users NOT in the Step 2 list.
SELECT name
FROM Users
WHERE user_id NOT IN (
    -- Step 2: distinct users who listened to those songs.
    SELECT DISTINCT user_id
    FROM Listens
    WHERE song_id IN (
        -- Step 1: all Taylor Swift song IDs.
        SELECT song_id
        FROM Songs
        WHERE artist = 'Taylor Swift'
    )
    -- CRITICAL: filter NULLs to avoid the NOT IN empty-result trap.
    AND user_id IS NOT NULL
)

Using the same inner queries, but inverting at the end:

Step 1 & 2: Same as above → Users {1, 2, 3} listened to Taylor Swift

Step 3: Get Names for Users NOT IN {1, 2, 3}

user_idnameNotes: Check
4Pluto✓ NOT IN {1,2,3} - never listened to Taylor Swift

Scalar Subquery: Above Average Ratings

-- Find listens with above-average ratings.

SELECT u.name, l.rating
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
-- Compare each rating to a single value (scalar subquery).
WHERE l.rating > (
    -- Step 1: average rating across all Listens. Returns ONE value: 4.2.
    -- AVG ignores NULL values. Runs ONCE, cached, reused for every outer row.
    SELECT AVG(rating)
    FROM Listens
)

Step 1: Calculate Average

ratingNotes
4.5
4.2
3.9
4.7
4.6
3.9
2.9
4.9
NULL✗ Skip NULL

AVG = (4.5+4.2+3.9+4.7+4.6+3.9+2.9+4.9) / 8 = 4.2

Step 2: Filter Ratings > 4.2

nameratingNotes: CheckNotes: Result
Mickey4.5✓ TRUEInclude
Mickey4.2✗ FALSEExclude
Mickey3.9✗ FALSEExclude
Minnie4.7✓ TRUEInclude
Minnie4.6✓ TRUEInclude
Minnie3.9✗ FALSEExclude
Daffy2.9✗ FALSEExclude
Daffy4.9✓ TRUEInclude
DaffyNULL? UNKNOWNExclude

Output

name rating
Mickey 4.5
Minnie 4.7
Minnie 4.6
Daffy 4.9
  • Scalar subquery returns a single value

  • Used in comparisons like any constant

  • NULL handling is critical for AVG


EXISTS: Early Stop Optimization

Efficient: EXISTS with Early Stopping

-- Check if ANY high-value listeners exist.

SELECT 'High-value listeners found!' AS result
-- EXISTS returns TRUE as soon as ANY row matches.
WHERE EXISTS (
    -- SELECT 1 is convention; the value is ignored.
    SELECT 1
    FROM Listens
    -- Stops at the FIRST row where rating > 4.5 (efficient).
    WHERE rating > 4.5
)

Execution Trace:

StepratingCheckAction
Row 14.54.5 > 4.5? ✗Continue
Row 24.24.2 > 4.5? ✗Continue
Row 33.93.9 > 4.5? ✗Continue
Row 44.74.7 > 4.5? ✓STOP!
Row 54.6-Skipped
Row 63.9-Skipped
Row 72.9-Skipped
Row 84.9-Skipped
Row 9NULL-Skipped

Rows checked: 4 of 9

✓ Stops at first match
✓ Returns TRUE immediately
✓ Saves 5 row checks

Inefficient: COUNT(*) Comparison

-- INEFFICIENT: must scan ALL rows before answering.

SELECT 'High-value listeners found!' AS result
WHERE (
    -- COUNT(*) tallies ALL matching rows (no early stop).
    SELECT COUNT(*)
    FROM Listens
    -- Scans the entire table.
    WHERE rating > 4.5
) > 0  -- Compare the final count to 0.

Execution Trace:

StepratingCheckAction
Row 14.54.5 > 4.5? ✗Count: 0
Row 24.24.2 > 4.5? ✗Count: 0
Row 33.93.9 > 4.5? ✗Count: 0
Row 44.74.7 > 4.5? ✓Count: 1
Row 54.64.6 > 4.5? ✓Count: 2
Row 63.93.9 > 4.5? ✗Count: 2
Row 72.92.9 > 4.5? ✗Count: 2
Row 84.94.9 > 4.5? ✓Count: 3
Row 9NULLNULL > 4.5? ✗Count: 3

Rows checked: 9 of 9

✗ Must check every row
✗ Builds complete count
✗ Wastes resources

Performance Comparison:

  • EXISTS: 4 rows checked, stops early

  • COUNT(*): 9 rows checked, no early stop

  • Savings: 56% fewer rows to examine

  • Best for: Presence checks, not quantity


Subquery Types Comparison

Type Returns Example Use When
IN List of values WHERE id IN (SELECT...) Checking membership
NOT IN List to exclude WHERE id NOT IN (...) Finding non-members
EXISTS TRUE/FALSE WHERE EXISTS (...) Any match suffices
NOT EXISTS TRUE/FALSE WHERE NOT EXISTS (...) Ensuring absence
Scalar Single value WHERE x > (SELECT AVG...) Comparing to aggregate
Table Full result set FROM (SELECT...) AS t Derived tables

Key Rules

  1. Inside-Out Execution: Inner queries complete before outer queries start

  2. Return Types: Single value (scalar), multiple values (IN), or existence check (EXISTS)

  3. Independence: Basic subqueries run once, independently of the outer query

  4. NULL Handling: NULL in NOT IN can eliminate all results

  5. Performance: Often rewritten as JOINs by the optimizer


Common Patterns

  • IN/NOT IN: Check if a value exists in a list

  • EXISTS/NOT EXISTS: Check if any rows match a condition

  • Scalar Comparison: Compare against a single calculated value

  • Nested Levels: Subqueries within subqueries

Common Mistakes

⚠️ The NULL in NOT IN Death Trap

🚨 CRITICAL: NULL in NOT IN

If a NOT IN subquery returns ANY NULL value, the ENTIRE result becomes empty! This is the #1 most dangerous NULL trap in SQL.

-- WRONG: if the subquery returns ANY NULL, the entire result becomes empty.
-- NULL inside a NOT IN list silently returns 0 rows.
WHERE user_id NOT IN (
    -- May contain NULL user_ids.
    SELECT user_id FROM Listens
)

-- CORRECT: filter NULLs out of the subquery first.
WHERE user_id NOT IN (
    SELECT user_id FROM Listens
    WHERE user_id IS NOT NULL
)

Why This Happens (Step by Step)

What You Think Happens:

-- Check: Is Pluto's ID (4) not in list?
4 NOT IN (1, 2, 3, NULL)
-- You expect: TRUE (Pluto not listed)

What Actually Happens:

-- SQL expands NOT IN to:
4 != 1 AND 4 != 2 AND 4 != 3 AND 4 != NULL

-- Evaluates to:
TRUE AND TRUE AND TRUE AND UNKNOWN

-- Final result:
UNKNOWN  -- Not TRUE, so filtered out!

Real Example with Our Data

-- Find songs never listened to.

-- WRONG: returns empty if any song_id in Listens could be NULL.
SELECT title FROM Songs
WHERE song_id NOT IN (
    -- Has songs 1, 2, 6, 7, 8 plus potential NULLs.
    SELECT song_id FROM Listens
)

-- CORRECT: filter NULLs out of the subquery first.
SELECT title FROM Songs
WHERE song_id NOT IN (
    -- Critical filter to avoid the NOT IN NULL trap.
    SELECT song_id FROM Listens
    WHERE song_id IS NOT NULL
)

Result Difference:

  • Wrong query: 0 rows (silent failure!)

  • Correct query: 5 rows (songs 3,4,5,9,10)

Safe Patterns to Use Instead

-- Option 1: Filter NULLs (shown above)
WHERE id NOT IN (SELECT id FROM TABLE WHERE id IS NOT NULL)

-- Option 2: Use NOT EXISTS (NULL-safe)
WHERE NOT EXISTS (
    SELECT 1 FROM Listens l 
    WHERE l.song_id = s.song_id
)

-- Option 3: Use LEFT JOIN with NULL check
FROM Songs s
LEFT JOIN Listens l ON s.song_id = l.song_id
WHERE l.song_id IS NULL

Remember: Always check if your NOT IN subquery can return NULL!

The Multiple Values Error

-- WRONG: subquery returns multiple rows, but `=` expects one.
WHERE rating = (SELECT rating FROM Listens)

-- CORRECT: use IN for multiple values.
WHERE rating IN (SELECT rating FROM Listens)

-- OR: use scalar aggregation to get a single value.
WHERE rating = (SELECT MAX(rating) FROM Listens)

Problem: = expects a single value, not a list