Subqueries: Basic Patterns
Concept. A subquery is a SELECT nested inside another query. With IN / NOT IN / EXISTS, the inner query computes a set and the outer query asks whether each row belongs to it.
Intuition. Subqueries are SQL's nesting dolls, queries within queries. Run the inner query first against Listens, "which song_ids were rated above 4.5?", to build a list. Then run the outer query against Songs and keep only the rows whose song_id appears in that list.
Example. Find users who listen to Taylor Swift (nest queries layer by layer, songs → listens → users).
IN Subquery: Find Taylor Swift Listeners
Key Rule: Inside-Out Execution: Inner queries finish before outer queries begin
-- Find users who listen to Taylor Swift songs.
-- Reads inside-out: Step 1 first, Step 3 last.
-- Step 3 (LAST): pick names for the users found in Step 2.
SELECT name
FROM Users
WHERE user_id IN (
-- Step 2: distinct users who listened to those songs.
SELECT DISTINCT user_id
FROM Listens
WHERE song_id IN (
-- Step 1 (FIRST): all Taylor Swift song IDs. Returns {1, 2, 9}.
SELECT song_id
FROM Songs
WHERE artist = 'Taylor Swift'
)
)
Step 1: Find Taylor Swift Songs
| song_id | Notes |
|---|---|
| 1 | ✓ Taylor Swift - Evermore |
| 2 | ✓ Taylor Swift - Willow |
| 3 | ✗ Ed Sheeran - Shape of You |
| 4 | ✗ Ed Sheeran - Photograph |
| 5 | ✗ Ed Sheeran - Shivers |
| 6 | ✗ Beatles - Yesterday |
| 7 | ✗ Beatles - Yellow Submarine |
| 8 | ✗ Beatles - Hey Jude |
| 9 | ✓ Taylor Swift - Bad Blood |
| 10 | ✗ DJ - DJ Mix |
Result: {1, 2, 9}
Step 2: Find Users with song_ids 1, 2, 9
| user_id | song_id | Notes | |
|---|---|---|---|
| 1 | 1 | User 1 listened to Taylor Swift | |
| 1 | 2 | User 1 listened to Taylor Swift | |
| 2 | 2 | User 2 listened to Taylor Swift | |
| 3 | 1 | User 3 listened to Taylor Swift | |
| 3 | 2 | User 3 listened to Taylor Swift |
Distinct users: {1, 2, 3}
Step 3: Get Names for {1, 2, 3}
| name | Notes: user_id | |
|---|---|---|
| Mickey | 1 | |
| Minnie | 2 | |
| Daffy | 3 |
-
Three layers of queries
-
Each feeds results outward
-
Set operations at each level
NOT IN Subquery: Non-Taylor Swift Fans
-- Find users who never listened to any Taylor Swift song.
-- Step 3 (LAST): pick names for users NOT in the Step 2 list.
SELECT name
FROM Users
WHERE user_id NOT IN (
-- Step 2: distinct users who listened to those songs.
SELECT DISTINCT user_id
FROM Listens
WHERE song_id IN (
-- Step 1: all Taylor Swift song IDs.
SELECT song_id
FROM Songs
WHERE artist = 'Taylor Swift'
)
-- CRITICAL: filter NULLs to avoid the NOT IN empty-result trap.
AND user_id IS NOT NULL
)
Using the same inner queries, but inverting at the end:
Step 1 & 2: Same as above → Users {1, 2, 3} listened to Taylor Swift
Step 3: Get Names for Users NOT IN {1, 2, 3}
| user_id | name | Notes: Check | |
|---|---|---|---|
| 4 | Pluto | ✓ NOT IN {1,2,3} - never listened to Taylor Swift |
Scalar Subquery: Above Average Ratings
-- Find listens with above-average ratings.
SELECT u.name, l.rating
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
-- Compare each rating to a single value (scalar subquery).
WHERE l.rating > (
-- Step 1: average rating across all Listens. Returns ONE value: 4.2.
-- AVG ignores NULL values. Runs ONCE, cached, reused for every outer row.
SELECT AVG(rating)
FROM Listens
)
Step 1: Calculate Average
| rating | Notes |
|---|---|
| 4.5 | ✓ |
| 4.2 | ✓ |
| 3.9 | ✓ |
| 4.7 | ✓ |
| 4.6 | ✓ |
| 3.9 | ✓ |
| 2.9 | ✓ |
| 4.9 | ✓ |
| NULL | ✗ Skip NULL |
AVG = (4.5+4.2+3.9+4.7+4.6+3.9+2.9+4.9) / 8 = 4.2
Step 2: Filter Ratings > 4.2
| name | rating | Notes: Check | Notes: Result | |
|---|---|---|---|---|
| Mickey | 4.5 | ✓ TRUE | Include | |
| Mickey | 4.2 | ✗ FALSE | Exclude | |
| Mickey | 3.9 | ✗ FALSE | Exclude | |
| Minnie | 4.7 | ✓ TRUE | Include | |
| Minnie | 4.6 | ✓ TRUE | Include | |
| Minnie | 3.9 | ✗ FALSE | Exclude | |
| Daffy | 2.9 | ✗ FALSE | Exclude | |
| Daffy | 4.9 | ✓ TRUE | Include | |
| Daffy | NULL | ? UNKNOWN | Exclude |
Output
| name | rating |
|---|---|
| Mickey | 4.5 |
| Minnie | 4.7 |
| Minnie | 4.6 |
| Daffy | 4.9 |
-
Scalar subquery returns a single value
-
Used in comparisons like any constant
-
NULL handling is critical for AVG
EXISTS: Early Stop Optimization
Efficient: EXISTS with Early Stopping
-- Check if ANY high-value listeners exist.
SELECT 'High-value listeners found!' AS result
-- EXISTS returns TRUE as soon as ANY row matches.
WHERE EXISTS (
-- SELECT 1 is convention; the value is ignored.
SELECT 1
FROM Listens
-- Stops at the FIRST row where rating > 4.5 (efficient).
WHERE rating > 4.5
)
Execution Trace:
| Step | rating | Check | Action |
|---|---|---|---|
| Row 1 | 4.5 | 4.5 > 4.5? ✗ | Continue |
| Row 2 | 4.2 | 4.2 > 4.5? ✗ | Continue |
| Row 3 | 3.9 | 3.9 > 4.5? ✗ | Continue |
| Row 4 | 4.7 | 4.7 > 4.5? ✓ | STOP! |
| Row 5 | 4.6 | - | Skipped |
| Row 6 | 3.9 | - | Skipped |
| Row 7 | 2.9 | - | Skipped |
| Row 8 | 4.9 | - | Skipped |
| Row 9 | NULL | - | Skipped |
Rows checked: 4 of 9
✓ Stops at first match
✓ Returns TRUE immediately
✓ Saves 5 row checks
Inefficient: COUNT(*) Comparison
-- INEFFICIENT: must scan ALL rows before answering.
SELECT 'High-value listeners found!' AS result
WHERE (
-- COUNT(*) tallies ALL matching rows (no early stop).
SELECT COUNT(*)
FROM Listens
-- Scans the entire table.
WHERE rating > 4.5
) > 0 -- Compare the final count to 0.
Execution Trace:
| Step | rating | Check | Action |
|---|---|---|---|
| Row 1 | 4.5 | 4.5 > 4.5? ✗ | Count: 0 |
| Row 2 | 4.2 | 4.2 > 4.5? ✗ | Count: 0 |
| Row 3 | 3.9 | 3.9 > 4.5? ✗ | Count: 0 |
| Row 4 | 4.7 | 4.7 > 4.5? ✓ | Count: 1 |
| Row 5 | 4.6 | 4.6 > 4.5? ✓ | Count: 2 |
| Row 6 | 3.9 | 3.9 > 4.5? ✗ | Count: 2 |
| Row 7 | 2.9 | 2.9 > 4.5? ✗ | Count: 2 |
| Row 8 | 4.9 | 4.9 > 4.5? ✓ | Count: 3 |
| Row 9 | NULL | NULL > 4.5? ✗ | Count: 3 |
Rows checked: 9 of 9
✗ Must check every row
✗ Builds complete count
✗ Wastes resources
Performance Comparison:
-
EXISTS: 4 rows checked, stops early
-
COUNT(*): 9 rows checked, no early stop
-
Savings: 56% fewer rows to examine
-
Best for: Presence checks, not quantity
Subquery Types Comparison
| Type | Returns | Example | Use When |
|---|---|---|---|
| IN | List of values | WHERE id IN (SELECT...) | Checking membership |
| NOT IN | List to exclude | WHERE id NOT IN (...) | Finding non-members |
| EXISTS | TRUE/FALSE | WHERE EXISTS (...) | Any match suffices |
| NOT EXISTS | TRUE/FALSE | WHERE NOT EXISTS (...) | Ensuring absence |
| Scalar | Single value | WHERE x > (SELECT AVG...) | Comparing to aggregate |
| Table | Full result set | FROM (SELECT...) AS t | Derived tables |
Key Rules
-
Inside-Out Execution: Inner queries complete before outer queries start
-
Return Types: Single value (scalar), multiple values (IN), or existence check (EXISTS)
-
Independence: Basic subqueries run once, independently of the outer query
-
NULL Handling: NULL in NOT IN can eliminate all results
-
Performance: Often rewritten as JOINs by the optimizer
Common Patterns
-
IN/NOT IN: Check if a value exists in a list
-
EXISTS/NOT EXISTS: Check if any rows match a condition
-
Scalar Comparison: Compare against a single calculated value
-
Nested Levels: Subqueries within subqueries
Common Mistakes
⚠️ The NULL in NOT IN Death Trap
🚨 CRITICAL: NULL in NOT IN
If a NOT IN subquery returns ANY NULL value, the ENTIRE result becomes empty! This is the #1 most dangerous NULL trap in SQL.
-- WRONG: if the subquery returns ANY NULL, the entire result becomes empty.
-- NULL inside a NOT IN list silently returns 0 rows.
WHERE user_id NOT IN (
-- May contain NULL user_ids.
SELECT user_id FROM Listens
)
-- CORRECT: filter NULLs out of the subquery first.
WHERE user_id NOT IN (
SELECT user_id FROM Listens
WHERE user_id IS NOT NULL
)
Why This Happens (Step by Step)
What You Think Happens:
-- Check: Is Pluto's ID (4) not in list?
4 NOT IN (1, 2, 3, NULL)
-- You expect: TRUE (Pluto not listed)
What Actually Happens:
-- SQL expands NOT IN to:
4 != 1 AND 4 != 2 AND 4 != 3 AND 4 != NULL
-- Evaluates to:
TRUE AND TRUE AND TRUE AND UNKNOWN
-- Final result:
UNKNOWN -- Not TRUE, so filtered out!
Real Example with Our Data
-- Find songs never listened to.
-- WRONG: returns empty if any song_id in Listens could be NULL.
SELECT title FROM Songs
WHERE song_id NOT IN (
-- Has songs 1, 2, 6, 7, 8 plus potential NULLs.
SELECT song_id FROM Listens
)
-- CORRECT: filter NULLs out of the subquery first.
SELECT title FROM Songs
WHERE song_id NOT IN (
-- Critical filter to avoid the NOT IN NULL trap.
SELECT song_id FROM Listens
WHERE song_id IS NOT NULL
)
Result Difference:
-
Wrong query: 0 rows (silent failure!)
-
Correct query: 5 rows (songs 3,4,5,9,10)
Safe Patterns to Use Instead
-- Option 1: Filter NULLs (shown above)
WHERE id NOT IN (SELECT id FROM TABLE WHERE id IS NOT NULL)
-- Option 2: Use NOT EXISTS (NULL-safe)
WHERE NOT EXISTS (
SELECT 1 FROM Listens l
WHERE l.song_id = s.song_id
)
-- Option 3: Use LEFT JOIN with NULL check
FROM Songs s
LEFT JOIN Listens l ON s.song_id = l.song_id
WHERE l.song_id IS NULL
Remember: Always check if your NOT IN subquery can return NULL!
The Multiple Values Error
-- WRONG: subquery returns multiple rows, but `=` expects one.
WHERE rating = (SELECT rating FROM Listens)
-- CORRECT: use IN for multiple values.
WHERE rating IN (SELECT rating FROM Listens)
-- OR: use scalar aggregation to get a single value.
WHERE rating = (SELECT MAX(rating) FROM Listens)
Problem: = expects a single value, not a list