Subqueries: Basic Patterns
Core Concept
- Example: Find users who listen to Taylor Swift (nest queries layer by layer - songs → listens → users)
Subqueries are SQL's nesting dolls - queries inside queries. They solve problems in layers. It's how SQL breaks complex questions into simple steps.
Key Rules
-
Inside-Out Execution: Inner queries complete before outer queries start
-
Return Types: Single value (scalar), multiple values (IN), or existence check (EXISTS)
-
Independence: Basic subqueries run once, independently of outer query
-
NULL Handling: NULL in NOT IN can eliminate all results
-
Performance: Often rewritten as JOINs by the optimizer
IN Subquery: Find Taylor Swift Listeners
-- Find users who listen to Taylor Swift songs
SELECT name -- Last: Select columns to output
FROM Users
WHERE user_id IN ( -- IN: Check if user_id in the list
-- Step 2: Get distinct users who listened to those songs
SELECT DISTINCT user_id -- DISTINCT: Each user once only
FROM Listens
WHERE song_id IN ( -- IN: Check if song_id in the list
-- Step 1: Find all Taylor Swift song IDs (executes FIRST)
SELECT song_id -- Returns set: {1, 2, 9}
FROM Songs
WHERE artist = 'Taylor Swift'
)
)
Step 1: Find Taylor Swift Songs
song_id | Notes |
---|---|
1 | ✓ Taylor Swift - Evermore |
2 | ✓ Taylor Swift - Willow |
3 | ✗ Ed Sheeran - Shape of You |
4 | ✗ Ed Sheeran - Photograph |
5 | ✗ Ed Sheeran - Shivers |
6 | ✗ Beatles - Yesterday |
7 | ✗ Beatles - Yellow Submarine |
8 | ✗ Beatles - Hey Jude |
9 | ✓ Taylor Swift - Bad Blood |
10 | ✗ DJ - DJ Mix |
Result: {1, 2, 9}
Step 2: Find Users with song_ids 1, 2, 9
user_id | song_id | Notes | |
---|---|---|---|
1 | 1 | User 1 listened to Taylor Swift | |
1 | 2 | User 1 listened to Taylor Swift | |
2 | 2 | User 2 listened to Taylor Swift | |
3 | 1 | User 3 listened to Taylor Swift | |
3 | 2 | User 3 listened to Taylor Swift |
Distinct users: {1, 2, 3}
Step 3: Get Names for {1, 2, 3}
name | Notes: user_id | |
---|---|---|
Mickey | 1 | |
Minnie | 2 | |
Daffy | 3 |
-
Three layers of queries
-
Each feeds results outward
-
Set operations at each level
NOT IN Subquery: Non-Taylor Swift Fans
-- Find users who never listened to any Taylor Swift song
SELECT name
FROM Users
WHERE user_id NOT IN ( -- NOT IN: Exclude these users
-- Step 2: Get distinct users who listened to those songs
SELECT DISTINCT user_id
FROM Listens
WHERE song_id IN (
-- Step 1: Find all Taylor Swift song IDs
SELECT song_id
FROM Songs
WHERE artist = 'Taylor Swift'
)
AND user_id IS NOT NULL -- CRITICAL: Filter NULLs to avoid empty result!
)
Using same inner queries, but inverting at the end:
Step 1 & 2: Same as above → Users {1, 2, 3} listened to Taylor Swift
Step 3: Get Names for Users NOT IN {1, 2, 3}
user_id | name | Notes: Check | |
---|---|---|---|
4 | Pluto | ✓ NOT IN {1,2,3} - never listened to Taylor Swift |
Scalar Subquery: Above Average Ratings
-- Find listens with above-average ratings
SELECT u.name, l.rating -- Last: Select columns to output
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
WHERE l.rating > ( -- Compare to single value (scalar subquery)
-- Step 1: Calculate average rating (returns ONE value: 4.2)
SELECT AVG(rating) -- AVG automatically ignores NULL values
FROM Listens -- Executes ONCE, cached, reused for all rows
)
Step 1: Calculate Average
rating | Notes |
---|---|
4.5 | ✓ |
4.2 | ✓ |
3.9 | ✓ |
4.7 | ✓ |
4.6 | ✓ |
3.9 | ✓ |
2.9 | ✓ |
4.9 | ✓ |
NULL | ✗ Skip NULL |
AVG = (4.5+4.2+3.9+4.7+4.6+3.9+2.9+4.9) / 8 = 4.2
Step 2: Filter Ratings > 4.2
name | rating | Notes: Check | Notes: Result | |
---|---|---|---|---|
Mickey | 4.5 | ✓ TRUE | Include | |
Mickey | 4.2 | ✗ FALSE | Exclude | |
Mickey | 3.9 | ✗ FALSE | Exclude | |
Minnie | 4.7 | ✓ TRUE | Include | |
Minnie | 4.6 | ✓ TRUE | Include | |
Minnie | 3.9 | ✗ FALSE | Exclude | |
Daffy | 2.9 | ✗ FALSE | Exclude | |
Daffy | 4.9 | ✓ TRUE | Include | |
Daffy | NULL | ? UNKNOWN | Exclude |
Output
name | rating |
---|---|
Mickey | 4.5 |
Minnie | 4.7 |
Minnie | 4.6 |
Daffy | 4.9 |
-
Scalar subquery returns single value
-
Used in comparisons like any constant
-
NULL handling critical for AVG
EXISTS: Early Stop Optimization
Efficient: EXISTS with Early Stopping
-- Check if ANY high-value listeners exist
SELECT 'High-value listeners found!' AS result
WHERE EXISTS ( -- EXISTS: TRUE if ANY row found
SELECT 1 -- SELECT 1 is convention (value ignored)
FROM Listens
WHERE rating > 4.5 -- Stops at FIRST match (efficient!)
)
Execution Trace:
Step | rating | Check | Action |
---|---|---|---|
Row 1 | 4.5 | 4.5 > 4.5? ✗ | Continue |
Row 2 | 4.2 | 4.2 > 4.5? ✗ | Continue |
Row 3 | 3.9 | 3.9 > 4.5? ✗ | Continue |
Row 4 | 4.7 | 4.7 > 4.5? ✓ | STOP! |
Row 5 | 4.6 | - | Skipped |
Row 6 | 3.9 | - | Skipped |
Row 7 | 2.9 | - | Skipped |
Row 8 | 4.9 | - | Skipped |
Row 9 | NULL | - | Skipped |
Rows checked: 4 of 9
✓ Stops at first match
✓ Returns TRUE immediately
✓ Saves 5 row checks
Inefficient: COUNT(*) Comparison
-- INEFFICIENT - must check ALL rows
SELECT 'High-value listeners found!' AS result
WHERE (
SELECT COUNT(*) -- Counts ALL matching rows (slow!)
FROM Listens
WHERE rating > 4.5 -- Must scan entire table
) > 0 -- Then compares count to 0
Execution Trace:
Step | rating | Check | Action |
---|---|---|---|
Row 1 | 4.5 | 4.5 > 4.5? ✗ | Count: 0 |
Row 2 | 4.2 | 4.2 > 4.5? ✗ | Count: 0 |
Row 3 | 3.9 | 3.9 > 4.5? ✗ | Count: 0 |
Row 4 | 4.7 | 4.7 > 4.5? ✓ | Count: 1 |
Row 5 | 4.6 | 4.6 > 4.5? ✓ | Count: 2 |
Row 6 | 3.9 | 3.9 > 4.5? ✗ | Count: 2 |
Row 7 | 2.9 | 2.9 > 4.5? ✗ | Count: 2 |
Row 8 | 4.9 | 4.9 > 4.5? ✓ | Count: 3 |
Row 9 | NULL | NULL > 4.5? ✗ | Count: 3 |
Rows checked: 9 of 9
✗ Must check every row
✗ Builds complete count
✗ Wastes resources
Performance Comparison:
-
EXISTS: 4 rows checked, stops early
-
COUNT(*): 9 rows checked, no early stop
-
Savings: 56% fewer rows to examine
-
Best for: Presence checks, not quantity
Subquery Types Comparison
Type | Returns | Example | Use When |
---|---|---|---|
IN | List of values | WHERE id IN (SELECT...) | Checking membership |
NOT IN | List to exclude | WHERE id NOT IN (...) | Finding non-members |
EXISTS | TRUE/FALSE | WHERE EXISTS (...) | Any match suffices |
NOT EXISTS | TRUE/FALSE | WHERE NOT EXISTS (...) | Ensuring absence |
Scalar | Single value | WHERE x > (SELECT AVG...) | Comparing to aggregate |
Table | Full result set | FROM (SELECT...) AS t | Derived tables |
Common Patterns
-
IN/NOT IN: Check if value exists in a list
-
EXISTS/NOT EXISTS: Check if any rows match condition
-
Scalar Comparison: Compare against single calculated value
-
Nested Levels: Subqueries within subqueries
Common Mistakes
⚠️ The NULL in NOT IN Death Trap
🚨 CRITICAL: NULL in NOT IN
If a NOT IN subquery returns ANY NULL value, the ENTIRE result becomes empty! This is the #1 most dangerous NULL trap in SQL.
-- WRONG: If subquery returns ANY NULL, entire result becomes empty!
WHERE user_id NOT IN (
SELECT user_id FROM Listens -- May contain NULL user_ids
) -- NULL in NOT IN = empty result!
-- CORRECT: Filter out NULLs explicitly
WHERE user_id NOT IN (
SELECT user_id FROM Listens
WHERE user_id IS NOT NULL -- Prevent NULL trap
)
Why This Happens (Step by Step)
What You Think Happens:
-- Check: Is Pluto's ID (4) not in list?
4 NOT IN (1, 2, 3, NULL)
-- You expect: TRUE (Pluto not listed)
What Actually Happens:
-- SQL expands NOT IN to:
4 != 1 AND 4 != 2 AND 4 != 3 AND 4 != NULL
-- Evaluates to:
TRUE AND TRUE AND TRUE AND UNKNOWN
-- Final result:
UNKNOWN -- Not TRUE, so filtered out!
Real Example with Our Data
-- Find songs never listened to
-- WRONG: Returns empty if any song_id could be NULL
SELECT title FROM Songs
WHERE song_id NOT IN (
SELECT song_id FROM Listens -- Has songs 1,2,6,7,8 + potential NULLs
)
-- CORRECT: Filter out NULLs explicitly
SELECT title FROM Songs
WHERE song_id NOT IN (
SELECT song_id FROM Listens
WHERE song_id IS NOT NULL -- Critical filter!
)
Result Difference:
-
Wrong query: 0 rows (silent failure!)
-
Correct query: 5 rows (songs 3,4,5,9,10)
Safe Patterns to Use Instead
-- Option 1: Filter NULLs (shown above)
WHERE id NOT IN (SELECT id FROM TABLE WHERE id IS NOT NULL)
-- Option 2: Use NOT EXISTS (NULL-safe)
WHERE NOT EXISTS (
SELECT 1 FROM Listens l
WHERE l.song_id = s.song_id
)
-- Option 3: Use LEFT JOIN with NULL check
FROM Songs s
LEFT JOIN Listens l ON s.song_id = l.song_id
WHERE l.song_id IS NULL
Remember: Always check if your NOT IN subquery can return NULL!
The Multiple Values Error
-- WRONG: Subquery returns multiple rows but = expects one
WHERE rating = (SELECT rating FROM Listens) -- Error: Multiple rows!
-- CORRECT: Use IN for multiple values
WHERE rating IN (SELECT rating FROM Listens) -- IN handles sets
-- OR: Use scalar aggregation to get single value
WHERE rating = (SELECT MAX(rating) FROM Listens) -- MAX returns ONE value
Problem: = expects single value, not a list
Next: Correlated Subqueries
What we've learned: Basic subqueries that run ONCE and return the same result for all rows.
The limitation: What if each row needs a DIFFERENT comparison?
-
"Find users above THE average" ✓ (basic subquery)
-
"Find users above THEIR average" ✗ (needs correlation!)
→ Continue to Correlated Subqueries for personalized row-by-row comparisons