Query Equivalence
SQL is flexible - most problems have multiple solutions. Let's compare approaches side-by-side.
Important: Query Equivalence vs Output Equivalence
Query Equivalence
Same logic, different syntax
These queries are semantically identical - they'll produce the same result for ANY valid input data.
LEFT JOIN + WHERE col IS NULL
→ Always same result!
Output Equivalence
Same result, different logic
Queries produce the same result for THIS data, but not for OTHER data.
WHERE rating >= 4.6
→ Same until we get a new row when someone rates 4.55!
Example 1: Users Above Their Personal Average
Problem: Find users who have at least one rating above their personal average
Correlated Subquery
-- Each user's avg calculated per row
SELECT DISTINCT u.name
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
WHERE l.rating > (
SELECT AVG(l2.rating)
FROM Listens l2
WHERE l2.user_id = u.user_id -- Correlate
)
- Intuitive for "per-user" logic
- Can be slow (N+1 queries)
CTE + JOIN
-- Pre-calculate all averages once
WITH UserAvgs AS (
SELECT user_id,
AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id
)
SELECT DISTINCT u.name
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
JOIN UserAvgs ua ON u.user_id = ua.user_id
WHERE l.rating > ua.avg_rating
- Single pass through data
- More verbose
Window Function
-- Calculate avg alongside each row
SELECT DISTINCT name
FROM (
SELECT u.name, l.rating,
AVG(l.rating) OVER (
PARTITION BY u.user_id
) AS user_avg
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
) t
WHERE rating > user_avg
- Modern SQL with Window Functions
- Harder to debug
All produce: Mickey, Minnie, Daffy
Example 2: Users with NO High Ratings (> 4.5)
Problem: Find users who have no ratings greater than 4.5
NOT EXISTS
-- Check non-existence directly
SELECT u.name
FROM Users u
WHERE NOT EXISTS (
SELECT 1
FROM Listens l
WHERE l.user_id = u.user_id
AND l.rating > 4.5
)
- Clear intent
- Handles NULLs correctly
LEFT JOIN + NULL
-- Join and check for no matches
SELECT u.name
FROM Users u
LEFT JOIN Listens l
ON u.user_id = l.user_id
AND l.rating > 4.5 -- Condition in ON
WHERE l.listen_id IS NULL -- No match found
- Often fastest
- Less intuitive
NOT IN
-- List users not in high-raters
SELECT name
FROM Users
WHERE user_id NOT IN (
SELECT user_id
FROM Listens
WHERE rating > 4.5
AND user_id IS NOT NULL -- Critical!
)
- NULL handling tricky
All produce: Mickey, Pluto
Example 3: Top 3 Songs per User
Problem: Find each user's top 3 highest-rated songs
Window Function (Modern)
-- Rank within each user's songs
WITH RankedListens AS (
SELECT user_id, song_id, rating,
ROW_NUMBER() OVER (
PARTITION BY user_id -- Per user
ORDER BY rating DESC -- Highest first
) AS rn
FROM Listens
WHERE rating IS NOT NULL
)
SELECT u.name, rl.song_id, rl.rating
FROM Users u
JOIN RankedListens rl ON u.user_id = rl.user_id
WHERE rn <= 3 -- Top 3 only
ORDER BY u.name, rl.rating DESC
- Standard approach for top-N per group
- Handles ties consistently (ROW_NUMBER assigns unique ranks)
Correlated Subquery (Classical)
-- Count better songs for same user
SELECT u.name, l.song_id, l.rating
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
WHERE l.rating IS NOT NULL
AND (
SELECT COUNT(*)
FROM Listens l2
WHERE l2.user_id = l.user_id
AND l2.rating > l.rating -- Count higher ratings
) < 3 -- Less than 3 songs rated higher
ORDER BY u.name, l.rating DESC
- Works in older SQL versions
- Complex tie handling
- May return > 3 rows if ties
Example 4: Users Who Listen to Multiple Genres
Problem: Find users who have listened to songs from at least 2 different genres
GROUP BY + HAVING
-- Direct aggregation approach
SELECT u.name
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
JOIN Songs s ON l.song_id = s.song_id
GROUP BY u.user_id, u.name
HAVING COUNT(DISTINCT s.genre) >= 2
- Most straightforward
- Single pass
- Clear intent
CTE for Clarity
-- Break into logical steps
WITH UserGenreCounts AS (
-- Step 1: Count genres per user
SELECT l.user_id,
COUNT(DISTINCT s.genre) AS genre_count
FROM Listens l
JOIN Songs s ON l.song_id = s.song_id
GROUP BY l.user_id
)
-- Step 2: Filter and get names
SELECT u.name
FROM Users u
JOIN UserGenreCounts ugc ON u.user_id = ugc.user_id
WHERE ugc.genre_count >= 2
- Easy to debug (can test CTE separately)
- Reusable if genre_count needed elsewhere
- More verbose for simple case
All produce: Mickey, Minnie, Daffy
When to Use Each Approach?
| Approach | Best For | Avoid When |
|---|---|---|
| Correlated Subquery | • Conceptual clarity • Row-specific logic • Small datasets |
• Large datasets • Complex aggregations |
| CTEs | • Multi-step logic • Reusable calculations • Debugging complex queries • Team readability |
• Simple one-off queries • Older SQL versions (pre-2005) |
| Window Functions | • Rankings (top-N per group) • Running totals • Peer comparisons • Modern SQL environments |
• Simple filters • When debugging needed • Older SQL versions |
| LEFT JOIN + NULL | • Finding non-matches • Performance critical • Complex exclusion logic |
• When readability important • Team unfamiliar with pattern |
| GROUP BY + HAVING | • Simple aggregations • Clear logic • Direct approach |
• Need intermediate results • Complex multi-step logic |
Remember: Readability of code beats cleverness. Query optimizers optimize for performance.