Query Equivalence
SQL is a language of options. Most problems can be solved in multiple ways. Let's put these approaches side-by-side and see what shakes out.
Important: Query Equivalence vs Output Equivalence
Query Equivalence
Same logic, different syntax
These queries are semantically identical—they'll produce the same result for ANY valid input data.
LEFT JOIN + WHERE col IS NULL
→ Always same result!
Output Equivalence
Same result, different logic
Queries produce the same result for THIS data, but not for OTHER data.
WHERE rating >= 4.6
→ Same until we get a new row when someone rates 4.55!
Example 1: Users Above Their Personal Average
Problem: Find users who have at least one rating above their personal average
Correlated Subquery
-
Intuitive for "per-user" logic
-
Can be slow (N+1 queries)
CTE + JOIN
-
Single pass through data
-
More verbose
Window Function
-
Modern SQL with Window Functions
-
Harder to debug
All produce: Mickey, Minnie, Daffy
Example 2: Users with NO High Ratings (> 4.5)
Problem: Find users who have no ratings greater than 4.5
NOT EXISTS
-
Clear intent
-
Handles NULLs correctly
LEFT JOIN + NULL
-
Often fastest
-
Less intuitive
NOT IN
- NULL handling tricky
All produce: Mickey, Pluto
Example 3: Top 3 Songs per User
Problem: Find each user's top 3 highest-rated songs
Window Function (Modern)
-
Standard approach for top-N per group
-
Handles ties consistently (ROW_NUMBER assigns unique ranks)
Correlated Subquery (Classical)
-
Works in older SQL versions
-
Complex tie handling
-
May return > 3 rows if ties
Example 4: Users Who Listen to Multiple Genres
Problem: Find users who have listened to songs from at least 2 different genres
GROUP BY + HAVING
-
Most straightforward
-
Single pass
-
Clear intent
CTE for Clarity
-
Easy to debug (can test CTE separately)
-
Reusable if genre_count needed elsewhere
-
More verbose for simple case
All produce: Mickey, Minnie, Daffy
When to Use Each Approach?
| Approach | Best For | Avoid When |
|---|---|---|
| Correlated Subquery | • Conceptual clarity • Row-specific logic • Small datasets |
• Large datasets • Complex aggregations |
| CTEs | • Multi-step logic • Reusable calculations • Debugging complex queries • Team readability |
• Simple one-off queries • Older SQL versions (pre-2005) |
| Window Functions | • Rankings (top-N per group) • Running totals • Peer comparisons • Modern SQL environments |
• Simple filters • When debugging needed • Older SQL versions |
| LEFT JOIN + NULL | • Finding non-matches • Performance critical • Complex exclusion logic |
• When readability important • Team unfamiliar with pattern |
| GROUP BY + HAVING | • Simple aggregations • Clear logic • Direct approach |
• Need intermediate results • Complex multi-step logic |
Remember: Readability of code beats cleverness. Query optimizers optimize for performance.