Query Equivalence

SQL is a language of options. Most problems can be solved in multiple ways. Let's put these approaches side-by-side and see what shakes out.

Important: Query Equivalence vs Output Equivalence

Query Equivalence

Same logic, different syntax

These queries are semantically identical—they'll produce the same result for ANY valid input data.

      WHERE NOT EXISTS (...)  

      LEFT JOIN + WHERE col IS NULL

      → Always same result!

Output Equivalence

Same result, different logic

Queries produce the same result for THIS data, but not for OTHER data.

      WHERE rating > 4.5

      WHERE rating >= 4.6

      → Same until we get a new row when someone rates 4.55!

This page shows query equivalence: Different syntactic approaches that are semantically identical.

Example 1: Users Above Their Personal Average

Problem: Find users who have at least one rating above their personal average

Correlated Subquery

Intuitive for "per-user" logic
Can be slow (N+1 queries)

CTE + JOIN

Single pass through data
More verbose

Window Function

Modern SQL with Window Functions
Harder to debug

All produce: Mickey, Minnie, Daffy

Example 2: Users with NO High Ratings (> 4.5)

Problem: Find users who have no ratings greater than 4.5

NOT EXISTS

Clear intent
Handles NULLs correctly

LEFT JOIN + NULL

Often fastest
Less intuitive

NOT IN

NULL handling tricky

All produce: Mickey, Pluto

⚠️ Warning about NOT IN: If the subquery returns ANY NULL values, NOT IN returns no results! Always add "AND column IS NOT NULL" to the subquery. See the NULL in NOT IN Death Trap →

Example 3: Top 3 Songs per User

Problem: Find each user's top 3 highest-rated songs

Window Function (Modern)

Standard approach for top-N per group
Handles ties consistently (ROW_NUMBER assigns unique ranks)

Correlated Subquery (Classical)

Works in older SQL versions
Complex tie handling
May return > 3 rows if ties

Example 4: Users Who Listen to Multiple Genres

Problem: Find users who have listened to songs from at least 2 different genres

GROUP BY + HAVING

Most straightforward
Single pass
Clear intent

CTE for Clarity

Easy to debug (can test CTE separately)
Reusable if genre_count needed elsewhere
More verbose for simple case

All produce: Mickey, Minnie, Daffy

When to Use Each Approach?

Approach	Best For	Avoid When
Correlated Subquery	• Conceptual clarity • Row-specific logic • Small datasets	• Large datasets • Complex aggregations
CTEs	• Multi-step logic • Reusable calculations • Debugging complex queries • Team readability	• Simple one-off queries • Older SQL versions (pre-2005)
Window Functions	• Rankings (top-N per group) • Running totals • Peer comparisons • Modern SQL environments	• Simple filters • When debugging needed • Older SQL versions
LEFT JOIN + NULL	• Finding non-matches • Performance critical • Complex exclusion logic	• When readability important • Team unfamiliar with pattern
GROUP BY + HAVING	• Simple aggregations • Clear logic • Direct approach	• Need intermediate results • Complex multi-step logic

Remember: Readability of code beats cleverness. Query optimizers optimize for performance.