SQL Problem Solving: Reading Complex Queries
Learn to understand and decompose complex SQL before writing your own
How to Read Simple Queries: Execution Order
SQL queries don't execute in the order they're written! Understanding the execution sequence is key to reading queries correctly.
Example: Finding Top Genres Per User
-- Goal: Find genres with 2+ songs per user, show only high-rated genres
-- Read in execution order (not written order): FROM, JOIN → WHERE → GROUP BY → HAVING → SELECT, ORDER BY
SELECT user_id, genre, -- Step 5: Choose output columns
COUNT(*) AS song_count,
AVG(rating) AS avg_rating
FROM Listens l -- Step 1: Input tables
JOIN Songs s ON l.song_id = s.song_id -- Step 1: Input tables: Join for genre info
WHERE genre IS NOT NULL -- Step 2: Filter rows (before grouping)
GROUP BY user_id, genre -- Step 3: Create groups (user-genre combos)
HAVING COUNT(*) >= 2 -- Step 4: Filter groups (after grouping)
ORDER BY user_id, avg_rating DESC -- Step 5a: Sort final output
The Mental Model
Step 1
FROM + JOIN → Get all data (9 listens × song info)
Input: All rows from joined tables
Step 2
WHERE → Filter rows (remove NULL genres)
Row-level filtering before any grouping
Step 3
GROUP BY → Create groups (user-genre combinations)
Collapse rows into groups
Step 4
HAVING → Filter groups (keep groups with 2+ songs)
Group-level filtering after aggregation
Step 5
SELECT → Choose columns & calculate
Decide what appears in output
Step 5a
ORDER BY → Sort results
Final presentation order
Remember:
Read queries in execution order (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY), not written order!
Breaking Down Complex Problems
Before diving into SQL, learn to decompose problems systematically
The 3-Step Framework
1. Understand What You Have
- What tables exist?
- How do they connect?
- What are the key columns of each table?
2. Define What You Want
- What columns in output?
- One row per what?
- What calculations needed?
3. Connect the Dots
- Which JOINs needed?
- What filters (WHERE)?
- Any grouping (GROUP BY)?
Query Anatomy: How to Read Complex Queries
Master the art of reading queries by starting with the SQL itself
Example: Finding Power Users
📋 The Query
1WITH user_stats AS ( SELECT user_id, COUNT(DISTINCT song_id) as unique_songs, COUNT(DISTINCT genre) as unique_genres, SUM(play_duration) as total_minutes FROM listens l JOIN songs s ON l.song_id = s.song_id GROUP BY user_id ), 2power_users AS ( SELECT user_id, unique_songs, unique_genres, total_minutes, NTILE(10) OVER (ORDER BY total_minutes DESC) as decile FROM user_stats -- Uses CTE from step 1 WHERE unique_genres >= 3 ) 3SELECT u.name, p.unique_songs, p.unique_genres, ROUND(p.total_minutes / 60.0, 1) as total_hours FROM power_users p -- Uses CTE from step 2 JOIN users u ON p.user_id = u.user_id WHERE p.decile = 1 -- Top 10% only ORDER BY p.total_minutes DESC;
How to Read This Query
Building Understanding Step by Step
Where to Look
What You See
What It Means
1. Start Here
Final SELECT
• SELECT name, stats
• FROM power_users
• WHERE decile = 1
• JOIN with users table
• FROM power_users
• WHERE decile = 1
• JOIN with users table
• Get user names and listening stats
• From something called power_users
• Only the top 10% (decile = 1)
• Join to get actual names
• From something called power_users
• Only the top 10% (decile = 1)
• Join to get actual names
→ So we need to understand what makes a "power user"
2. Then Read
power_users CTE
• NTILE(10) ranking
• WHERE genres >= 3
• FROM user_stats
• WHERE genres >= 3
• FROM user_stats
• Split users into 10 groups by activity
• Keep only diverse listeners (3+ genres)
• Built from user_stats table
• Keep only diverse listeners (3+ genres)
• Built from user_stats table
→ Power user = diverse taste + high activity
3. Finally
user_stats CTE
• COUNT songs/genres
• SUM play_duration
• GROUP BY user_id
• JOIN listens + songs
• SUM play_duration
• GROUP BY user_id
• JOIN listens + songs
• Count unique songs & genres per user
• Total up listening minutes
• One row per user
• Combine listens with song metadata
• Total up listening minutes
• One row per user
• Combine listens with song metadata
→ Foundation: raw data becomes user metrics
Putting It Together
By reading backwards, we discovered the query's purpose: Find the top 10% most active listeners who explore diverse music (3+ genres). Each CTE builds on the previous one, transforming raw listening data → user metrics → rankings → final results.
The Same Query with Reading Comments
-- Goal: Find top 10% most active listeners who explore diverse music (3+ genres)
-- CTEs execute in order: user_stats → power_users → final SELECT
WITH user_stats AS ( -- CTE 1: Calculate metrics per user
SELECT
user_id,
COUNT(DISTINCT song_id) AS unique_songs, -- Variety of songs
COUNT(DISTINCT genre) AS unique_genres, -- Variety of genres
SUM(play_duration) AS total_minutes -- Total listening time
FROM listens l -- Step 1a: Input table
JOIN songs s ON l.song_id = s.song_id -- Step 1b: Join for genre info
GROUP BY user_id -- Step 2: One row per user
),
power_users AS ( -- CTE 2: Rank and filter users
SELECT
user_id,
unique_songs,
unique_genres,
total_minutes,
NTILE(10) OVER ( -- Split into 10 groups
ORDER BY total_minutes DESC -- Most active first
) AS decile
FROM user_stats -- Use CTE 1 output
WHERE unique_genres >= 3 -- Filter: diverse listeners only
)
SELECT -- Final query: Get top users
u.name, -- Step 5: Choose output columns
p.unique_songs,
p.unique_genres,
ROUND(p.total_minutes / 60.0, 1) AS total_hours
FROM power_users p -- Step 3: From CTE 2
JOIN users u ON p.user_id = u.user_id -- Step 4: Join for names
WHERE p.decile = 1 -- Step 4b: Top 10% only
ORDER BY p.total_minutes DESC; -- Step 6: Sort by activity