SQL Problem Solving: Reading Complex Queries
Understand and break down complex SQL before you start writing your own.
Example 1: Finding Top Genres Per User
The Mental Model
Step 1
FROM + JOIN → Gather all data (9 listens × song info)
Input: All rows from joined tables
Step 2
WHERE → Filter rows (eliminate NULL genres)
Row-level filtering before any grouping
Step 3
GROUP BY → Form groups (user-genre combinations)
Collapse rows into groups
Step 4
HAVING → Filter groups (retain groups with 2+ songs)
Group-level filtering after aggregation
Step 5
SELECT → Choose columns & calculate
Decide what appears in output
Step 5a
ORDER BY → Sort results
Final presentation order
Remember:
Read queries in execution order (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY), not written order!
Example 2: Finding Power Users
📋 The Query
1WITH user_stats AS ( SELECT user_id, COUNT(DISTINCT song_id) as unique_songs, COUNT(DISTINCT genre) as unique_genres, SUM(play_duration) as total_minutes FROM listens l JOIN songs s ON l.song_id = s.song_id GROUP BY user_id ), 2power_users AS ( SELECT user_id, unique_songs, unique_genres, total_minutes, NTILE(10) OVER (ORDER BY total_minutes DESC) as decile FROM user_stats -- Uses CTE from step 1 WHERE unique_genres >= 3 ) 3SELECT u.name, p.unique_songs, p.unique_genres, ROUND(p.total_minutes / 60.0, 1) as total_hours FROM power_users p -- Uses CTE from step 2 JOIN users u ON p.user_id = u.user_id WHERE p.decile = 1 -- Top 10% only ORDER BY p.total_minutes DESC;
How to Read This Query
Building Understanding Step by Step
1. Start Here
Final SELECT
• SELECT name, stats
• FROM power_users
• WHERE decile = 1
• JOIN with users table
• FROM power_users
• WHERE decile = 1
• JOIN with users table
• Get user names and listening stats
• From something called power_users
• Only the top 10% (decile = 1)
• Join to get actual names
• From something called power_users
• Only the top 10% (decile = 1)
• Join to get actual names
→ So we need to understand what makes a "power user"
2. Then Read
power_users CTE
• NTILE(10) ranking
• WHERE genres >= 3
• FROM user_stats
• WHERE genres >= 3
• FROM user_stats
• Split users into 10 groups by activity
• Keep only diverse listeners (3+ genres)
• Built from user_stats table
• Keep only diverse listeners (3+ genres)
• Built from user_stats table
→ Power user = diverse taste + high activity
3. Finally
user_stats CTE
• COUNT songs/genres
• SUM play_duration
• GROUP BY user_id
• JOIN listens + songs
• SUM play_duration
• GROUP BY user_id
• JOIN listens + songs
• Count unique songs & genres per user
• Total up listening minutes
• One row per user
• Combine listens with song metadata
• Total up listening minutes
• One row per user
• Combine listens with song metadata
→ Foundation: raw data becomes user metrics
Putting It Together
By reading backwards, we discovered the query's purpose: Find the top 10% most active listeners who explore diverse music (3+ genres). Each CTE builds on the previous one, transforming raw listening data → user metrics → rankings → final results.