SQL Problem Solving: Reading Complex Queries

Understand and break down complex SQL before you start writing your own.

Example 1: Finding Top Genres Per User

The Mental Model

Step 1

FROM + JOIN → Gather all data (9 listens × song info)

Input: All rows from joined tables

Step 2

WHERE → Filter rows (eliminate NULL genres)

Row-level filtering before any grouping

Step 3

GROUP BY → Form groups (user-genre combinations)

Collapse rows into groups

Step 4

HAVING → Filter groups (retain groups with 2+ songs)

Group-level filtering after aggregation

Step 5

SELECT → Choose columns & calculate

Decide what appears in output

Step 5a

ORDER BY → Sort results

Final presentation order

Remember: Read queries in execution order (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY), not written order!

Example 2: Finding Power Users

📋 The Query

1WITH user_stats AS (
SELECT 
  user_id,
  COUNT(DISTINCT song_id) as unique_songs,
  COUNT(DISTINCT genre) as unique_genres,
  SUM(play_duration) as total_minutes
FROM listens l
JOIN songs s ON l.song_id = s.song_id
GROUP BY user_id
),
2power_users AS (
SELECT 
  user_id,
  unique_songs,
  unique_genres,
  total_minutes,
  NTILE(10) OVER (ORDER BY total_minutes DESC) as decile
FROM user_stats  -- Uses CTE from step 1
WHERE unique_genres >= 3
)
3SELECT 
u.name,
p.unique_songs,
p.unique_genres,
ROUND(p.total_minutes / 60.0, 1) as total_hours
FROM power_users p  -- Uses CTE from step 2
JOIN users u ON p.user_id = u.user_id
WHERE p.decile = 1  -- Top 10% only
ORDER BY p.total_minutes DESC;

How to Read This Query

Building Understanding Step by Step

  1. Start Here

Final SELECT

• SELECT name, stats
• FROM power_users
• WHERE decile = 1
• JOIN with users table

• Get user names and listening stats
• From something called power_users
• Only the top 10% (decile = 1)
• Join to get actual names

→ So we need to understand what makes a "power user"

  1. Then Read

power_users CTE

• NTILE(10) ranking
• WHERE genres >= 3
• FROM user_stats

• Split users into 10 groups by activity
• Keep only diverse listeners (3+ genres)
• Built from user_stats table

→ Power user = diverse taste + high activity

  1. Finally

user_stats CTE

• COUNT songs/genres
• SUM play_duration
• GROUP BY user_id
• JOIN listens + songs

• Count unique songs & genres per user
• Total up listening minutes
• One row per user
• Combine listens with song metadata

→ Foundation: raw data becomes user metrics

Putting It Together

By reading backwards, we discovered the query's purpose: Find the top 10% most active listeners who explore diverse music (3+ genres). Each CTE builds on the previous one, transforming raw listening data → user metrics → rankings → final results.

The Same Query with Reading Comments