SQL Problem Solving: Reading Complex Queries

Learn to understand and decompose complex SQL before writing your own

How to Read Simple Queries: Execution Order

SQL queries don't execute in the order they're written! Understanding the execution sequence is key to reading queries correctly.

Example: Finding Top Genres Per User

-- Goal: Find genres with 2+ songs per user, show only high-rated genres
-- Read in execution order (not written order): FROM, JOIN → WHERE → GROUP BY → HAVING → SELECT, ORDER BY

SELECT user_id, genre,                         -- Step 5: Choose output columns
       COUNT(*) AS song_count,
       AVG(rating) AS avg_rating
FROM Listens l                                 -- Step 1: Input tables
JOIN Songs s ON l.song_id = s.song_id          -- Step 1: Input tables: Join for genre info
WHERE genre IS NOT NULL                        -- Step 2: Filter rows (before grouping)
GROUP BY user_id, genre                        -- Step 3: Create groups (user-genre combos)
HAVING COUNT(*) >= 2                           -- Step 4: Filter groups (after grouping)
ORDER BY user_id, avg_rating DESC              -- Step 5a: Sort final output

The Mental Model

Step 1
FROM + JOIN → Get all data (9 listens × song info)
Input: All rows from joined tables
Step 2
WHERE → Filter rows (remove NULL genres)
Row-level filtering before any grouping
Step 3
GROUP BY → Create groups (user-genre combinations)
Collapse rows into groups
Step 4
HAVING → Filter groups (keep groups with 2+ songs)
Group-level filtering after aggregation
Step 5
SELECT → Choose columns & calculate
Decide what appears in output
Step 5a
ORDER BY → Sort results
Final presentation order
Remember: Read queries in execution order (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY), not written order!

Breaking Down Complex Problems

Before diving into SQL, learn to decompose problems systematically

The 3-Step Framework

1. Understand What You Have

  • What tables exist?
  • How do they connect?
  • What are the key columns of each table?

2. Define What You Want

  • What columns in output?
  • One row per what?
  • What calculations needed?

3. Connect the Dots

  • Which JOINs needed?
  • What filters (WHERE)?
  • Any grouping (GROUP BY)?

Query Anatomy: How to Read Complex Queries

Master the art of reading queries by starting with the SQL itself

Example: Finding Power Users

📋 The Query

1WITH user_stats AS (
  SELECT 
    user_id,
    COUNT(DISTINCT song_id) as unique_songs,
    COUNT(DISTINCT genre) as unique_genres,
    SUM(play_duration) as total_minutes
  FROM listens l
  JOIN songs s ON l.song_id = s.song_id
  GROUP BY user_id
),
2power_users AS (
  SELECT 
    user_id,
    unique_songs,
    unique_genres,
    total_minutes,
    NTILE(10) OVER (ORDER BY total_minutes DESC) as decile
  FROM user_stats  -- Uses CTE from step 1
  WHERE unique_genres >= 3
)
3SELECT 
  u.name,
  p.unique_songs,
  p.unique_genres,
  ROUND(p.total_minutes / 60.0, 1) as total_hours
FROM power_users p  -- Uses CTE from step 2
JOIN users u ON p.user_id = u.user_id
WHERE p.decile = 1  -- Top 10% only
ORDER BY p.total_minutes DESC;

How to Read This Query

Building Understanding Step by Step

Where to Look
What You See
What It Means
1. Start Here
Final SELECT
• SELECT name, stats
• FROM power_users
• WHERE decile = 1
• JOIN with users table
• Get user names and listening stats
• From something called power_users
• Only the top 10% (decile = 1)
• Join to get actual names
→ So we need to understand what makes a "power user"
2. Then Read
power_users CTE
• NTILE(10) ranking
• WHERE genres >= 3
• FROM user_stats
• Split users into 10 groups by activity
• Keep only diverse listeners (3+ genres)
• Built from user_stats table
→ Power user = diverse taste + high activity
3. Finally
user_stats CTE
• COUNT songs/genres
• SUM play_duration
• GROUP BY user_id
• JOIN listens + songs
• Count unique songs & genres per user
• Total up listening minutes
• One row per user
• Combine listens with song metadata
→ Foundation: raw data becomes user metrics
Putting It Together

By reading backwards, we discovered the query's purpose: Find the top 10% most active listeners who explore diverse music (3+ genres). Each CTE builds on the previous one, transforming raw listening data → user metrics → rankings → final results.

The Same Query with Reading Comments

-- Goal: Find top 10% most active listeners who explore diverse music (3+ genres)
-- CTEs execute in order: user_stats → power_users → final SELECT

WITH user_stats AS (                           -- CTE 1: Calculate metrics per user
  SELECT 
    user_id,
    COUNT(DISTINCT song_id) AS unique_songs,   -- Variety of songs
    COUNT(DISTINCT genre) AS unique_genres,    -- Variety of genres  
    SUM(play_duration) AS total_minutes        -- Total listening time
  FROM listens l                               -- Step 1a: Input table
  JOIN songs s ON l.song_id = s.song_id        -- Step 1b: Join for genre info
  GROUP BY user_id                             -- Step 2: One row per user
),
power_users AS (                               -- CTE 2: Rank and filter users
  SELECT 
    user_id,
    unique_songs,
    unique_genres,
    total_minutes,
    NTILE(10) OVER (                           -- Split into 10 groups
      ORDER BY total_minutes DESC              -- Most active first
    ) AS decile
  FROM user_stats                              -- Use CTE 1 output
  WHERE unique_genres >= 3                     -- Filter: diverse listeners only
)
SELECT                                          -- Final query: Get top users
  u.name,                                       -- Step 5: Choose output columns
  p.unique_songs,
  p.unique_genres,
  ROUND(p.total_minutes / 60.0, 1) AS total_hours
FROM power_users p                             -- Step 3: From CTE 2
JOIN users u ON p.user_id = u.user_id          -- Step 4: Join for names
WHERE p.decile = 1                             -- Step 4b: Top 10% only
ORDER BY p.total_minutes DESC;                 -- Step 6: Sort by activity