SQL Problem Solving: Writing Queries

Master These 5 Patterns

These 5 SQL structures solve 80% of real-world data problems. We will use the Spotify Schema (songs, artists, listens) to demonstrate how these patterns work in production before you apply them to your datasets.

  1. The Funnel: Find drop-offs (e.g., listened but didn't share).
  2. The Ladder: Rank items within groups (e.g., top songs per genre).
  3. The Timeline: Track cumulative behavior (e.g., running total of listens).
  4. The Comparison: Compare individual vs. group metrics (e.g., artist vs. genre average).
  5. The Exception: Spot outliers (e.g., viral surges in daily listening).

Pattern 1: The Funnel

Identifying user progression through stages—or the lack thereof.

Played Created Shared 1K 450 87

When to use:

  • Users who did A but never B
  • Conversion funnel analysis
  • Multi-step user journeys
  • Drop-off analysis

SQL:

-- Users: played but never shared
SELECT user_id, name
FROM users
WHERE user_id IN (
  SELECT user_id FROM listens  -- Step 1: Who listened
)
AND user_id NOT IN (
  SELECT user_id FROM shares   -- Step 2: Exclude sharers
                         -- Experiment with NULL in NOT IN death traps
                         -- For our Spotify DB, user_id is not NULL. 
                         -- But in other schemas, it may be  
)

Pattern 2: The Ladder

Ranking items within groups.

Top N per Group 1 2 3

When to use:

  • Top 3 items per category
  • Rank within departments
  • Find nth best performer
  • Percentile rankings

SQL:

-- Top 3 songs per genre (conceptual: songs table)
SELECT * FROM (
  SELECT song, genre, plays,
    ROW_NUMBER() OVER (
PARTITION BY genre  -- Restart numbering at 1 for each genre
ORDER BY plays DESC -- Highest plays gets #1
    ) AS genre_position   -- Creates: Pop #1, Pop #2..., Rock #1, Rock #2...
  FROM songs
) ranked  -- This subquery creates a table with genre_position as a column
WHERE genre_position <= 3  -- Can use alias here: it's a column from subquery

Pattern 3: The Timeline

Cumulative and moving calculations.

Running Total Time →

When to use:

  • Running total over time
  • Moving averages
  • Month-to-date calculations
  • Cumulative growth metrics

SQL:

-- Running total of listening hours (conceptual: daily_listening table)
SELECT date,
  hours,                -- Today's hours (e.g., 3)
  SUM(hours) OVER (
    ORDER BY date      
    -- Add extra logic here as needed:
-- PARTITION BY user_id  -- Separate running total per user
-- Default frame: UNBOUNDED PRECEDING to CURRENT ROW  
-- For 7-day rolling avg: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS cumulative_hours -- Day 1: 3, Day 2: 3+5=8, Day 3: 8+2=10...
FROM daily_listening

Pattern 4: The Comparison

Individual vs Group Performance.

You vs Average You Avg

When to use:

  • Compare to department average
  • Percentile within group
  • Above/below benchmark
  • Peer comparisons

SQL:

-- Compare an artist's plays vs their genre average
SELECT artist_name, genre, plays,
  plays - AVG(plays) OVER (
    PARTITION BY genre   -- Calculate avg for each genre separately
  ) AS vs_avg          -- Positive = above genre avg, Negative = below
                 -- Same AVG repeats for all artists in the genre
FROM artist_stats

Pattern 5: The Exception Finder

Spotting outliers and anomalies.

Outlier Detection -80% +200%

When to use:

  • Sudden drops in activity
  • Unusual spikes
  • Behavior changes
  • Anomaly detection

SQL:

-- Detect >50% spikes in daily listens
WITH changes AS (
  SELECT song_id, date, listens,
    LAG(listens) OVER (PARTITION BY song_id ORDER BY date) AS prev_listens  -- Step 1: Yesterday's listening
  FROM daily_listens
),
with_pct AS (  -- Step 2: Calculate % using prev_listens (can't use alias in same SELECT)
  SELECT *,
    ((listens - prev_listens) * 100.0 / 
     NULLIF(prev_listens, 0)) AS pct_change  -- (new - old) / old × 100
  FROM changes
)
SELECT * FROM with_pct 
WHERE pct_change > 50  -- Filter for viral surges

Example 1:

Problem: "Identify breakout artists with over 50% month-over-month growth in the last 3 months."

Patterns Used: Timeline, Comparison

Step 1: Count Monthly Plays

Why: Establish a baseline with raw numbers.
What: Count plays for each artist per month.

WITH monthly_plays AS (  -- CTE: named result set
  SELECT 
    artist_id,                                -- Keep artist ID
    DATE_TRUNC('month', play_date) AS month, -- Round to month start
    COUNT(*) AS play_count                   -- Total plays
  FROM listens
  GROUP BY artist_id, month  -- One row per artist-month combo
)
-- Result preview:
-- artist_id | month    | play_count
-- 1234      | 2024-09  | 1000       -- Sept baseline
-- 1234      | 2024-10  | 1600       -- Oct growth  
-- 1234      | 2024-11  | 2500       -- Nov spike

Step 2: Calculate Growth Rate

Why: Determine growth trajectory.
What: Compare each month to the previous using LAG().

, artist_growth AS (      -- Chain another CTE
  SELECT 
    artist_id,            -- Pass through
    month,                -- Pass through  
    play_count,           -- Current month
    LAG(play_count, 1) OVER (         -- Look back 1 row
PARTITION BY artist_id          -- Reset for each artist
ORDER BY month                  -- Time sequence
    ) AS prev_month_plays,            -- Previous month value
    -- Growth % = (new - old) / old × 100
    ROUND(                            -- Clean decimals
(play_count - LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month)) 
* 100.0 / NULLIF(               -- Percent formula
  LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month), 
  0                             -- Avoid divide by 0
),
1                               -- 1 decimal place
    ) AS growth_pct
  FROM monthly_plays      -- Use CTE from Step 1
)
-- Result preview:
-- artist_id | month    | play_count | prev_month | growth_pct
-- 1234      | 2024-09  | 1000       | NULL       | NULL      -- No prior
-- 1234      | 2024-10  | 1600       | 1000       | 60.0      -- +60%
-- 1234      | 2024-11  | 2500       | 1600       | 56.3      -- +56%

Step 3: Find Breakout Artists

Why: Isolate high-growth artists.
What: Apply a >50% growth filter along with a recency filter.

SELECT                    -- Final query uses both CTEs
  a.name AS artist_name,  -- Human-readable name
  g.month,                -- Keep month
  g.play_count,           -- Show absolute numbers
  g.growth_pct            -- Show growth %
FROM artist_growth g      -- CTE from Step 2
JOIN artists a            -- Get artist names
  ON g.artist_id = a.artist_id       -- Match IDs
WHERE g.growth_pct > 50              -- Filter: >50% growth
  AND g.month >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '3 months'  -- Last 3 months
ORDER BY g.growth_pct DESC,          -- Highest growth first
   g.month DESC                 -- Then by recency

-- Final output:
-- artist_name | month    | play_count | growth_pct
-- The Weeknd  | 2024-11  | 2500       | 56.3      -- Breakout!
-- The Weeknd  | 2024-10  | 1600       | 60.0      -- Consistent
-- Dua Lipa    | 2024-11  | 3200       | 78.0      -- Viral