SQL Problem Solving: Writing Queries
Master These 5 Patterns
These 5 SQL structures solve 80% of real-world data problems. We will use the Spotify Schema (songs, artists, listens) to demonstrate how these patterns work in production before you apply them to your datasets.
- The Funnel: Find drop-offs (e.g., listened but didn't share).
- The Ladder: Rank items within groups (e.g., top songs per genre).
- The Timeline: Track cumulative behavior (e.g., running total of listens).
- The Comparison: Compare individual vs. group metrics (e.g., artist vs. genre average).
- The Exception: Spot outliers (e.g., viral surges in daily listening).
Pattern 1: The Funnel
Identifying user progression through stages—or the lack thereof.
When to use:
- Users who did A but never B
- Conversion funnel analysis
- Multi-step user journeys
- Drop-off analysis
SQL:
-- Users: played but never shared
SELECT user_id, name
FROM users
WHERE user_id IN (
SELECT user_id FROM listens -- Step 1: Who listened
)
AND user_id NOT IN (
SELECT user_id FROM shares -- Step 2: Exclude sharers
-- Experiment with NULL in NOT IN death traps
-- For our Spotify DB, user_id is not NULL.
-- But in other schemas, it may be
)
Pattern 2: The Ladder
Ranking items within groups.
When to use:
- Top 3 items per category
- Rank within departments
- Find nth best performer
- Percentile rankings
SQL:
-- Top 3 songs per genre (conceptual: songs table)
SELECT * FROM (
SELECT song, genre, plays,
ROW_NUMBER() OVER (
PARTITION BY genre -- Restart numbering at 1 for each genre
ORDER BY plays DESC -- Highest plays gets #1
) AS genre_position -- Creates: Pop #1, Pop #2..., Rock #1, Rock #2...
FROM songs
) ranked -- This subquery creates a table with genre_position as a column
WHERE genre_position <= 3 -- Can use alias here: it's a column from subquery
Pattern 3: The Timeline
Cumulative and moving calculations.
When to use:
- Running total over time
- Moving averages
- Month-to-date calculations
- Cumulative growth metrics
SQL:
-- Running total of listening hours (conceptual: daily_listening table)
SELECT date,
hours, -- Today's hours (e.g., 3)
SUM(hours) OVER (
ORDER BY date
-- Add extra logic here as needed:
-- PARTITION BY user_id -- Separate running total per user
-- Default frame: UNBOUNDED PRECEDING to CURRENT ROW
-- For 7-day rolling avg: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS cumulative_hours -- Day 1: 3, Day 2: 3+5=8, Day 3: 8+2=10...
FROM daily_listening
Pattern 4: The Comparison
Individual vs Group Performance.
When to use:
- Compare to department average
- Percentile within group
- Above/below benchmark
- Peer comparisons
SQL:
-- Compare an artist's plays vs their genre average
SELECT artist_name, genre, plays,
plays - AVG(plays) OVER (
PARTITION BY genre -- Calculate avg for each genre separately
) AS vs_avg -- Positive = above genre avg, Negative = below
-- Same AVG repeats for all artists in the genre
FROM artist_stats
Pattern 5: The Exception Finder
Spotting outliers and anomalies.
When to use:
- Sudden drops in activity
- Unusual spikes
- Behavior changes
- Anomaly detection
SQL:
-- Detect >50% spikes in daily listens
WITH changes AS (
SELECT song_id, date, listens,
LAG(listens) OVER (PARTITION BY song_id ORDER BY date) AS prev_listens -- Step 1: Yesterday's listening
FROM daily_listens
),
with_pct AS ( -- Step 2: Calculate % using prev_listens (can't use alias in same SELECT)
SELECT *,
((listens - prev_listens) * 100.0 /
NULLIF(prev_listens, 0)) AS pct_change -- (new - old) / old × 100
FROM changes
)
SELECT * FROM with_pct
WHERE pct_change > 50 -- Filter for viral surges
Example 1:
Problem: "Identify breakout artists with over 50% month-over-month growth in the last 3 months."
Patterns Used: Timeline, Comparison
Step 1: Count Monthly Plays
Why: Establish a baseline with raw numbers.
What: Count plays for each artist per month.
WITH monthly_plays AS ( -- CTE: named result set
SELECT
artist_id, -- Keep artist ID
DATE_TRUNC('month', play_date) AS month, -- Round to month start
COUNT(*) AS play_count -- Total plays
FROM listens
GROUP BY artist_id, month -- One row per artist-month combo
)
-- Result preview:
-- artist_id | month | play_count
-- 1234 | 2024-09 | 1000 -- Sept baseline
-- 1234 | 2024-10 | 1600 -- Oct growth
-- 1234 | 2024-11 | 2500 -- Nov spike
Step 2: Calculate Growth Rate
Why: Determine growth trajectory.
What: Compare each month to the previous using LAG().
, artist_growth AS ( -- Chain another CTE
SELECT
artist_id, -- Pass through
month, -- Pass through
play_count, -- Current month
LAG(play_count, 1) OVER ( -- Look back 1 row
PARTITION BY artist_id -- Reset for each artist
ORDER BY month -- Time sequence
) AS prev_month_plays, -- Previous month value
-- Growth % = (new - old) / old × 100
ROUND( -- Clean decimals
(play_count - LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month))
* 100.0 / NULLIF( -- Percent formula
LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month),
0 -- Avoid divide by 0
),
1 -- 1 decimal place
) AS growth_pct
FROM monthly_plays -- Use CTE from Step 1
)
-- Result preview:
-- artist_id | month | play_count | prev_month | growth_pct
-- 1234 | 2024-09 | 1000 | NULL | NULL -- No prior
-- 1234 | 2024-10 | 1600 | 1000 | 60.0 -- +60%
-- 1234 | 2024-11 | 2500 | 1600 | 56.3 -- +56%
Step 3: Find Breakout Artists
Why: Isolate high-growth artists.
What: Apply a >50% growth filter along with a recency filter.
SELECT -- Final query uses both CTEs
a.name AS artist_name, -- Human-readable name
g.month, -- Keep month
g.play_count, -- Show absolute numbers
g.growth_pct -- Show growth %
FROM artist_growth g -- CTE from Step 2
JOIN artists a -- Get artist names
ON g.artist_id = a.artist_id -- Match IDs
WHERE g.growth_pct > 50 -- Filter: >50% growth
AND g.month >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '3 months' -- Last 3 months
ORDER BY g.growth_pct DESC, -- Highest growth first
g.month DESC -- Then by recency
-- Final output:
-- artist_name | month | play_count | growth_pct
-- The Weeknd | 2024-11 | 2500 | 56.3 -- Breakout!
-- The Weeknd | 2024-10 | 1600 | 60.0 -- Consistent
-- Dua Lipa | 2024-11 | 3200 | 78.0 -- Viral