SQL Problem Solving: Writing Queries
Transform problems into SQL solutions using patterns and references
Master Reference: Problem → Pattern → SQL Tool
Problem Words/Phrases | What You Need | SQL Tool | Example | Pattern |
---|---|---|---|---|
"total", "average", "count" | Calculate aggregate values | SUM(), AVG(), COUNT() | SELECT SUM(sales), AVG(price) | — |
"for each", "per category", "by department" | Group results | GROUP BY | GROUP BY department | — |
"having more than", "with at least" | Filter after aggregation | HAVING | HAVING COUNT(*) > 5 | — |
"above/below average", "more than max" | Single calculated value | Subquery | WHERE salary > (SELECT AVG...) | — |
"who never", "doesn't exist", "except" | Find exclusions | NOT IN / NOT EXISTS | WHERE id NOT IN (SELECT...) | 🔻 #1 |
"step-by-step", "first...then", "funnel" | Build in steps | CTEs (WITH) | WITH step1 AS (...) | 🔻 #1 |
"top N", "rank", "nth best", "percentile" | Rank within groups | ROW_NUMBER() OVER | ROW_NUMBER() OVER (PARTITION BY...) | 🪜 #2 |
"running total", "cumulative", "up to" | Calculate running values | SUM() OVER (ORDER BY) | SUM(x) OVER (ORDER BY date) | 📈 #3 |
"compared to average", "vs peers" | Compare to group metrics | AVG() OVER | value - AVG(value) OVER (...) | 📊 #4 |
"change from", "growth", "vs previous" | Detect changes over time | LAG() / LEAD() | LAG(value) OVER (ORDER BY date) | 🎯 #5 |
Pattern Recognition: The 5 Essential Patterns
Master these 5 patterns and you'll solve 80% of SQL problems
- The Funnel finds exclusions, The Ladder ranks items, The Timeline tracks changes, The Comparison measures performance, The Exception finds outliers
- Experiment and get comfortable with these patterns in your BigQuery Project1 and Project2
Pattern 1: The Funnel 🔻
Finding users who progress through stages (or don't)
📍 When to use:
- Users who did A but never B
- Conversion funnel analysis
- Multi-step user journeys
- Drop-off analysis
🔧 SQL:
-- Users: played but never shared
SELECT user_id, name
FROM users
WHERE user_id IN (
SELECT user_id FROM listens -- Step 1: Who listened
)
AND user_id NOT IN (
SELECT user_id FROM shares -- Step 2: Exclude sharers
-- Experiment with NULL in NOT IN death traps
-- For our Spotify DB, user_id is not NULL.
-- But in other schemas, it may be
)
Pattern 2: The Ladder 🪜
Ranking items within groups
📍 When to use:
- Top 3 items per category
- Rank within departments
- Find nth best performer
- Percentile rankings
🔧 SQL:
-- Top 3 per genre
SELECT * FROM (
SELECT song, genre, plays,
ROW_NUMBER() OVER (
PARTITION BY genre -- Reset per genre
ORDER BY plays DESC -- High plays = #1
) AS RANK
FROM songs
) ranked
WHERE RANK <= 3 -- Top 3 only
Pattern 3: The Timeline 📈
Cumulative and moving calculations
📍 When to use:
- Running total over time
- Moving averages
- Month-to-date calculations
- Cumulative growth metrics
🔧 SQL:
-- Running total: hours accumulated
SELECT
date,
hours, -- Daily value
SUM(hours) OVER (
ORDER BY date -- Sum: all up to now
) AS cumulative_hours
FROM daily_listening
Pattern 4: The Comparison 📊
Individual vs Group Performance
📍 When to use:
- Compare to department average
- Percentile within group
- Above/below benchmark
- Peer comparisons
🔧 SQL:
-- User vs tier average
SELECT
user_id,
tier,
hours,
hours - AVG(hours) OVER (
PARTITION BY tier -- Tier avg
) AS vs_avg -- +ve = above avg
FROM user_stats
Pattern 5: The Exception Finder 🎯
Detecting outliers and anomalies
📍 When to use:
- Sudden drops in activity
- Unusual spikes
- Behavior changes
- Anomaly detection
🔧 SQL:
-- Detect >50% changes
WITH changes AS (
SELECT *,
LAG(value) OVER (ORDER BY date) AS prev_value, -- Previous value
-- % change = (new - old) / old × 100
((value - LAG(value) OVER (ORDER BY date)) * 100.0 /
NULLIF(LAG(value) OVER (ORDER BY date), 0)) AS pct_change -- Prevent div/0
FROM metrics
)
SELECT * FROM changes
WHERE ABS(pct_change) > 50 -- Big changes only
Speed Run: Solving a Real Problem
Problem: "Find breakout artists with strong growth in past 3 months (>50% month-over-month)"
Breaking It Down in English First
🧩 Problem Decomposition
📊 Example: "The Weeknd" Monthly Plays
Sept: 1,000 plays Oct: 1,600 plays (+60% growth) ✅ Nov: 2,500 plays (+56% growth) ✅
This artist shows consistent 50%+ growth → They're a breakout artist!
The SQL Implementation
Step 1: Count Monthly Plays
📍 Uses rows 1-2
Why: Need raw numbers before we can calculate growth
What: Count plays for each artist per month
WITH monthly_plays AS ( -- CTE: named result set
SELECT
artist_id, -- Keep artist ID
DATE_TRUNC('month', play_date) AS month, -- Round to month start
COUNT(*) AS play_count -- Total plays
FROM listens
GROUP BY artist_id, month -- One row per artist-month combo
)
-- Result preview:
-- artist_id | month | play_count
-- 1234 | 2024-09 | 1000 -- Sept baseline
-- 1234 | 2024-10 | 1600 -- Oct growth
-- 1234 | 2024-11 | 2500 -- Nov spike
Step 2: Calculate Growth Rate
📍 Uses row 10
Why: Need to know if artist is growing
What: Compare each month to previous using LAG()
, artist_growth AS ( -- Chain another CTE
SELECT
artist_id, -- Pass through
month, -- Pass through
play_count, -- Current month
LAG(play_count, 1) OVER ( -- Look back 1 row
PARTITION BY artist_id -- Reset for each artist
ORDER BY month -- Time sequence
) AS prev_month_plays, -- Previous month value
-- Growth % = (new - old) / old × 100
ROUND( -- Clean decimals
(play_count - LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month))
* 100.0 / NULLIF( -- Percent formula
LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month),
0 -- Avoid divide by 0
),
1 -- 1 decimal place
) AS growth_pct
FROM monthly_plays -- Use CTE from Step 1
)
-- Result preview:
-- artist_id | month | play_count | prev_month | growth_pct
-- 1234 | 2024-09 | 1000 | NULL | NULL -- No prior
-- 1234 | 2024-10 | 1600 | 1000 | 60.0 -- +60%
-- 1234 | 2024-11 | 2500 | 1600 | 56.3 -- +56%
Step 3: Find Breakout Artists
📍 Uses row 3
Why: Filter to only high-growth artists
What: Apply >50% filter + recency filter
SELECT -- Final query uses both CTEs
a.name AS artist_name, -- Human-readable name
g.month, -- Keep month
g.play_count, -- Show absolute numbers
g.growth_pct -- Show growth %
FROM artist_growth g -- CTE from Step 2
JOIN artists a -- Get artist names
ON g.artist_id = a.artist_id -- Match IDs
WHERE g.growth_pct > 50 -- Filter: >50% growth
AND g.month >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '3 months' -- Last 3 months
ORDER BY g.growth_pct DESC, -- Highest growth first
g.month DESC -- Then by recency
-- Final output:
-- artist_name | month | play_count | growth_pct
-- The Weeknd | 2024-11 | 2500 | 56.3 -- Breakout!
-- The Weeknd | 2024-10 | 1600 | 60.0 -- Consistent
-- Dua Lipa | 2024-11 | 3200 | 78.0 -- Viral
Key Takeaway: We decomposed "strong recent growth" into 3 clear steps:
1️⃣ Aggregate data (COUNT + GROUP BY) → 2️⃣ Calculate change (LAG) → 3️⃣ Filter results (WHERE)
Pro Tips
Rules Of Thumb
✅ Do This
- Start simple, add complexity gradually
- Test with small data first
- Use CTEs for readability
- Name things clearly (not t1, t2)
- Check for NULLs explicitly
❌ Avoid This
- Deeply nested subqueries (>3 levels)
- SELECT * in production
- Forgetting GROUP BY with aggregates
- Not handling division by zero
- Ignoring the execution order