SQL Problem Solving: Writing Queries

Transform problems into SQL solutions using patterns and references

Master Reference: Problem → Pattern → SQL Tool

Problem Words/Phrases	What You Need	SQL Tool	Example	Pattern
"total", "average", "count"	Calculate aggregate values	SUM(), AVG(), COUNT()	SELECT SUM(sales), AVG(price)	—
"for each", "per category", "by department"	Group results	GROUP BY	GROUP BY department	—
"having more than", "with at least"	Filter rows, Filter after aggregation	WHERE, HAVING	WHERE rating > 4.1, HAVING COUNT(*) > 5	—
"above/below average", "more than max"	Single calculated value	Subquery	WHERE salary > (SELECT AVG...)	—
"who never", "doesn't exist", "except"	Find exclusions	NOT IN / NOT EXISTS	WHERE id NOT IN (SELECT...)	🔻 #1
"step-by-step", "first...then", "funnel"	Build in steps	CTEs (WITH)	WITH step1 AS (...)	🔻 #1
"top N", "rank", "nth best", "percentile"	Rank within groups	ROW_NUMBER() OVER	ROW_NUMBER() OVER (PARTITION BY...)	🪜 #2
"running total", "cumulative", "up to"	Calculate running values	SUM() OVER (ORDER BY)	SUM(x) OVER (ORDER BY date)	📈 #3
"compared to average", "vs peers"	Compare to group metrics	AVG() OVER	value - AVG(value) OVER (...)	📊 #4
"change from", "growth", "vs previous"	Detect changes over time	LAG() / LEAD()	LAG(value) OVER (ORDER BY date)	🎯 #5

Pattern Recognition: The 5 Essential Patterns

Master these 5 patterns and you'll solve 80% of SQL problems

The Funnel finds exclusions, The Ladder ranks items, The Timeline tracks changes, The Comparison measures performance, The Exception finds outliers
Experiment and get comfortable with these patterns in your BigQuery Project1 and Project2
Note: Examples use conceptual tables for clarity - adapt column names to your actual schema

Pattern 1: The Funnel 🔻

Finding users who progress through stages (or don't)

📍 When to use:

Users who did A but never B
Conversion funnel analysis
Multi-step user journeys
Drop-off analysis

🔧 SQL:

-- Users: played but never shared
SELECT user_id, name
FROM users
WHERE user_id IN (
  SELECT user_id FROM listens  -- Step 1: Who listened
)
AND user_id NOT IN (
  SELECT user_id FROM shares   -- Step 2: Exclude sharers
                               -- Experiment with NULL in NOT IN death traps
                               -- For our Spotify DB, user_id is not NULL. 
                               -- But in other schemas, it may be  
)

Pattern 2: The Ladder 🪜

Ranking items within groups

📍 When to use:

Top 3 items per category
Rank within departments
Find nth best performer
Percentile rankings

🔧 SQL:

-- Top 3 songs per genre (conceptual: songs table)
SELECT * FROM (
  SELECT song, genre, plays,
    ROW_NUMBER() OVER (
      PARTITION BY genre  -- Restart numbering at 1 for each genre
      ORDER BY plays DESC -- Highest plays gets #1
    ) AS genre_position   -- Creates: Pop #1, Pop #2..., Rock #1, Rock #2...
  FROM songs
) ranked  -- This subquery creates a table with genre_position as a column
WHERE genre_position <= 3  -- Can use alias here: it's a column from subquery

Pattern 3: The Timeline 📈

Cumulative and moving calculations

📍 When to use:

Running total over time
Moving averages
Month-to-date calculations
Cumulative growth metrics

🔧 SQL:

-- Running total of listening hours (conceptual: daily_listening table)
SELECT date,
  hours,                -- Today's hours (e.g., 3)
  SUM(hours) OVER (
    ORDER BY date      
    -- Add extra logic here as needed:
      -- PARTITION BY user_id  -- Separate running total per user
      -- Default frame: UNBOUNDED PRECEDING to CURRENT ROW  
      -- For 7-day rolling avg: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS cumulative_hours -- Day 1: 3, Day 2: 3+5=8, Day 3: 8+2=10...
FROM daily_listening

Pattern 4: The Comparison 📊

Individual vs Group Performance

📍 When to use:

Compare to department average
Percentile within group
Above/below benchmark
Peer comparisons

🔧 SQL:

-- Compare individual vs group average (conceptual: user_stats table)
SELECT user_id, tier, hours,
  hours - AVG(hours) OVER (
    PARTITION BY tier   -- Calculate avg for each tier separately
  ) AS vs_avg          -- Positive = above tier avg, Negative = below
                       -- Same AVG repeats for all users in tier
FROM user_stats

Pattern 5: The Exception Finder 🎯

Detecting outliers and anomalies

📍 When to use:

Sudden drops in activity
Unusual spikes
Behavior changes
Anomaly detection

🔧 SQL:

-- Detect >50% changes (conceptual: metrics table)
  -- Example2: detect when values match/exceed historical highs
WITH changes AS (
  SELECT *,
    LAG(value) OVER (ORDER BY date) AS prev_value  -- Step 1: Get previous value
      -- Ex2: To find new highs: MAX(value) OVER (ORDER BY date) as high_so_far
  FROM metrics
),
with_pct AS (  -- Step 2: Calculate % using prev_value (can't use alias in same SELECT)
  SELECT *,
    ((value - prev_value) * 100.0 / 
     NULLIF(prev_value, 0)) AS pct_change  -- (new - old) / old × 100
      -- Ex2: To find new highs: value >= high_so_far as is_new_high
  FROM changes
)
SELECT * FROM with_pct 
WHERE ABS(pct_change) > 50  -- Big changes only

Speed Run: Solving a Real Problem

Problem: "Find breakout artists with strong growth in past 3 months (>50% month-over-month)"

Breaking It Down in English First

🧩 Problem Decomposition

"artists" → Analyze each artist separately 📍 Row 2: GROUP BY

">50% growth" → Compare this month vs last month 📍 Row 10: LAG()

"past 3 months" → Filter to Sep, Aug, Jul 2025 📍 Row 3: WHERE filter

"strong" → Only keep artists above threshold 📍 Row 3: WHERE filter

📊 Example: "The Weeknd" Monthly Plays

Sept: 1,000 plays
Oct:  1,600 plays (+60% growth) ✅
Nov:  2,500 plays (+56% growth) ✅

This artist shows consistent 50%+ growth → They're a breakout artist!

The SQL Implementation

Step 1: Count Monthly Plays

📍 Uses rows 1-2

Why: Need raw numbers before we can calculate growth
What: Count plays for each artist per month

WITH monthly_plays AS (  -- CTE: named result set
  SELECT 
    artist_id,                                -- Keep artist ID
    DATE_TRUNC('month', play_date) AS month, -- Round to month start
    COUNT(*) AS play_count                   -- Total plays
  FROM listens
  GROUP BY artist_id, month  -- One row per artist-month combo
)
-- Result preview:
-- artist_id | month    | play_count
-- 1234      | 2024-09  | 1000       -- Sept baseline
-- 1234      | 2024-10  | 1600       -- Oct growth  
-- 1234      | 2024-11  | 2500       -- Nov spike

Step 2: Calculate Growth Rate

📍 Uses row 10

Why: Need to know if artist is growing
What: Compare each month to previous using LAG()

, artist_growth AS (      -- Chain another CTE
  SELECT 
    artist_id,            -- Pass through
    month,                -- Pass through  
    play_count,           -- Current month
    LAG(play_count, 1) OVER (         -- Look back 1 row
      PARTITION BY artist_id          -- Reset for each artist
      ORDER BY month                  -- Time sequence
    ) AS prev_month_plays,            -- Previous month value
    -- Growth % = (new - old) / old × 100
    ROUND(                            -- Clean decimals
      (play_count - LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month)) 
      * 100.0 / NULLIF(               -- Percent formula
        LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month), 
        0                             -- Avoid divide by 0
      ),
      1                               -- 1 decimal place
    ) AS growth_pct
  FROM monthly_plays      -- Use CTE from Step 1
)
-- Result preview:
-- artist_id | month    | play_count | prev_month | growth_pct
-- 1234      | 2024-09  | 1000       | NULL       | NULL      -- No prior
-- 1234      | 2024-10  | 1600       | 1000       | 60.0      -- +60%
-- 1234      | 2024-11  | 2500       | 1600       | 56.3      -- +56%

Step 3: Find Breakout Artists

📍 Uses row 3

Why: Filter to only high-growth artists
What: Apply >50% filter + recency filter

SELECT                    -- Final query uses both CTEs
  a.name AS artist_name,  -- Human-readable name
  g.month,                -- Keep month
  g.play_count,           -- Show absolute numbers
  g.growth_pct            -- Show growth %
FROM artist_growth g      -- CTE from Step 2
JOIN artists a            -- Get artist names
  ON g.artist_id = a.artist_id       -- Match IDs
WHERE g.growth_pct > 50              -- Filter: >50% growth
  AND g.month >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '3 months'  -- Last 3 months
ORDER BY g.growth_pct DESC,          -- Highest growth first
         g.month DESC                 -- Then by recency

-- Final output:
-- artist_name | month    | play_count | growth_pct
-- The Weeknd  | 2024-11  | 2500       | 56.3      -- Breakout!
-- The Weeknd  | 2024-10  | 1600       | 60.0      -- Consistent
-- Dua Lipa    | 2024-11  | 3200       | 78.0      -- Viral

Key Takeaway: We decomposed "strong recent growth" into 3 clear steps:
1️⃣ Aggregate data (COUNT + GROUP BY) → 2️⃣ Calculate change (LAG) → 3️⃣ Filter results (WHERE)

Pro Tips

Rules Of Thumb

✅ Do This

Start simple, add complexity gradually
Test with small data first
Use CTEs for readability
Name things clearly (not t1, t2)
Check for NULLs explicitly

❌ Avoid This

Deeply nested subqueries (>3 levels)
SELECT * in production
Forgetting GROUP BY with aggregates
Not handling division by zero
Ignoring the execution order