SQL Problem Solving: Writing Queries

Transform problems into SQL solutions using patterns and references

Master Reference: Problem → Pattern → SQL Tool

Problem Words/Phrases What You Need SQL Tool Example Pattern
"total", "average", "count" Calculate aggregate values SUM(), AVG(), COUNT() SELECT SUM(sales), AVG(price)
"for each", "per category", "by department" Group results GROUP BY GROUP BY department
"having more than", "with at least" Filter rows, Filter after aggregation WHERE, HAVING WHERE rating > 4.1, HAVING COUNT(*) > 5
"above/below average", "more than max" Single calculated value Subquery WHERE salary > (SELECT AVG...)
"who never", "doesn't exist", "except" Find exclusions NOT IN / NOT EXISTS WHERE id NOT IN (SELECT...) 🔻 #1
"step-by-step", "first...then", "funnel" Build in steps CTEs (WITH) WITH step1 AS (...) 🔻 #1
"top N", "rank", "nth best", "percentile" Rank within groups ROW_NUMBER() OVER ROW_NUMBER() OVER (PARTITION BY...) 🪜 #2
"running total", "cumulative", "up to" Calculate running values SUM() OVER (ORDER BY) SUM(x) OVER (ORDER BY date) 📈 #3
"compared to average", "vs peers" Compare to group metrics AVG() OVER value - AVG(value) OVER (...) 📊 #4
"change from", "growth", "vs previous" Detect changes over time LAG() / LEAD() LAG(value) OVER (ORDER BY date) 🎯 #5

Pattern Recognition: The 5 Essential Patterns

Master these 5 patterns and you'll solve 80% of SQL problems

  • The Funnel finds exclusions, The Ladder ranks items, The Timeline tracks changes, The Comparison measures performance, The Exception finds outliers
  • Experiment and get comfortable with these patterns in your BigQuery Project1 and Project2
  • Note: Examples use conceptual tables for clarity - adapt column names to your actual schema

Pattern 1: The Funnel 🔻

Finding users who progress through stages (or don't)

Played Created Shared 1K 450 87

📍 When to use:

  • Users who did A but never B
  • Conversion funnel analysis
  • Multi-step user journeys
  • Drop-off analysis

🔧 SQL:

-- Users: played but never shared
SELECT user_id, name
FROM users
WHERE user_id IN (
  SELECT user_id FROM listens  -- Step 1: Who listened
)
AND user_id NOT IN (
  SELECT user_id FROM shares   -- Step 2: Exclude sharers
                               -- Experiment with NULL in NOT IN death traps
                               -- For our Spotify DB, user_id is not NULL. 
                               -- But in other schemas, it may be  
)

Pattern 2: The Ladder 🪜

Ranking items within groups

Top N per Group 1 2 3

📍 When to use:

  • Top 3 items per category
  • Rank within departments
  • Find nth best performer
  • Percentile rankings

🔧 SQL:

-- Top 3 songs per genre (conceptual: songs table)
SELECT * FROM (
  SELECT song, genre, plays,
    ROW_NUMBER() OVER (
      PARTITION BY genre  -- Restart numbering at 1 for each genre
      ORDER BY plays DESC -- Highest plays gets #1
    ) AS genre_position   -- Creates: Pop #1, Pop #2..., Rock #1, Rock #2...
  FROM songs
) ranked  -- This subquery creates a table with genre_position as a column
WHERE genre_position <= 3  -- Can use alias here: it's a column from subquery

Pattern 3: The Timeline 📈

Cumulative and moving calculations

Running Total Time →

📍 When to use:

  • Running total over time
  • Moving averages
  • Month-to-date calculations
  • Cumulative growth metrics

🔧 SQL:

-- Running total of listening hours (conceptual: daily_listening table)
SELECT date,
  hours,                -- Today's hours (e.g., 3)
  SUM(hours) OVER (
    ORDER BY date      
    -- Add extra logic here as needed:
      -- PARTITION BY user_id  -- Separate running total per user
      -- Default frame: UNBOUNDED PRECEDING to CURRENT ROW  
      -- For 7-day rolling avg: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS cumulative_hours -- Day 1: 3, Day 2: 3+5=8, Day 3: 8+2=10...
FROM daily_listening

Pattern 4: The Comparison 📊

Individual vs Group Performance

You vs Average You Avg

📍 When to use:

  • Compare to department average
  • Percentile within group
  • Above/below benchmark
  • Peer comparisons

🔧 SQL:

-- Compare individual vs group average (conceptual: user_stats table)
SELECT user_id, tier, hours,
  hours - AVG(hours) OVER (
    PARTITION BY tier   -- Calculate avg for each tier separately
  ) AS vs_avg          -- Positive = above tier avg, Negative = below
                       -- Same AVG repeats for all users in tier
FROM user_stats

Pattern 5: The Exception Finder 🎯

Detecting outliers and anomalies

Outlier Detection -80% +200%

📍 When to use:

  • Sudden drops in activity
  • Unusual spikes
  • Behavior changes
  • Anomaly detection

🔧 SQL:

-- Detect >50% changes (conceptual: metrics table)
  -- Example2: detect when values match/exceed historical highs
WITH changes AS (
  SELECT *,
    LAG(value) OVER (ORDER BY date) AS prev_value  -- Step 1: Get previous value
      -- Ex2: To find new highs: MAX(value) OVER (ORDER BY date) as high_so_far
  FROM metrics
),
with_pct AS (  -- Step 2: Calculate % using prev_value (can't use alias in same SELECT)
  SELECT *,
    ((value - prev_value) * 100.0 / 
     NULLIF(prev_value, 0)) AS pct_change  -- (new - old) / old × 100
      -- Ex2: To find new highs: value >= high_so_far as is_new_high
  FROM changes
)
SELECT * FROM with_pct 
WHERE ABS(pct_change) > 50  -- Big changes only

Speed Run: Solving a Real Problem

Problem: "Find breakout artists with strong growth in past 3 months (>50% month-over-month)"

Breaking It Down in English First

🧩 Problem Decomposition

"artists" → Analyze each artist separately 📍 Row 2: GROUP BY
">50% growth" → Compare this month vs last month 📍 Row 10: LAG()
"past 3 months" → Filter to Sep, Aug, Jul 2025 📍 Row 3: WHERE filter
"strong" → Only keep artists above threshold 📍 Row 3: WHERE filter
📊 Example: "The Weeknd" Monthly Plays
Sept: 1,000 plays
Oct:  1,600 plays (+60% growth) ✅
Nov:  2,500 plays (+56% growth) ✅
    

This artist shows consistent 50%+ growth → They're a breakout artist!

The SQL Implementation

Step 1: Count Monthly Plays

📍 Uses rows 1-2

Why: Need raw numbers before we can calculate growth
What: Count plays for each artist per month

WITH monthly_plays AS (  -- CTE: named result set
  SELECT 
    artist_id,                                -- Keep artist ID
    DATE_TRUNC('month', play_date) AS month, -- Round to month start
    COUNT(*) AS play_count                   -- Total plays
  FROM listens
  GROUP BY artist_id, month  -- One row per artist-month combo
)
-- Result preview:
-- artist_id | month    | play_count
-- 1234      | 2024-09  | 1000       -- Sept baseline
-- 1234      | 2024-10  | 1600       -- Oct growth  
-- 1234      | 2024-11  | 2500       -- Nov spike

Step 2: Calculate Growth Rate

📍 Uses row 10

Why: Need to know if artist is growing
What: Compare each month to previous using LAG()

, artist_growth AS (      -- Chain another CTE
  SELECT 
    artist_id,            -- Pass through
    month,                -- Pass through  
    play_count,           -- Current month
    LAG(play_count, 1) OVER (         -- Look back 1 row
      PARTITION BY artist_id          -- Reset for each artist
      ORDER BY month                  -- Time sequence
    ) AS prev_month_plays,            -- Previous month value
    -- Growth % = (new - old) / old × 100
    ROUND(                            -- Clean decimals
      (play_count - LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month)) 
      * 100.0 / NULLIF(               -- Percent formula
        LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month), 
        0                             -- Avoid divide by 0
      ),
      1                               -- 1 decimal place
    ) AS growth_pct
  FROM monthly_plays      -- Use CTE from Step 1
)
-- Result preview:
-- artist_id | month    | play_count | prev_month | growth_pct
-- 1234      | 2024-09  | 1000       | NULL       | NULL      -- No prior
-- 1234      | 2024-10  | 1600       | 1000       | 60.0      -- +60%
-- 1234      | 2024-11  | 2500       | 1600       | 56.3      -- +56%

Step 3: Find Breakout Artists

📍 Uses row 3

Why: Filter to only high-growth artists
What: Apply >50% filter + recency filter

SELECT                    -- Final query uses both CTEs
  a.name AS artist_name,  -- Human-readable name
  g.month,                -- Keep month
  g.play_count,           -- Show absolute numbers
  g.growth_pct            -- Show growth %
FROM artist_growth g      -- CTE from Step 2
JOIN artists a            -- Get artist names
  ON g.artist_id = a.artist_id       -- Match IDs
WHERE g.growth_pct > 50              -- Filter: >50% growth
  AND g.month >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '3 months'  -- Last 3 months
ORDER BY g.growth_pct DESC,          -- Highest growth first
         g.month DESC                 -- Then by recency

-- Final output:
-- artist_name | month    | play_count | growth_pct
-- The Weeknd  | 2024-11  | 2500       | 56.3      -- Breakout!
-- The Weeknd  | 2024-10  | 1600       | 60.0      -- Consistent
-- Dua Lipa    | 2024-11  | 3200       | 78.0      -- Viral

Key Takeaway: We decomposed "strong recent growth" into 3 clear steps:
1️⃣ Aggregate data (COUNT + GROUP BY) → 2️⃣ Calculate change (LAG) → 3️⃣ Filter results (WHERE)

Pro Tips

Rules Of Thumb

✅ Do This

  • Start simple, add complexity gradually
  • Test with small data first
  • Use CTEs for readability
  • Name things clearly (not t1, t2)
  • Check for NULLs explicitly

❌ Avoid This

  • Deeply nested subqueries (>3 levels)
  • SELECT * in production
  • Forgetting GROUP BY with aggregates
  • Not handling division by zero
  • Ignoring the execution order