SQL Problem Solving: Writing Queries

5 popular patterns: 80% of SQL problems cn be solved by combining these 5 patterns

  1. The Funnel finds exclusions, The Ladder ranks items, The Timeline tracks changes, The Comparison measures performance, The Exception finds outliers
  2. Experiment and get comfortable with these patterns in your BigQuery Project1 and Project2
  3. Use CTEs for readability, and breaking down problems into smaller chunks

Pattern 1: The Funnel

Finding users who progress through stages (or don't)

Played Created Shared 1K 450 87

When to use:

  • Users who did A but never B
  • Conversion funnel analysis
  • Multi-step user journeys
  • Drop-off analysis

SQL:

-- Users: played but never shared
SELECT user_id, name
FROM users
WHERE user_id IN (
  SELECT user_id FROM listens  -- Step 1: Who listened
)
AND user_id NOT IN (
  SELECT user_id FROM shares   -- Step 2: Exclude sharers
                               -- Experiment with NULL in NOT IN death traps
                               -- For our Spotify DB, user_id is not NULL. 
                               -- But in other schemas, it may be  
)

Pattern 2: The Ladder

Ranking items within groups

Top N per Group 1 2 3

When to use:

  • Top 3 items per category
  • Rank within departments
  • Find nth best performer
  • Percentile rankings

SQL:

-- Top 3 songs per genre (conceptual: songs table)
SELECT * FROM (
  SELECT song, genre, plays,
    ROW_NUMBER() OVER (
      PARTITION BY genre  -- Restart numbering at 1 for each genre
      ORDER BY plays DESC -- Highest plays gets #1
    ) AS genre_position   -- Creates: Pop #1, Pop #2..., Rock #1, Rock #2...
  FROM songs
) ranked  -- This subquery creates a table with genre_position as a column
WHERE genre_position <= 3  -- Can use alias here: it's a column from subquery

Pattern 3: The Timeline

Cumulative and moving calculations

Running Total Time →

When to use:

  • Running total over time
  • Moving averages
  • Month-to-date calculations
  • Cumulative growth metrics

SQL:

-- Running total of listening hours (conceptual: daily_listening table)
SELECT date,
  hours,                -- Today's hours (e.g., 3)
  SUM(hours) OVER (
    ORDER BY date      
    -- Add extra logic here as needed:
      -- PARTITION BY user_id  -- Separate running total per user
      -- Default frame: UNBOUNDED PRECEDING to CURRENT ROW  
      -- For 7-day rolling avg: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS cumulative_hours -- Day 1: 3, Day 2: 3+5=8, Day 3: 8+2=10...
FROM daily_listening

Pattern 4: The Comparison

Individual vs Group Performance

You vs Average You Avg

When to use:

  • Compare to department average
  • Percentile within group
  • Above/below benchmark
  • Peer comparisons

SQL:

-- Compare individual vs group average (conceptual: user_stats table)
SELECT user_id, tier, hours,
  hours - AVG(hours) OVER (
    PARTITION BY tier   -- Calculate avg for each tier separately
  ) AS vs_avg          -- Positive = above tier avg, Negative = below
                       -- Same AVG repeats for all users in tier
FROM user_stats

Pattern 5: The Exception Finder

Detecting outliers and anomalies

Outlier Detection -80% +200%

When to use:

  • Sudden drops in activity
  • Unusual spikes
  • Behavior changes
  • Anomaly detection

SQL:

-- Detect >50% changes (conceptual: metrics table)
  -- Example2: detect when values match/exceed historical highs
WITH changes AS (
  SELECT *,
    LAG(value) OVER (ORDER BY date) AS prev_value  -- Step 1: Get previous value
      -- Ex2: To find new highs: MAX(value) OVER (ORDER BY date) as high_so_far
  FROM metrics
),
with_pct AS (  -- Step 2: Calculate % using prev_value (can't use alias in same SELECT)
  SELECT *,
    ((value - prev_value) * 100.0 / 
     NULLIF(prev_value, 0)) AS pct_change  -- (new - old) / old × 100
      -- Ex2: To find new highs: value >= high_so_far as is_new_high
  FROM changes
)
SELECT * FROM with_pct 
WHERE ABS(pct_change) > 50  -- Big changes only

Example 1:

Problem: "Find breakout artists with strong growth in past 3 months (>50% month-over-month)"

Patterns Used: Timeline, Comparison

Step 1: Count Monthly Plays

Why: Need raw numbers before we can calculate growth
What: Count plays for each artist per month

WITH monthly_plays AS (  -- CTE: named result set
  SELECT 
    artist_id,                                -- Keep artist ID
    DATE_TRUNC('month', play_date) AS month, -- Round to month start
    COUNT(*) AS play_count                   -- Total plays
  FROM listens
  GROUP BY artist_id, month  -- One row per artist-month combo
)
-- Result preview:
-- artist_id | month    | play_count
-- 1234      | 2024-09  | 1000       -- Sept baseline
-- 1234      | 2024-10  | 1600       -- Oct growth  
-- 1234      | 2024-11  | 2500       -- Nov spike

Step 2: Calculate Growth Rate

Why: Need to know if artist is growing
What: Compare each month to previous using LAG()

, artist_growth AS (      -- Chain another CTE
  SELECT 
    artist_id,            -- Pass through
    month,                -- Pass through  
    play_count,           -- Current month
    LAG(play_count, 1) OVER (         -- Look back 1 row
      PARTITION BY artist_id          -- Reset for each artist
      ORDER BY month                  -- Time sequence
    ) AS prev_month_plays,            -- Previous month value
    -- Growth % = (new - old) / old × 100
    ROUND(                            -- Clean decimals
      (play_count - LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month)) 
      * 100.0 / NULLIF(               -- Percent formula
        LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month), 
        0                             -- Avoid divide by 0
      ),
      1                               -- 1 decimal place
    ) AS growth_pct
  FROM monthly_plays      -- Use CTE from Step 1
)
-- Result preview:
-- artist_id | month    | play_count | prev_month | growth_pct
-- 1234      | 2024-09  | 1000       | NULL       | NULL      -- No prior
-- 1234      | 2024-10  | 1600       | 1000       | 60.0      -- +60%
-- 1234      | 2024-11  | 2500       | 1600       | 56.3      -- +56%

Step 3: Find Breakout Artists

Why: Filter to only high-growth artists
What: Apply >50% filter + recency filter

SELECT                    -- Final query uses both CTEs
  a.name AS artist_name,  -- Human-readable name
  g.month,                -- Keep month
  g.play_count,           -- Show absolute numbers
  g.growth_pct            -- Show growth %
FROM artist_growth g      -- CTE from Step 2
JOIN artists a            -- Get artist names
  ON g.artist_id = a.artist_id       -- Match IDs
WHERE g.growth_pct > 50              -- Filter: >50% growth
  AND g.month >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '3 months'  -- Last 3 months
ORDER BY g.growth_pct DESC,          -- Highest growth first
         g.month DESC                 -- Then by recency

-- Final output:
-- artist_name | month    | play_count | growth_pct
-- The Weeknd  | 2024-11  | 2500       | 56.3      -- Breakout!
-- The Weeknd  | 2024-10  | 1600       | 60.0      -- Consistent
-- Dua Lipa    | 2024-11  | 3200       | 78.0      -- Viral