SQL Problem Solving: Your Visual Playbook

Master the art of reading and writing complex SQL queries

Breaking Down Complex Problems

Before diving into SQL, learn to decompose problems systematically

The 3-Step Framework

1. Understand What You Have

  • What tables exist?
  • How do they connect?
  • What's the grain of each table?

2. Define What You Want

  • What columns in output?
  • One row per what?
  • What calculations needed?

3. Connect the Dots

  • Which JOINs needed?
  • What filters (WHERE)?
  • Any grouping (GROUP BY)?

Query Anatomy: How to Read Complex Queries

Before writing queries, learn to read and understand them

The Onion View: Peeling Back the Layers

Complex Query Example: Finding Power Users

Final SELECT CTE: power_users CTE: user_stats Raw Data
-- Layer 1: Calculate user statistics
WITH user_stats AS (
  SELECT 
    user_id,
    COUNT(DISTINCT song_id) as unique_songs,
    COUNT(DISTINCT genre) as unique_genres,
    SUM(play_duration) as total_minutes
  FROM listens l
  JOIN songs s ON l.song_id = s.song_id
  GROUP BY user_id
),
-- Layer 2: Identify power users
power_users AS (
  SELECT 
    user_id,
    unique_songs,
    unique_genres,
    total_minutes,
    NTILE(10) OVER (ORDER BY total_minutes DESC) as decile
  FROM user_stats
  WHERE unique_genres >= 3
)
-- Layer 3: Final selection
SELECT 
  u.name,
  p.unique_songs,
  p.unique_genres,
  ROUND(p.total_minutes / 60.0, 1) as total_hours
FROM power_users p
JOIN users u ON p.user_id = u.user_id
WHERE p.decile = 1
ORDER BY p.total_minutes DESC;
🔍 How to Read It:
  1. Start from the innermost CTE (user_stats)
  2. Understand what each CTE produces
  3. Follow the data transformation through layers
  4. The final SELECT just formats the presentation

Pattern Recognition: The 5 Essential Patterns

Master these patterns and you'll solve 90% of SQL problems

Pattern 1: The Funnel 🔻

Finding users who progress through stages (or don't)

Played Created Shared 1K 450 87

📍 When to use:

  • Users who did A but never B
  • Conversion funnel analysis
  • Multi-step user journeys
  • Drop-off analysis

🔧 SQL:

-- Find users who played songs but never shared
-- Step 1: Get all users who have listened
-- Step 2: Exclude users who have shared
SELECT user_id, name
FROM users
WHERE user_id IN (
  SELECT user_id FROM listens  -- Has listened
)
AND user_id NOT IN (
  SELECT user_id FROM shares   -- Never shared
)

Pattern 2: The Ladder 🪜

Ranking items within groups

Top N per Group 1 2 3

📍 When to use:

  • Top 3 items per category
  • Rank within departments
  • Find nth best performer
  • Percentile rankings

🔧 SQL:

-- Top 3 songs per genre using ROW_NUMBER()
-- ROW_NUMBER assigns 1, 2, 3... within each genre
SELECT * FROM (
  SELECT song, genre, plays,
    ROW_NUMBER() OVER (
      PARTITION BY genre  -- Restart numbering for each genre
      ORDER BY plays DESC -- Most played gets rank 1
    ) as rank
  FROM songs
) ranked
WHERE rank <= 3  -- Keep only top 3

Pattern 3: The Timeline 📈

Cumulative and moving calculations

Running Total Time →

📍 When to use:

  • Running total over time
  • Moving averages
  • Month-to-date calculations
  • Cumulative growth metrics

🔧 SQL:

-- Calculate running total of listening hours
-- Each row shows total hours up to that date
SELECT 
  date,
  hours,  -- Daily hours
  SUM(hours) OVER (
    ORDER BY date  -- Sum all previous + current
  ) as cumulative_hours
FROM daily_listening

Pattern 4: The Comparison 📊

Individual vs Group Performance

You vs Average You Avg

📍 When to use:

  • Compare to department average
  • Percentile within group
  • Above/below benchmark
  • Peer comparisons

🔧 SQL:

-- Compare each user to their tier's average
-- Shows how far above/below average they are
SELECT 
  user_id,
  tier,
  hours,
  hours - AVG(hours) OVER (
    PARTITION BY tier  -- Calculate avg per tier
  ) as vs_avg  -- Positive = above average
FROM user_stats

Pattern 5: The Exception Finder 🎯

Detecting outliers and anomalies

Outlier Detection -80% +200%

📍 When to use:

  • Sudden drops in activity
  • Unusual spikes
  • Behavior changes
  • Anomaly detection

🔧 SQL:

-- Find big changes (>50% increase or decrease)
-- LAG() gets the previous row's value when ordered by date
WITH changes AS (
  SELECT *,
    LAG(value) OVER (ORDER BY date) as prev_value,
    -- Calculate percent change from previous period
    -- NULLIF prevents division by zero
    ((value - LAG(value) OVER (ORDER BY date)) * 100.0 / 
     NULLIF(LAG(value) OVER (ORDER BY date), 0)) as pct_change
  FROM metrics
)
SELECT * FROM changes 
WHERE ABS(pct_change) > 50  -- Keep only big changes

Master Reference: Problem → Pattern → SQL Tool

Problem Words/Phrases What You Need SQL Tool Example Pattern
"total", "average", "count" Calculate aggregate values SUM(), AVG(), COUNT() SELECT SUM(sales), AVG(price)
"for each", "per category", "by department" Group results GROUP BY GROUP BY department
"having more than", "with at least" Filter after aggregation HAVING HAVING COUNT(*) > 5
"above/below average", "more than max" Single calculated value Subquery WHERE salary > (SELECT AVG...)
"who never", "doesn't exist", "except" Find exclusions NOT IN / NOT EXISTS WHERE id NOT IN (SELECT...) 🔻 #1
"step-by-step", "first...then", "funnel" Build in steps CTEs (WITH) WITH step1 AS (...) 🔻 #1
"top N", "rank", "nth best", "percentile" Rank within groups ROW_NUMBER() OVER ROW_NUMBER() OVER (PARTITION BY...) 🪜 #2
"running total", "cumulative", "up to" Calculate running values SUM() OVER (ORDER BY) SUM(x) OVER (ORDER BY date) 📈 #3
"compared to average", "vs peers" Compare to group metrics AVG() OVER value - AVG(value) OVER (...) 📊 #4
"change from", "growth", "vs previous" Detect changes over time LAG() / LEAD() LAG(value) OVER (ORDER BY date) 🎯 #5

Speed Run: Solving a Real Problem

Problem: "Find breakout artists with strong recent growth (>50% month-over-month)"

Breaking It Down in English First

🧩 Problem Decomposition

"artists" → Analyze each artist separately 📍 Row 2: GROUP BY
"strong growth" → Compare this month vs last month 📍 Row 10: LAG()
"recent" → Last 3 months only 📍 Row 3: WHERE filter
">50%" → Filter high-growth artists 📍 Row 3: WHERE filter
📊 Example: "The Weeknd" Monthly Plays
Sept: 1,000 plays
Oct:  1,600 plays (+60% growth) ✅
Nov:  2,500 plays (+56% growth) ✅
    

This artist shows consistent 50%+ growth → They're a breakout artist!

The SQL Implementation

Step 1: Count Monthly Plays

📍 Uses rows 1-2

Why: Need raw numbers before we can calculate growth
What: Count plays for each artist per month

WITH monthly_plays AS (  -- Row 6: Build step-by-step with CTEs
  SELECT 
    artist_id,
    DATE_TRUNC('month', play_date) as month,
    COUNT(*) as play_count  -- Row 1: COUNT() aggregation
  FROM listens
  GROUP BY artist_id, month  -- Row 2: GROUP BY each artist
)
-- Result: artist_id | month     | play_count
--         1234      | 2024-09   | 1000
--         1234      | 2024-10   | 1600
--         1234      | 2024-11   | 2500

Step 2: Calculate Growth Rate

📍 Uses row 10

Why: Need to know if artist is growing
What: Compare each month to previous using LAG()

, artist_growth AS (
  SELECT 
    artist_id,
    month,
    play_count,
    LAG(play_count, 1) OVER (  -- Row 10: Get previous month's value
      PARTITION BY artist_id    -- Separate LAG() per artist
      ORDER BY month
    ) as prev_month_plays,
    -- Calculate % growth: (current - previous) / previous * 100
    ROUND(
      (play_count - LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month)) 
      * 100.0 / NULLIF(LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month), 0),
      1
    ) as growth_pct
  FROM monthly_plays
)
-- Result: artist_id | month   | play_count | prev_month | growth_pct
--         1234      | 2024-09 | 1000       | NULL       | NULL
--         1234      | 2024-10 | 1600       | 1000       | 60.0
--         1234      | 2024-11 | 2500       | 1600       | 56.3

Step 3: Find Breakout Artists

📍 Uses row 3

Why: Filter to only high-growth artists
What: Apply >50% filter + recency filter

SELECT 
  a.name as artist_name,
  g.month,
  g.play_count,
  g.growth_pct
FROM artist_growth g
JOIN artists a ON g.artist_id = a.artist_id
WHERE g.growth_pct > 50  -- Row 3: Filter high growth
  AND g.month >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '3 months')  -- Recent only
ORDER BY g.growth_pct DESC, g.month DESC

-- Final Result:
-- artist_name   | month    | play_count | growth_pct
-- The Weeknd    | 2024-11  | 2500       | 56.3
-- The Weeknd    | 2024-10  | 1600       | 60.0
-- Dua Lipa      | 2024-11  | 3200       | 78.0
-- ...

💡 Key Takeaway: We decomposed "strong recent growth" into 3 clear steps:
1️⃣ Aggregate data (COUNT + GROUP BY) → 2️⃣ Calculate change (LAG) → 3️⃣ Filter results (WHERE)

Pro Tips: Think Like the Database

💡 Golden Rules

✅ Do This

  • Start simple, add complexity gradually
  • Test with small data first
  • Use CTEs for readability
  • Name things clearly (not t1, t2)
  • Check for NULLs explicitly

❌ Avoid This

  • Deeply nested subqueries (>3 levels)
  • SELECT * in production
  • Forgetting GROUP BY with aggregates
  • Not handling division by zero
  • Ignoring index opportunities

🎯 Remember: Complex queries are just simple queries combined. Master the patterns, and you'll solve any problem!