SQL Problem Solving: Your Visual Playbook
Master the art of reading and writing complex SQL queries
Breaking Down Complex Problems
Before diving into SQL, learn to decompose problems systematically
The 3-Step Framework
1. Understand What You Have
- What tables exist?
- How do they connect?
- What's the grain of each table?
2. Define What You Want
- What columns in output?
- One row per what?
- What calculations needed?
3. Connect the Dots
- Which JOINs needed?
- What filters (WHERE)?
- Any grouping (GROUP BY)?
Query Anatomy: How to Read Complex Queries
Before writing queries, learn to read and understand them
The Onion View: Peeling Back the Layers
Complex Query Example: Finding Power Users
-- Layer 1: Calculate user statistics WITH user_stats AS ( SELECT user_id, COUNT(DISTINCT song_id) as unique_songs, COUNT(DISTINCT genre) as unique_genres, SUM(play_duration) as total_minutes FROM listens l JOIN songs s ON l.song_id = s.song_id GROUP BY user_id ), -- Layer 2: Identify power users power_users AS ( SELECT user_id, unique_songs, unique_genres, total_minutes, NTILE(10) OVER (ORDER BY total_minutes DESC) as decile FROM user_stats WHERE unique_genres >= 3 ) -- Layer 3: Final selection SELECT u.name, p.unique_songs, p.unique_genres, ROUND(p.total_minutes / 60.0, 1) as total_hours FROM power_users p JOIN users u ON p.user_id = u.user_id WHERE p.decile = 1 ORDER BY p.total_minutes DESC;
🔍 How to Read It:
- Start from the innermost CTE (user_stats)
- Understand what each CTE produces
- Follow the data transformation through layers
- The final SELECT just formats the presentation
Pattern Recognition: The 5 Essential Patterns
Master these patterns and you'll solve 90% of SQL problems
Pattern 1: The Funnel 🔻
Finding users who progress through stages (or don't)
📍 When to use:
- Users who did A but never B
- Conversion funnel analysis
- Multi-step user journeys
- Drop-off analysis
🔧 SQL:
-- Find users who played songs but never shared -- Step 1: Get all users who have listened -- Step 2: Exclude users who have shared SELECT user_id, name FROM users WHERE user_id IN ( SELECT user_id FROM listens -- Has listened ) AND user_id NOT IN ( SELECT user_id FROM shares -- Never shared )
Pattern 2: The Ladder 🪜
Ranking items within groups
📍 When to use:
- Top 3 items per category
- Rank within departments
- Find nth best performer
- Percentile rankings
🔧 SQL:
-- Top 3 songs per genre using ROW_NUMBER() -- ROW_NUMBER assigns 1, 2, 3... within each genre SELECT * FROM ( SELECT song, genre, plays, ROW_NUMBER() OVER ( PARTITION BY genre -- Restart numbering for each genre ORDER BY plays DESC -- Most played gets rank 1 ) as rank FROM songs ) ranked WHERE rank <= 3 -- Keep only top 3
Pattern 3: The Timeline 📈
Cumulative and moving calculations
📍 When to use:
- Running total over time
- Moving averages
- Month-to-date calculations
- Cumulative growth metrics
🔧 SQL:
-- Calculate running total of listening hours -- Each row shows total hours up to that date SELECT date, hours, -- Daily hours SUM(hours) OVER ( ORDER BY date -- Sum all previous + current ) as cumulative_hours FROM daily_listening
Pattern 4: The Comparison 📊
Individual vs Group Performance
📍 When to use:
- Compare to department average
- Percentile within group
- Above/below benchmark
- Peer comparisons
🔧 SQL:
-- Compare each user to their tier's average -- Shows how far above/below average they are SELECT user_id, tier, hours, hours - AVG(hours) OVER ( PARTITION BY tier -- Calculate avg per tier ) as vs_avg -- Positive = above average FROM user_stats
Pattern 5: The Exception Finder 🎯
Detecting outliers and anomalies
📍 When to use:
- Sudden drops in activity
- Unusual spikes
- Behavior changes
- Anomaly detection
🔧 SQL:
-- Find big changes (>50% increase or decrease) -- LAG() gets the previous row's value when ordered by date WITH changes AS ( SELECT *, LAG(value) OVER (ORDER BY date) as prev_value, -- Calculate percent change from previous period -- NULLIF prevents division by zero ((value - LAG(value) OVER (ORDER BY date)) * 100.0 / NULLIF(LAG(value) OVER (ORDER BY date), 0)) as pct_change FROM metrics ) SELECT * FROM changes WHERE ABS(pct_change) > 50 -- Keep only big changes
Master Reference: Problem → Pattern → SQL Tool
| Problem Words/Phrases | What You Need | SQL Tool | Example | Pattern |
|---|---|---|---|---|
| "total", "average", "count" | Calculate aggregate values | SUM(), AVG(), COUNT() | SELECT SUM(sales), AVG(price) | — |
| "for each", "per category", "by department" | Group results | GROUP BY | GROUP BY department | — |
| "having more than", "with at least" | Filter after aggregation | HAVING | HAVING COUNT(*) > 5 | — |
| "above/below average", "more than max" | Single calculated value | Subquery | WHERE salary > (SELECT AVG...) | — |
| "who never", "doesn't exist", "except" | Find exclusions | NOT IN / NOT EXISTS | WHERE id NOT IN (SELECT...) | 🔻 #1 |
| "step-by-step", "first...then", "funnel" | Build in steps | CTEs (WITH) | WITH step1 AS (...) | 🔻 #1 |
| "top N", "rank", "nth best", "percentile" | Rank within groups | ROW_NUMBER() OVER | ROW_NUMBER() OVER (PARTITION BY...) | 🪜 #2 |
| "running total", "cumulative", "up to" | Calculate running values | SUM() OVER (ORDER BY) | SUM(x) OVER (ORDER BY date) | 📈 #3 |
| "compared to average", "vs peers" | Compare to group metrics | AVG() OVER | value - AVG(value) OVER (...) | 📊 #4 |
| "change from", "growth", "vs previous" | Detect changes over time | LAG() / LEAD() | LAG(value) OVER (ORDER BY date) | 🎯 #5 |
Speed Run: Solving a Real Problem
Problem: "Find breakout artists with strong recent growth (>50% month-over-month)"
Breaking It Down in English First
🧩 Problem Decomposition
📊 Example: "The Weeknd" Monthly Plays
Sept: 1,000 plays
Oct: 1,600 plays (+60% growth) ✅
Nov: 2,500 plays (+56% growth) ✅
This artist shows consistent 50%+ growth → They're a breakout artist!
The SQL Implementation
Step 1: Count Monthly Plays
📍 Uses rows 1-2
Why: Need raw numbers before we can calculate growth
What: Count plays for each artist per month
WITH monthly_plays AS ( -- Row 6: Build step-by-step with CTEs SELECT artist_id, DATE_TRUNC('month', play_date) as month, COUNT(*) as play_count -- Row 1: COUNT() aggregation FROM listens GROUP BY artist_id, month -- Row 2: GROUP BY each artist ) -- Result: artist_id | month | play_count -- 1234 | 2024-09 | 1000 -- 1234 | 2024-10 | 1600 -- 1234 | 2024-11 | 2500
Step 2: Calculate Growth Rate
📍 Uses row 10
Why: Need to know if artist is growing
What: Compare each month to previous using LAG()
, artist_growth AS (
SELECT
artist_id,
month,
play_count,
LAG(play_count, 1) OVER ( -- Row 10: Get previous month's value
PARTITION BY artist_id -- Separate LAG() per artist
ORDER BY month
) as prev_month_plays,
-- Calculate % growth: (current - previous) / previous * 100
ROUND(
(play_count - LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month))
* 100.0 / NULLIF(LAG(play_count, 1) OVER (PARTITION BY artist_id ORDER BY month), 0),
1
) as growth_pct
FROM monthly_plays
)
-- Result: artist_id | month | play_count | prev_month | growth_pct
-- 1234 | 2024-09 | 1000 | NULL | NULL
-- 1234 | 2024-10 | 1600 | 1000 | 60.0
-- 1234 | 2024-11 | 2500 | 1600 | 56.3
Step 3: Find Breakout Artists
📍 Uses row 3
Why: Filter to only high-growth artists
What: Apply >50% filter + recency filter
SELECT a.name as artist_name, g.month, g.play_count, g.growth_pct FROM artist_growth g JOIN artists a ON g.artist_id = a.artist_id WHERE g.growth_pct > 50 -- Row 3: Filter high growth AND g.month >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '3 months') -- Recent only ORDER BY g.growth_pct DESC, g.month DESC -- Final Result: -- artist_name | month | play_count | growth_pct -- The Weeknd | 2024-11 | 2500 | 56.3 -- The Weeknd | 2024-10 | 1600 | 60.0 -- Dua Lipa | 2024-11 | 3200 | 78.0 -- ...
💡 Key Takeaway: We decomposed "strong recent growth" into 3 clear steps:
1️⃣ Aggregate data (COUNT + GROUP BY) → 2️⃣ Calculate change (LAG) → 3️⃣ Filter results (WHERE)
Pro Tips: Think Like the Database
💡 Golden Rules
✅ Do This
- Start simple, add complexity gradually
- Test with small data first
- Use CTEs for readability
- Name things clearly (not t1, t2)
- Check for NULLs explicitly
❌ Avoid This
- Deeply nested subqueries (>3 levels)
- SELECT * in production
- Forgetting GROUP BY with aggregates
- Not handling division by zero
- Ignoring index opportunities
🎯 Remember: Complex queries are just simple queries combined. Master the patterns, and you'll solve any problem!