CTEs: Common Table Expressions

Core Concept

Goal: Find all users who listen to Rock genre songs.

CTEs: Transform Nested Chaos into Modular Function-like Blocks Nested Subqueries Hard to Read, Debug, Maintain SELECT name FROM Users WHERE user_id IN (...) SELECT user_id FROM Listens WHERE song_id IN (...) SELECT song_id FROM Songs WHERE genre = 'Rock' Problems: • Read inside-out • Hard to debug steps ? ? ? CTE Approach Linear, Clear, Maintainable WITH RockSongs Step 1: Find Rock songs SELECT song_id FROM Songs WHERE genre = 'Rock' → {3,4,5,9} RockListeners Step 2: Find users who listened SELECT user_id FROM Listens WHERE song_id IN RockSongs → {1,2,3} Final Query Step 3: Get user names SELECT name FROM Users WHERE user_id IN RockListeners → {Mickey, Minnie, Daffy} Benefits: ✓ Each step has a name ✓ Easy to debug ✓ Self-documenting ✓ Reuse CTEs ✓ Test intermediate results Transform

Why CTEs?

CTE vs Subquery Comparison

Subquery Approach (Often Hard to Read)

-- Find users who listen to Rock genre songs (read inside-out)
SELECT name FROM Users
WHERE user_id IN (
    -- Step 2 (middle): Find users who listened to Rock songs
    SELECT user_id FROM Listens
    WHERE song_id IN (
        -- Step 1 (innermost): Find all Rock songs first
        SELECT song_id FROM Songs
        WHERE genre = 'Rock'
    )
)
-- Step 3 (outer): Get names of those users

CTE Approach (Linear and Clear)

-- Find users who listen to Rock genre songs (step-by-step clarity)
WITH RockSongs AS (
    -- Step 1: Find all Rock genre songs
    SELECT song_id FROM Songs
    WHERE genre = 'Rock'
),
RockListeners AS (
    -- Step 2: Find users who listened to those songs
    SELECT DISTINCT user_id FROM Listens
    WHERE song_id IN (SELECT song_id FROM RockSongs)
)
-- Step 3: Get names of those users
SELECT name FROM Users
WHERE user_id IN (SELECT user_id FROM RockListeners)

Simple CTE: HighRatedSongs

-- Find songs with average rating above 4.0
┌─────────────────────────────────────────────────────┐
│ WITH HighRatedSongs AS (                            │
│     -- Calculate average rating per song            │SELECT song_id, AVG(rating) AS avg_rating       │
│     FROM Listens                                    │
│     GROUP BY song_id                                │
│     HAVING AVG(rating) > 4.0                        │
│ )                                                   │
└─────────────────────────────────────────────────────┘
-- Join with Songs table to get titles
SELECT s.title, h.avg_rating
FROM HighRatedSongs h                -- Use CTE
JOIN Songs s ON h.song_id = s.song_id

HighRatedSongs CTE

song_idavg_ratingNotes
24.6Song 2: (4.2 + 4.7 + 4.9) / 3 = 4.6 > 4.0
74.6Song 7: Single rating 4.6 > 4.0

Final Output

titleavg_ratingNotes
Willow4.6From HighRatedSongs CTE
Yellow Submarine4.6From HighRatedSongs CTE

Multiple CTEs: User Statistics

-- Calculate multiple statistics per user
┌─────────────────────────────────────────────────────┐
│ WITH UserListenCounts AS (                          │
│     -- Count total listens per user                 │SELECT user_id, COUNT(*) AS listen_count        │
│     FROM Listens                                    │
│     GROUP BY user_id                                │
│ ),                                                  │
│ UserHighRatings AS (                                │
│     -- Average of only high ratings (4+) per user   │SELECT user_id,                                 │
│            AVG(rating) AS high_avg,                 │
│            COUNT(*) AS high_count                   │
│     FROM Listens                                    │
│     WHERE rating >= 4.0  -- Only consider 4+ stars │GROUP BY user_id                                │
│ )                                                   │
└─────────────────────────────────────────────────────┘
-- Combine both CTEs with Users table
SELECT u.name, 
       lc.listen_count,
       hr.high_avg,
       hr.high_count
FROM Users u
LEFT JOIN UserListenCounts lc ON u.user_id = lc.user_id
LEFT JOIN UserHighRatings hr ON u.user_id = hr.user_id

Note: These could be combined using CASE statements (e.g., COUNT(CASE WHEN rating >= 4.0 THEN 1 END)), but separating them is clearer: each CTE has one purpose, filtering logic is obvious, and it demonstrates how multiple CTEs work together.

UserListenCounts CTE

user_idlisten_countNotes
13Mickey has 3 listens
23Minnie has 3 listens
33Daffy has 3 listens

UserHighRatings CTE

user_idhigh_avghigh_countNotes
14.352Mickey: (4.5 + 4.2) / 2
24.652Minnie: (4.7 + 4.6) / 2
34.91Daffy: Only 4.9 >= 4.0

Final Output

namelisten_counthigh_avghigh_count
Mickey34.352
Minnie34.652
Daffy34.91
PlutoNULLNULLNULL

Common Patterns