CTEs: Common Table Expressions

Concept. A CTE (WITH name AS (...)) names an intermediate query result so the rest of the query can reference it like a table.

Intuition. Pre-compute each user's average rating in a CTE called UserAvgs, then JOIN it back to Listens to find rows above each user's own average. The CTE is a named intermediate table — no parenthesis nesting.

Example. Transform deeply nested subqueries into discrete, sequentially readable blocks using the WITH syntax.

Goal: Identify users who listen to Rock genre songs.

CTEs: Transform Nested Chaos into Modular Function-like Blocks Nested Subqueries Hard to Read, Debug, Maintain SELECT name FROM Users WHERE user_id IN (...) SELECT user_id FROM Listens WHERE song_id IN (...) SELECT song_id FROM Songs WHERE genre = 'Rock' Problems: • Read inside-out • Hard to debug steps ? ? ? CTE Approach Linear, Clear, Maintainable WITH RockSongs Step 1: Find Rock songs SELECT song_id FROM Songs WHERE genre = 'Rock' → {3,4,5,9} RockListeners Step 2: Find users who listened SELECT user_id FROM Listens WHERE song_id IN RockSongs → {1,2,3} Final Query Step 3: Get user names SELECT name FROM Users WHERE user_id IN RockListeners → {Mickey, Minnie, Daffy} Benefits: ✓ Each step has a name ✓ Easy to debug ✓ Self-documenting ✓ Reuse CTEs ✓ Test intermediate results Transform

Why CTEs?

  • Comparable to functions in Python/Java - Named, reusable computations isolated from the primary instruction.

  • Linear execution flow - Top-to-bottom parser logic.

  • Debug intermediate steps - test each CTE on its own.

  • Reusable - reference the same CTE multiple times.

  • Recursion support for hierarchical data (a unique feature of CTEs).

CTE vs Subquery Comparison

Subquery Approach (Often Hard to Read)

-- Find users who listen to Rock genre songs (read inside-out)
SELECT name FROM Users
WHERE user_id IN (
    -- Step 2 (middle): Find users who listened to Rock songs
    SELECT user_id FROM Listens
    WHERE song_id IN (
        -- Step 1 (innermost): Find all Rock songs first
        SELECT song_id FROM Songs
        WHERE genre = 'Rock'
    )
)
-- Step 3 (outer): Get names of those users

CTE Approach (Linear and Clear)

-- Find users who listen to Rock genre songs (step-by-step clarity)
WITH RockSongs AS (
    -- Step 1: Find all Rock genre songs
    SELECT song_id FROM Songs
    WHERE genre = 'Rock'
),
RockListeners AS (
    -- Step 2: Find users who listened to those songs
    SELECT DISTINCT user_id FROM Listens
    WHERE song_id IN (SELECT song_id FROM RockSongs)
)
-- Step 3: Get names of those users
SELECT name FROM Users
WHERE user_id IN (SELECT user_id FROM RockListeners)

Simple CTE: HighRatedSongs

-- Find songs with average rating above 4.0
┌─────────────────────────────────────────────────────┐
│ WITH HighRatedSongs AS (                            │
│     -- Calculate average rating per song            │SELECT song_id, AVG(rating) AS avg_rating       │
│     FROM Listens                                    │
│     GROUP BY song_id                                │
│     HAVING AVG(rating) > 4.0                        │
│ )                                                   │
└─────────────────────────────────────────────────────┘
-- Join with Songs table to get titles
SELECT s.title, h.avg_rating
FROM HighRatedSongs h                -- Use CTE
JOIN Songs s ON h.song_id = s.song_id

HighRatedSongs CTE

song_idavg_ratingNotes
24.6Song 2: (4.2 + 4.7 + 4.9) / 3 = 4.6 > 4.0
74.6Song 7: Single rating 4.6 > 4.0

Final Output

titleavg_ratingNotes
Willow4.6From HighRatedSongs CTE
Yellow Submarine4.6From HighRatedSongs CTE
  • CTE calculates averages first, then filters.

  • Only songs 2 and 7 pass the >4.0 threshold.

  • Main query joins CTE with Songs for titles.


Multiple CTEs: User Statistics

-- Calculate multiple statistics per user
┌─────────────────────────────────────────────────────┐
│ WITH UserListenCounts AS (                          │
│     -- Count total listens per user                 │SELECT user_id, COUNT(*) AS listen_count        │
│     FROM Listens                                    │
│     GROUP BY user_id                                │
│ ),                                                  │
│ UserHighRatings AS (                                │
│     -- Average of only high ratings (4+) per user   │SELECT user_id,                                 │
│            AVG(rating) AS high_avg,                 │
│            COUNT(*) AS high_count                   │
│     FROM Listens                                    │
│     WHERE rating >= 4.0  -- Only consider 4+ stars │GROUP BY user_id                                │
│ )                                                   │
└─────────────────────────────────────────────────────┘
-- Combine both CTEs with Users table
SELECT u.name, 
       lc.listen_count,
       hr.high_avg,
       hr.high_count
FROM Users u
LEFT JOIN UserListenCounts lc ON u.user_id = lc.user_id
LEFT JOIN UserHighRatings hr ON u.user_id = hr.user_id

Note: These could be combined using CASE statements (e.g., COUNT(CASE WHEN rating >= 4.0 THEN 1 END)), but separating them is clearer: each CTE has one purpose, filtering logic is obvious, and it demonstrates how multiple CTEs work together.

UserListenCounts CTE

user_idlisten_countNotes
13Mickey has 3 listens
23Minnie has 3 listens
33Daffy has 3 listens

UserHighRatings CTE

user_idhigh_avghigh_countNotes
14.352Mickey: (4.5 + 4.2) / 2
24.652Minnie: (4.7 + 4.6) / 2
34.91Daffy: Only 4.9 >= 4.0

Final Output

namelisten_counthigh_avghigh_count
Mickey34.352
Minnie34.652
Daffy34.91
PlutoNULLNULLNULL
  • Two CTEs with different purposes: all listens vs high ratings only.

  • UserHighRatings filters to ratings >= 4.0 before aggregating.

  • Main query combines both CTEs with LEFT JOINs.

  • Pluto gets NULL values (no listens to count).


Common Patterns

  • CTEs are computed once and reused (unlike subqueries).

  • Join conditions on CTE results should be indexed in source tables.

  • Complex CTEs may be materialized for better performance.