CTEs: Common Table Expressions
Core Concept
- Example: Break complex queries into named steps (WITH HighRatedSongs AS ... transforms nested chaos into readable flow)
Goal: Find all users who listen to Rock genre songs.
Why CTEs?
-
Like functions in python/java - named, reusable computations within a query
-
Linear readability - read top-to-bottom instead of inside-out nesting
-
Debug intermediate steps - test each CTE independently
-
Reusable - reference the same CTE multiple times
-
Recursion support for hierarchical data (unique to CTEs)
CTE vs Subquery Comparison
Subquery Approach (Often Hard to Read)
-- Find users who listen to Rock genre songs (read inside-out)
SELECT name FROM Users
WHERE user_id IN (
-- Step 2 (middle): Find users who listened to Rock songs
SELECT user_id FROM Listens
WHERE song_id IN (
-- Step 1 (innermost): Find all Rock songs first
SELECT song_id FROM Songs
WHERE genre = 'Rock'
)
)
-- Step 3 (outer): Get names of those users
CTE Approach (Linear and Clear)
-- Find users who listen to Rock genre songs (step-by-step clarity)
WITH RockSongs AS (
-- Step 1: Find all Rock genre songs
SELECT song_id FROM Songs
WHERE genre = 'Rock'
),
RockListeners AS (
-- Step 2: Find users who listened to those songs
SELECT DISTINCT user_id FROM Listens
WHERE song_id IN (SELECT song_id FROM RockSongs)
)
-- Step 3: Get names of those users
SELECT name FROM Users
WHERE user_id IN (SELECT user_id FROM RockListeners)
Simple CTE: HighRatedSongs
-- Find songs with average rating above 4.0
┌─────────────────────────────────────────────────────┐
│ WITH HighRatedSongs AS ( │
│ -- Calculate average rating per song │
│ SELECT song_id, AVG(rating) AS avg_rating │
│ FROM Listens │
│ GROUP BY song_id │
│ HAVING AVG(rating) > 4.0 │
│ ) │
└─────────────────────────────────────────────────────┘
-- Join with Songs table to get titles
SELECT s.title, h.avg_rating
FROM HighRatedSongs h -- Use CTE
JOIN Songs s ON h.song_id = s.song_id
HighRatedSongs CTE
song_id | avg_rating | Notes |
---|---|---|
2 | 4.6 | Song 2: (4.2 + 4.7 + 4.9) / 3 = 4.6 > 4.0 |
7 | 4.6 | Song 7: Single rating 4.6 > 4.0 |
Final Output
title | avg_rating | Notes |
---|---|---|
Willow | 4.6 | From HighRatedSongs CTE |
Yellow Submarine | 4.6 | From HighRatedSongs CTE |
-
CTE calculates averages first, then filters
-
Only songs 2 and 7 pass the >4.0 threshold
-
Main query joins CTE with Songs for titles
Multiple CTEs: User Statistics
-- Calculate multiple statistics per user
┌─────────────────────────────────────────────────────┐
│ WITH UserListenCounts AS ( │
│ -- Count total listens per user │
│ SELECT user_id, COUNT(*) AS listen_count │
│ FROM Listens │
│ GROUP BY user_id │
│ ), │
│ UserHighRatings AS ( │
│ -- Average of only high ratings (4+) per user │
│ SELECT user_id, │
│ AVG(rating) AS high_avg, │
│ COUNT(*) AS high_count │
│ FROM Listens │
│ WHERE rating >= 4.0 -- Only consider 4+ stars │
│ GROUP BY user_id │
│ ) │
└─────────────────────────────────────────────────────┘
-- Combine both CTEs with Users table
SELECT u.name,
lc.listen_count,
hr.high_avg,
hr.high_count
FROM Users u
LEFT JOIN UserListenCounts lc ON u.user_id = lc.user_id
LEFT JOIN UserHighRatings hr ON u.user_id = hr.user_id
Note: These could be combined using CASE statements (e.g., COUNT(CASE WHEN rating >= 4.0 THEN 1 END)
), but separating them is clearer: each CTE has one purpose, filtering logic is obvious, and it demonstrates how multiple CTEs work together.
UserListenCounts CTE
user_id | listen_count | Notes | |
---|---|---|---|
1 | 3 | Mickey has 3 listens | |
2 | 3 | Minnie has 3 listens | |
3 | 3 | Daffy has 3 listens |
UserHighRatings CTE
user_id | high_avg | high_count | Notes | |
---|---|---|---|---|
1 | 4.35 | 2 | Mickey: (4.5 + 4.2) / 2 | |
2 | 4.65 | 2 | Minnie: (4.7 + 4.6) / 2 | |
3 | 4.9 | 1 | Daffy: Only 4.9 >= 4.0 |
Final Output
name | listen_count | high_avg | high_count | |
---|---|---|---|---|
Mickey | 3 | 4.35 | 2 | |
Minnie | 3 | 4.65 | 2 | |
Daffy | 3 | 4.9 | 1 | |
Pluto | NULL | NULL | NULL |
-
Two CTEs with different purposes: all listens vs high ratings only
-
UserHighRatings filters to ratings >= 4.0 before aggregating
-
Main query combines both CTEs with LEFT JOINs
-
Pluto gets NULL values (no listens to count)
Common Patterns
-
CTEs are computed once and reused (unlike subqueries)
-
Join conditions on CTE results should be indexed in source tables
-
Complex CTEs may be materialized for better performance