Window Functions: Calculations Without Collapsing
Core Concept
- Example: Rank songs per user without losing any rows (RANK() OVER PARTITION BY user_id keeps all 9 rows, adds rank 1-3)
Window Functions are GROUP BY's more powerful sibling.
Key Rules
-
No Row Reduction: Unlike GROUP BY, all rows survive
-
PARTITION BY: Defines groups (optional - omit for whole table)
-
ORDER BY: Sets sequence within partitions (required for some functions)
-
NULL Handling: NULLs group together in PARTITION BY, sort first/last in ORDER BY
GROUP BY vs PARTITION BY: The Key Difference
GROUP BY: Collapses Rows
-- Calculate average rating per user (9 rows → 3 rows)
SELECT user_id, AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id -- One row per group
user_id | avg_rating | Notes | |
---|---|---|---|
1 | 4.2 | AVG(4.5, 4.2, 3.9) = 12.6/3 | |
2 | 4.4 | AVG(4.7, 4.6, 3.9) = 13.2/3 | |
3 | 3.9 | AVG(2.9, 4.9, NULL) = 7.8/2 (NULL ignored) |
Result: 9 rows → 3 rows
PARTITION BY: Keep All Rows
-- Add average rating per user to every row (9 rows → 9 rows)
SELECT user_id, song_id, rating,
AVG(rating) OVER ( -- OVER makes it a window function
PARTITION BY user_id -- Calculations and rankings within user_id partitions
) AS user_avg -- All rows kept
FROM Listens
user_id | song_id | rating | user_avg | Notes | |
---|---|---|---|---|---|
1 | 1 | 4.5 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
1 | 2 | 4.2 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
1 | 6 | 3.9 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
2 | 2 | 4.7 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
2 | 7 | 4.6 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
2 | 8 | 3.9 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
3 | 1 | 2.9 | 3.9 | Partition 3: AVG(2.9, 4.9, NULL) = 3.9 | |
3 | 2 | 4.9 | 3.9 | Partition 3: AVG(2.9, 4.9, NULL) = 3.9 | |
3 | 6 | NULL | 3.9 | Partition 3: AVG ignores NULL |
Result: 9 rows → 9 rows (all kept)
Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK
-- Rank each user's songs by rating (three different ranking methods)
-- Note: In some DBs, NULLs sort last with ORDER BY rating DESC
-- Be explicit like below, if your app wants to define ordering
SELECT user_id, song_id, rating,
ROW_NUMBER() OVER (
PARTITION BY user_id -- Separate ranking per user
ORDER BY rating DESC -- Highest rating gets rank 1
NULLS LAST -- Force NULLs to appear last in ranking
) AS row_num, -- Always unique: 1,2,3
RANK() OVER (
PARTITION BY user_id -- Same partitioning as above
ORDER BY rating DESC -- Same ordering
-- Recommend adding NULLS LAST
) AS RANK, -- Allows ties: 1,2,2,4
DENSE_RANK() OVER (
PARTITION BY user_id -- Same partitioning as above
ORDER BY rating DESC -- Same ordering
-- Recommend adding NULLS LAST
) AS DENSE_RANK -- Ties without gaps: 1,2,2,3
FROM Listens
Step 1: Partition by user_id
user_id | song_id | rating | Notes | |
---|---|---|---|---|
1 | 1 | 4.5 | Partition 1 | |
1 | 2 | 4.2 | Partition 1 | |
1 | 6 | 3.9 | Partition 1 | |
2 | 2 | 4.7 | Partition 2 | |
2 | 7 | 4.6 | Partition 2 | |
2 | 8 | 3.9 | Partition 2 | |
3 | 1 | 2.9 | Partition 3 | |
3 | 2 | 4.9 | Partition 3 | |
3 | 6 | NULL | Partition 3 |
Step 2: Order by rating DESC within each partition
user_id | song_id | rating | Notes | |
---|---|---|---|---|
1 | 1 | 4.5 | Highest in partition 1 | |
1 | 2 | 4.2 | Second in partition 1 | |
1 | 6 | 3.9 | Third in partition 1 | |
2 | 2 | 4.7 | Highest in partition 2 | |
2 | 7 | 4.6 | Second in partition 2 | |
2 | 8 | 3.9 | Third in partition 2 | |
3 | 2 | 4.9 | Highest in partition 3 | |
3 | 1 | 2.9 | Second in partition 3 | |
3 | 6 | NULL | NULL sorts last with DESC |
Step 3: Apply ranking functions
user_id | song_id | rating | row_num | rank | dense_rank | Notes | |
---|---|---|---|---|---|---|---|
1 | 1 | 4.5 | 1 | 1 | 1 | Highest | |
1 | 2 | 4.2 | 2 | 2 | 2 | Second | |
1 | 6 | 3.9 | 3 | 3 | 3 | Third | |
2 | 2 | 4.7 | 1 | 1 | 1 | Highest | |
2 | 7 | 4.6 | 2 | 2 | 2 | Second | |
2 | 8 | 3.9 | 3 | 3 | 3 | Third | |
3 | 2 | 4.9 | 1 | 1 | 1 | Highest | |
3 | 1 | 2.9 | 2 | 2 | 2 | Second | |
3 | 6 | NULL | 3 | 3 | 3 | NULL gets rank 3 |
Key Differences:
-
ROW_NUMBER: Always unique (1, 2, 3, 4)
-
RANK: Ties share, gaps after (1, 2, 2, 4)
-
DENSE_RANK: Ties share, no gaps (1, 2, 2, 3)
Common Patterns
-
Ranking: RANK(), ROW_NUMBER(), DENSE_RANK()
-
Running Totals: SUM() OVER (ORDER BY...)
-
Moving Averages: AVG() OVER (ROWS BETWEEN...)
-
Lead/Lag: Compare to previous/next rows
-
Percentiles: NTILE(), PERCENT_RANK()
Common Mistakes
The Missing PARTITION BY Bug
-- WRONG: Creates global ranking across all users
SELECT user_id, rating,
RANK() OVER (
ORDER BY rating DESC -- No PARTITION BY = one big group!
) AS RANK
FROM Listens
-- CORRECT: Rank songs within each user's listens
SELECT user_id, rating,
RANK() OVER (
PARTITION BY user_id -- Separate ranking per user
ORDER BY rating DESC
) AS RANK
FROM Listens
Problem: Without PARTITION BY, you get global ranking
Wrong ORDER BY Direction
-- WRONG: ASC gives rank 1 to lowest rating (2.9 would be #1)
RANK() OVER (ORDER BY rating ASC)
-- CORRECT: DESC gives rank 1 to highest rating (4.9 would be #1)
RANK() OVER (ORDER BY rating DESC)
Problem: ASC vs DESC changes ranking meaning
Forgetting NULLs in ORDER BY
-- PROBLEM: NULL position varies by database system
RANK() OVER (ORDER BY rating DESC) -- Where do NULLs go?
-- EXPLICIT: Force NULLs to appear last in ranking
RANK() OVER (ORDER BY rating DESC NULLS LAST) -- NULLs get worst rank
Problem: NULL sorting varies by database (PostgreSQL: NULLs first, MySQL: NULLs last)