Window Functions: Calculations Without Collapsing
Core Concept
Window Functions are GROUP BY's more powerful sibling, designed for those who want to keep all their data intact while still performing complex calculations. Imagine ranking songs per user without losing any rows. Using RANK() OVER PARTITION BY user_id, you keep all nine rows and simply add ranks 1 through 3.
GROUP BY vs PARTITION BY: The Key Difference
GROUP BY: Collapses Rows
GROUP BY is the hammer that turns your detailed data into summarized chunks. It’s efficient, but it also means you lose granularity. Here’s what happens:
| user_id | avg_rating | Notes | |
|---|---|---|---|
| 1 | 4.2 | AVG(4.5, 4.2, 3.9) = 12.6/3 | |
| 2 | 4.4 | AVG(4.7, 4.6, 3.9) = 13.2/3 | |
| 3 | 3.9 | AVG(2.9, 4.9, NULL) = 7.8/2 (NULL ignored) |
Result: 9 rows → 3 rows
PARTITION BY: Keep All Rows
PARTITION BY is the scalpel that lets you slice your data into partitions while keeping every detail intact. You get the best of both worlds: detailed data with the power of aggregation.
| user_id | song_id | rating | user_avg | Notes | |
|---|---|---|---|---|---|
| 1 | 1 | 4.5 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
| 1 | 2 | 4.2 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
| 1 | 6 | 3.9 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
| 2 | 2 | 4.7 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
| 2 | 7 | 4.6 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
| 2 | 8 | 3.9 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
| 3 | 1 | 2.9 | 3.9 | Partition 3: AVG(2.9, 4.9, NULL) = 3.9 | |
| 3 | 2 | 4.9 | 3.9 | Partition 3: AVG(2.9, 4.9, NULL) = 3.9 | |
| 3 | 6 | NULL | 3.9 | Partition 3: AVG ignores NULL |
Result: 9 rows → 9 rows (all kept)
Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK
Window functions let you rank data without collapsing it. Here's how it works:
Step 1: Partition by user_id
| user_id | song_id | rating | Notes | |
|---|---|---|---|---|
| 1 | 1 | 4.5 | Partition 1 | |
| 1 | 2 | 4.2 | Partition 1 | |
| 1 | 6 | 3.9 | Partition 1 | |
| 2 | 2 | 4.7 | Partition 2 | |
| 2 | 7 | 4.6 | Partition 2 | |
| 2 | 8 | 3.9 | Partition 2 | |
| 3 | 1 | 2.9 | Partition 3 | |
| 3 | 2 | 4.9 | Partition 3 | |
| 3 | 6 | NULL | Partition 3 |
Step 2: Order by rating DESC within each partition
| user_id | song_id | rating | Notes | |
|---|---|---|---|---|
| 1 | 1 | 4.5 | Highest in partition 1 | |
| 1 | 2 | 4.2 | Second in partition 1 | |
| 1 | 6 | 3.9 | Third in partition 1 | |
| 2 | 2 | 4.7 | Highest in partition 2 | |
| 2 | 7 | 4.6 | Second in partition 2 | |
| 2 | 8 | 3.9 | Third in partition 2 | |
| 3 | 2 | 4.9 | Highest in partition 3 | |
| 3 | 1 | 2.9 | Second in partition 3 | |
| 3 | 6 | NULL | NULL sorts last with DESC |
Step 3: Apply ranking functions
| user_id | song_id | rating | row_num | rank | dense_rank | Notes | |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 4.5 | 1 | 1 | 1 | Highest | |
| 1 | 2 | 4.2 | 2 | 2 | 2 | Second | |
| 1 | 6 | 3.9 | 3 | 3 | 3 | Third | |
| 2 | 2 | 4.7 | 1 | 1 | 1 | Highest | |
| 2 | 7 | 4.6 | 2 | 2 | 2 | Second | |
| 2 | 8 | 3.9 | 3 | 3 | 3 | Third | |
| 3 | 2 | 4.9 | 1 | 1 | 1 | Highest | |
| 3 | 1 | 2.9 | 2 | 2 | 2 | Second | |
| 3 | 6 | NULL | 3 | 3 | 3 | NULL gets rank 3 |
Key Differences:
-
ROW_NUMBER: Always unique (1, 2, 3, 4)
-
RANK: Ties share, gaps after (1, 2, 2, 4)
-
DENSE_RANK: Ties share, no gaps (1, 2, 2, 3)
Key Rules
-
No Row Reduction: Unlike GROUP BY, all rows survive.
-
PARTITION BY: Defines groups (optional - omit for whole table).
-
ORDER BY: Sets sequence within partitions (required for some functions).
-
NULL Handling: NULLs group together in PARTITION BY, sort first/last in ORDER BY.
Common Patterns
-
Ranking: RANK(), ROW_NUMBER(), DENSE_RANK()
-
Running Totals: SUM() OVER (ORDER BY...)
-
Moving Averages: AVG() OVER (ROWS BETWEEN...)
-
Lead/Lag: Compare to previous/next rows
-
Percentiles: NTILE(), PERCENT_RANK()
Common Mistakes
The Missing PARTITION BY Bug
Without PARTITION BY, you get a global ranking, which might not be what you intended.
Wrong ORDER BY Direction
ASC vs DESC changes the ranking meaning, so double-check your order.
Forgetting NULLs in ORDER BY
NULL sorting varies by database (PostgreSQL: NULLs first, MySQL: NULLs last), which can lead to unexpected results.