Window Functions: Calculations Without Collapsing

Core Concept

Window Functions are GROUP BY's more powerful sibling.

GROUP BY user_id Collapses 9 rows → 3 groups 9 Listen Records Mickey, song_id=1 Mickey, song_id=2 Mickey, song_id=6 Minnie, song_id=2 Minnie, song_id=7 Minnie, song_id=8 Daffy, song_id=1 Daffy, song_id=2 Daffy, song_id=6 3 Summary Groups Mickey (user_id=1) COUNT(*) = 3, AVG = 4.2 Minnie (user_id=2) COUNT(*) = 3, AVG = 4.4 Daffy (user_id=3) COUNT(*) = 3, AVG = 3.9 SELECT user_id, COUNT(*) as song_count, AVG(rating) as avg_rating FROM Listens GROUP BY user_id
PARTITION BY user_id Keeps all 9 rows + adds rank 9 Listen Records Mickey, song_id=1 Mickey, song_id=2 Mickey, song_id=6 Minnie, song_id=2 Minnie, song_id=7 Minnie, song_id=8 Daffy, song_id=1 Daffy, song_id=2 Daffy, song_id=6 9 Rows + Rank Column Mickey, song_id=1 1 Mickey, song_id=2 2 Mickey, song_id=6 3 Minnie, song_id=2 1 Minnie, song_id=7 2 Minnie, song_id=8 3 Daffy, song_id=1 1 Daffy, song_id=2 2 Daffy, song_id=6 3 RANK SELECT *, RANK() OVER ( PARTITION BY user_id ORDER BY rating DESC ) as rank FROM Listens

Key Rules

  1. No Row Reduction: Unlike GROUP BY, all rows survive

  2. PARTITION BY: Defines groups (optional - omit for whole table)

  3. ORDER BY: Sets sequence within partitions (required for some functions)

  4. NULL Handling: NULLs group together in PARTITION BY, sort first/last in ORDER BY


GROUP BY vs PARTITION BY: The Key Difference

GROUP BY: Collapses Rows

-- Calculate average rating per user (9 rows → 3 rows)
SELECT user_id, AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id  -- One row per group
user_idavg_ratingNotes
14.2AVG(4.5, 4.2, 3.9) = 12.6/3
24.4AVG(4.7, 4.6, 3.9) = 13.2/3
33.9AVG(2.9, 4.9, NULL) = 7.8/2 (NULL ignored)

Result: 9 rows → 3 rows

PARTITION BY: Keep All Rows

-- Add average rating per user to every row (9 rows → 9 rows)
SELECT user_id, song_id, rating,
       AVG(rating) OVER (     -- OVER makes it a window function
        PARTITION BY user_id  -- Calculations and rankings within user_id partitions
       ) AS user_avg          -- All rows kept
FROM Listens
user_idsong_idratinguser_avgNotes
114.54.2Partition 1: AVG(4.5, 4.2, 3.9)
124.24.2Partition 1: AVG(4.5, 4.2, 3.9)
163.94.2Partition 1: AVG(4.5, 4.2, 3.9)
224.74.4Partition 2: AVG(4.7, 4.6, 3.9)
274.64.4Partition 2: AVG(4.7, 4.6, 3.9)
283.94.4Partition 2: AVG(4.7, 4.6, 3.9)
312.93.9Partition 3: AVG(2.9, 4.9, NULL) = 3.9
324.93.9Partition 3: AVG(2.9, 4.9, NULL) = 3.9
36NULL3.9Partition 3: AVG ignores NULL

Result: 9 rows → 9 rows (all kept)


Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK

-- Rank each user's songs by rating (three different ranking methods)
-- Note: In some DBs, NULLs sort last with ORDER BY rating DESC
--       Be explicit like below, if your app wants to define ordering
SELECT user_id, song_id, rating,
       ROW_NUMBER() OVER (
           PARTITION BY user_id    -- Separate ranking per user
           ORDER BY rating DESC    -- Highest rating gets rank 1
                    NULLS LAST     -- Force NULLs to appear last in ranking
       ) AS row_num,               -- Always unique: 1,2,3
       RANK() OVER (
           PARTITION BY user_id    -- Same partitioning as above
           ORDER BY rating DESC    -- Same ordering
                                   -- Recommend adding NULLS LAST
       ) AS RANK,                  -- Allows ties: 1,2,2,4
       DENSE_RANK() OVER (
           PARTITION BY user_id    -- Same partitioning as above
           ORDER BY rating DESC    -- Same ordering
                                   -- Recommend adding NULLS LAST
       ) AS DENSE_RANK             -- Ties without gaps: 1,2,2,3
FROM Listens

Step 1: Partition by user_id

user_idsong_idratingNotes
114.5Partition 1
124.2Partition 1
163.9Partition 1
224.7Partition 2
274.6Partition 2
283.9Partition 2
312.9Partition 3
324.9Partition 3
36NULLPartition 3

Step 2: Order by rating DESC within each partition

user_idsong_idratingNotes
114.5Highest in partition 1
124.2Second in partition 1
163.9Third in partition 1
224.7Highest in partition 2
274.6Second in partition 2
283.9Third in partition 2
324.9Highest in partition 3
312.9Second in partition 3
36NULLNULL sorts last with DESC

Step 3: Apply ranking functions

user_idsong_idratingrow_numrankdense_rankNotes
114.5111Highest
124.2222Second
163.9333Third
224.7111Highest
274.6222Second
283.9333Third
324.9111Highest
312.9222Second
36NULL333NULL gets rank 3

Key Differences:


Common Patterns

Common Mistakes

The Missing PARTITION BY Bug

-- WRONG: Creates global ranking across all users
SELECT user_id, rating,
       RANK() OVER (
           ORDER BY rating DESC  -- No PARTITION BY = one big group!
       ) AS RANK
FROM Listens

-- CORRECT: Rank songs within each user's listens
SELECT user_id, rating,
       RANK() OVER (
           PARTITION BY user_id  -- Separate ranking per user
           ORDER BY rating DESC
       ) AS RANK
FROM Listens

Problem: Without PARTITION BY, you get global ranking

Wrong ORDER BY Direction

-- WRONG: ASC gives rank 1 to lowest rating (2.9 would be #1)
RANK() OVER (ORDER BY rating ASC)

-- CORRECT: DESC gives rank 1 to highest rating (4.9 would be #1)
RANK() OVER (ORDER BY rating DESC)

Problem: ASC vs DESC changes ranking meaning

Forgetting NULLs in ORDER BY

-- PROBLEM: NULL position varies by database system
RANK() OVER (ORDER BY rating DESC)  -- Where do NULLs go?

-- EXPLICIT: Force NULLs to appear last in ranking
RANK() OVER (ORDER BY rating DESC NULLS LAST)  -- NULLs get worst rank

Problem: NULL sorting varies by database (PostgreSQL: NULLs first, MySQL: NULLs last)