Window Functions: Calculations Without Collapsing

Concept. Window functions compute aggregates across a partition defined by OVER, returning one result per input row instead of collapsing them like GROUP BY. The OVER PARTITION BY syntax adds derived metrics while preserving row granularity.

Intuition. AVG(rating) OVER (PARTITION BY user_id) computes Mickey's average alongside every one of Mickey's listen rows without collapsing them. The Listens table comes back at 9 rows, plus one new column.

GROUP BY vs PARTITION BY: The Key Difference

GROUP BY collapses rows; PARTITION BY keeps every row Two neutral panels over the same nine Listens rows: three Mickey (blue) 4.5, 4.2, 3.9; three Minnie (orange) 4.7, 4.6, 3.9; three Daffy (purple) 2.9, 4.9, NULL. Left, GROUP BY: the nine rows collapse into three summary rows, one average per user (Mickey 4.2, Minnie 4.4, Daffy 3.9). Right, PARTITION BY: all nine rows are kept, each gaining a user_avg column with that user's average. Color key: blue is Mickey, orange is Minnie, purple is Daffy. GROUP BY collapses; a window keeps every row Same per-user average. GROUP BY returns three rows; PARTITION BY keeps all nine and adds a column. GROUP BY · collapses 9 listen rows 3 summary rows Mickey  4.5 Mickey  4.2 Mickey  3.9 Minnie  4.7 Minnie  4.6 Minnie  3.9 Daffy  2.9 Daffy  4.9 Daffy  NULL GROUP BY user_id Mickey  avg 4.2 Minnie  avg 4.4 Daffy  avg 3.9 9 rows → 3 rows PARTITION BY · keeps every row all 9 rows kept, each gains a user_avg column Mickey  4.5user_avg 4.2 Mickey  4.2user_avg 4.2 Mickey  3.9user_avg 4.2 Minnie  4.7user_avg 4.4 Minnie  4.6user_avg 4.4 Minnie  3.9user_avg 4.4 Daffy  2.9user_avg 3.9 Daffy  4.9user_avg 3.9 Daffy  NULLuser_avg 3.9 9 rows → 9 rows, plus a column Color key  Mickey  ·  Minnie  ·  Daffy identity bars

Figure 1. The same per-user average, two ways. GROUP BY collapses the nine listen rows into one summary row per user (Mickey 4.2, Minnie 4.4, Daffy 3.9), losing the detail. A window function, AVG(rating) OVER (PARTITION BY user_id), keeps all nine rows and attaches each user's average as a new column. Same math, but the window preserves row granularity.

GROUP BY: Collapses Rows

GROUP BY collapses detailed entries into aggregate summaries, to return the mathematical output.

-- One row per user; the per-row Listens detail is gone.
SELECT user_id, AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id;
user_idavg_ratingNotes
14.2AVG(4.5, 4.2, 3.9) = 12.6/3
24.4AVG(4.7, 4.6, 3.9) = 13.2/3
33.9AVG(2.9, 4.9, NULL) = 7.8/2 (NULL ignored)

Result: 9 rows → 3 rows

PARTITION BY: Keep All Rows

PARTITION BY isolates data ranges for aggregate mathematical calculation while strictly preserving the underlying query results.

-- All 9 listen rows preserved, with each user's avg attached as a new column.
SELECT user_id, song_id, rating,
       AVG(rating) OVER (PARTITION BY user_id) AS user_avg
FROM Listens
ORDER BY user_id, song_id;
user_idsong_idratinguser_avgNotes
114.54.2Partition 1: AVG(4.5, 4.2, 3.9)
124.24.2Partition 1: AVG(4.5, 4.2, 3.9)
163.94.2Partition 1: AVG(4.5, 4.2, 3.9)
224.74.4Partition 2: AVG(4.7, 4.6, 3.9)
274.64.4Partition 2: AVG(4.7, 4.6, 3.9)
283.94.4Partition 2: AVG(4.7, 4.6, 3.9)
312.93.9Partition 3: AVG(2.9, 4.9, NULL) = 3.9
324.93.9Partition 3: AVG(2.9, 4.9, NULL) = 3.9
36NULL3.9Partition 3: AVG ignores NULL

Result: 9 rows → 9 rows (all kept)


Ranking Functions

Window functions can rank rows without collapsing them. Start with one function, RANK. It adds a rank column, ordered by rating within each user, highest first:

-- Rank each user's listens by rating, highest first.
SELECT
  user_id,
  song_id,
  rating,
  RANK() OVER (PARTITION BY user_id ORDER BY rating DESC) AS RANK
FROM Listens;

Walk it in three steps:

Step 1: Partition by user_id

user_idsong_idratingNotes
114.5Partition 1
124.2Partition 1
163.9Partition 1
224.7Partition 2
274.6Partition 2
283.9Partition 2
312.9Partition 3
324.9Partition 3
36NULLPartition 3

Step 2: Order by rating DESC within each partition

user_idsong_idratingNotes
114.5Highest in partition 1
124.2Second in partition 1
163.9Third in partition 1
224.7Highest in partition 2
274.6Second in partition 2
283.9Third in partition 2
324.9Highest in partition 3
312.9Second in partition 3
36NULLNULL sorts last with DESC

Step 3: Assign the rank

user_idsong_idratingrankNotes
114.51Highest
124.22Second
163.93Third
224.71Highest
274.62Second
283.93Third
324.91Highest
312.92Second
36NULL3NULL sorts last, gets rank 3

That is ranking: partition, order, number. One function, one column.

Handling ties

RANK is one of three ranking functions. With no duplicate ratings they all produce the numbers above, so the choice only matters on a tie. The full query asks for all three at once:

SELECT user_id, song_id, rating,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY rating DESC) AS ROW_NUMBER,
  RANK()       OVER (PARTITION BY user_id ORDER BY rating DESC) AS RANK,
  DENSE_RANK() OVER (PARTITION BY user_id ORDER BY rating DESC) AS DENSE_RANK
FROM Listens;

To see them diverge, imagine Mickey had a 4th listen rated 4.5, a tie with his existing 4.5:

ROW_NUMBER, RANK, and DENSE_RANK differ only on a tie Mickey's listens sorted by rating descending, with a hypothetical tie of two 4.5 ratings. Four rows: 4.5, 4.5, 4.2, 3.9. ROW_NUMBER is 1, 2, 3, 4, always unique. RANK is 1, 1, 3, 4: the tie shares rank 1, then RANK skips to 3 leaving a gap. DENSE_RANK is 1, 1, 2, 3: the tie shares rank 1, then DENSE_RANK stays packed with no gap. The two tied rows are shaded. Color key: blue is Mickey, the shaded band is the tie. Three ways to rank, when there is a tie Mickey's listens by rating, with a hypothetical second 4.5. The functions agree until the tie. rating ROW_NUMBER RANK DENSE_RANK 4.5 1 1 1 4.5 2 1 1 tie 4.2 3 3 2 3.9 4 4 3 After the tie: RANK skips to 3 (no rank 2); DENSE_RANK stays packed at 2. Color key  Mickey identity bar  ·  shaded band = the two tied rows

Figure 2. The three ranking functions agree until a tie. ROW_NUMBER is always unique (1, 2, 3, 4). RANK lets the tie share a rank then skips, leaving a gap (1, 1, 3, 4). DENSE_RANK lets the tie share a rank but stays packed, no gap (1, 1, 2, 3). The shaded band is Mickey's hypothetical 4.5 tie.

Three answers to one question:

  • ROW_NUMBER: every row gets a unique number, ties broken arbitrarily. Use it for de-duplication and pagination.

  • RANK: ties share a rank, then it skips, leaving a gap. Use it for leaderboards where "joint 2nd, then 4th" is correct.

  • DENSE_RANK: ties share a rank with no gap. Use it when the next distinct value should be the next number.


Key Rules

  1. No Row Reduction: Unlike GROUP BY, all rows survive.

  2. PARTITION BY: Defines groups (optional - omit for whole table).

  3. ORDER BY: Sets sequence within partitions (required for some functions).

  4. NULL Handling: NULLs group together in PARTITION BY, sort first/last in ORDER BY.


Common Patterns

  • Ranking: RANK(), ROW_NUMBER(), DENSE_RANK()

  • Running Totals: SUM() OVER (ORDER BY...)

  • Moving Averages: AVG() OVER (ROWS BETWEEN...)

  • Lead/Lag: Compare to previous/next rows

  • Percentiles: NTILE(), PERCENT_RANK()

Common Mistakes

The Missing PARTITION BY Bug

Without PARTITION BY, you get a global ranking, which might not be what you intended.

Wrong ORDER BY Direction

ASC vs DESC changes the ranking meaning, so double-check your order.

Forgetting NULLs in ORDER BY

NULL sorting varies by database (PostgreSQL: NULLs first, MySQL: NULLs last), which can lead to unexpected results.