HAVING: Filtering Group Aggregates

Core Concept

HAVING is your tool for evaluating group aggregates. It's the big picture filter, focusing on counts, averages, and sums, unlike WHERE, which zeroes in on individual rows.

HAVING: The Group Gatekeeper Stage 1: GROUP BY Form Groups with Aggregates Mickey (user_id=1) COUNT(*) = 3 AVG(rating) = 4.2 Minnie (user_id=2) COUNT(*) = 3 AVG(rating) = 4.4 Daffy (user_id=3) COUNT(*) = 3 AVG(rating) = 3.9 Stage 2: HAVING Filter HAVING AVG(rating) > 4.0 4.2 > 4.0 ✓ Mickey passes 4.4 > 4.0 ✓ Minnie passes 3.9 > 4.0 ✗ Daffy blocked Stage 3: Output Groups That Passed Mickey COUNT = 3 AVG = 4.2 Minnie COUNT = 3 AVG = 4.4 Daffy Filtered out Key Differences: WHERE: Filters rows (before grouping) HAVING: Filters groups (after grouping) 3 groups enter → HAVING checks each → 2 groups survive

Simple HAVING

-- Find users whose average rating exceeds 4.0
SELECT                         -- Last: Select columns to output
    user_id,
    COUNT(*) AS listen_count,
    AVG(rating) AS avg_rating
FROM Listens                   -- First: Get rows
GROUP BY user_id               -- Next: Create groups
HAVING AVG(rating) > 4.0       -- Then: Filter groups by aggregate condition
ORDER BY user_id               -- Last: Sort surviving groups

Step 1: GROUP BY

user_idCOUNT(*)AVG(rating)Notes
13(4.5+4.2+3.9)/3User 1's
23(4.7+4.6+3.9)/3User 2's
33(2.9+4.9+NULL)/2User 3's

Step 2: HAVING AVG() > 4

user_idAVG(rating)Notes: HAVING Check
14.24.2 > 4.0 ✓ ; Keep
24.44.4 > 4.0 ✓ ; Keep
33.93.9 > 4.0 ✗ ; Remove

Output

user_idlisten_countavg_rating
134.2
234.4

HAVING with COUNT

-- Find songs played at least twice
SELECT 
    song_id,
    COUNT(*) AS play_count,
    AVG(rating) AS avg_rating
FROM Listens
GROUP BY song_id               -- Group by each song
HAVING COUNT(*) >= 2           -- Keep only groups with 2+ rows
ORDER BY song_id               -- Sort output

Step 1: Form Groups by song_id

song_idCOUNT(*)AVG(rating)Notes:
listen_ids
12(4.5+2.9)/21,7
23(4.2+ 4.7+ 4.9)/ 32,4,8
623.93,9
714.65
813.96

Step 2: HAVING COUNT(*) >= 2

song_idCOUNT(*)Notes: HAVING Check
122 >= 2 ✓ ; Keep
233 >= 2 ✓ ; Keep
622 >= 2 ✓ ; Keep
711 >= 2 ✗ ; Remove
811 >= 2 ✗ ; Remove

Output

song_id play_count avg_rating
1 2 3.7
2 3 4.6
6 2 3.9

Key Points:


HAVING with Multiple Conditions

-- Find active users (3+ listens) with high average ratings (>4.0)
SELECT 
    user_id,
    COUNT(*) AS listen_count,
    AVG(rating) AS avg_rating,
    MIN(rating) AS min_rating
FROM Listens
GROUP BY user_id
HAVING COUNT(*) >= 3           -- Multiple HAVING conditions with AND
   AND AVG(rating) > 4.0       -- Both must be true for group to survive
ORDER BY user_id

Step 1: Calculate Group Aggregates

user_idCOUNT(*)AVG(rating)MIN(rating)Annotations
134.23.9All aggregates calculated
234.43.9All aggregates calculated
333.92.9NULL excluded from AVG/MIN

Step 2: Evaluate HAVING Conditions

user_idCondition 1Condition 2CombinedResultAnnotations
1COUNT(*)=3 >= 3 ✓AVG=4.2 > 4.0 ✓✓ AND ✓KeepBoth conditions met
2COUNT(*)=3 >= 3 ✓AVG=4.4 > 4.0 ✓✓ AND ✓KeepBoth conditions met
3COUNT(*)=3 >= 3 ✓AVG=3.9 > 4.0 ✗✓ AND ✗RemoveAverage too low

Output

user_idlisten_countavg_ratingmin_rating
134.23.9
234.43.9

Key Points:


WHERE and HAVING

-- Find users with 2+ ratings above 3.5

SELECT 
    user_id,
    COUNT(*) AS high_rating_count,
    AVG(rating) AS avg_high_rating
FROM Listens
WHERE rating > 3.5             -- 1st: WHERE filters individual rows
GROUP BY user_id               -- 2nd: GROUP BY forms groups from survivors
HAVING COUNT(*) >= 2           -- 3rd: HAVING filters entire groups
ORDER BY user_id               -- 4th: ORDER BY sorts final results

Step 1: WHERE Filters Rows

listen_iduser_idratingWHERE CheckAction
114.54.5 > 3.5 ✓Keep
214.24.2 > 3.5 ✓Keep
313.93.9 > 3.5 ✓Keep
424.74.7 > 3.5 ✓Keep
524.64.6 > 3.5 ✓Keep
623.93.9 > 3.5 ✓Keep
732.92.9 > 3.5 ✗Remove
834.94.9 > 3.5 ✓Keep
93NULLNULL > 3.5 ?Remove

Step 2: GROUP BY on Filtered Data

user_idFiltered RowsCOUNT(*)AVG(rating)
11,2,334.2
24,5,634.4
3814.9

Step 3: HAVING Filters Groups

user_idCOUNT(*)HAVING CheckResultAnnotations
133 >= 2 ✓KeepEnough high ratings
233 >= 2 ✓KeepEnough high ratings
311 >= 2 ✗RemoveToo few high ratings

Output

user_idhigh_rating_countavg_high_rating
134.2
234.4

Key Points:


Key Rules

  1. Execution Order: HAVING runs after GROUP BY.

  2. Aggregate Conditions: Can use aggregate functions like AVG(), COUNT()

  3. Group-Level Filter: Operates on entire groups, not individual rows

  4. Requires GROUP BY: HAVING only makes sense with grouped data

  5. Different from WHERE: WHERE filters rows, HAVING filters groups

SQL Execution Pipeline

Input:  FROM
Logic:  WHEREGROUP BYHAVING
        (no aggregates) → (form groups) → (aggregates exist!)
Output: SELECT/ORDER BY

Common Patterns

Common Mistakes

HAVING on rows

-- WRONG: HAVING does not work on individual rows
SELECT * FROM Listens
HAVING rating > 4.0

-- CORRECT: Use WHERE to filter individual rows
SELECT * FROM Listens
WHERE rating > 4.0

Problem: HAVING only works with grouped data

The HAVING Alias Problem

-- WRONG: Alias not yet created when HAVING runs
SELECT user_id, AVG(rating) AS avg_r
FROM Listens
GROUP BY user_id
HAVING avg_r > 4.0

-- CORRECT: Repeat the aggregate expression in HAVING
SELECT user_id, AVG(rating) AS avg_r
FROM Listens
GROUP BY user_id
HAVING AVG(rating) > 4.0

Problem: HAVING executes before SELECT creates aliases

Wrong Filter, Wrong Place

-- WRONG: HAVING can't access individual row values
SELECT user_id, COUNT(*)
FROM Listens
GROUP BY user_id
HAVING rating > 4.0            -- Error: Which rating? Group has many!

-- CORRECT: WHERE for row conditions, HAVING for group conditions
SELECT user_id, COUNT(*)
FROM Listens
WHERE rating > 4.0             -- Filter rows BEFORE grouping
GROUP BY user_id
HAVING COUNT(*) > 2            -- Then filter groups by aggregate

Problem: HAVING should use aggregate functions or GROUP BY columns

NULL Aggregate Confusion

-- Surprising: AVG ignores NULL ratings
SELECT user_id, AVG(rating)
FROM Listens
GROUP BY user_id
HAVING AVG(rating) > 4.0       -- AVG skips NULLs (may mislead!)

-- Explicit: Show total vs rated counts for clarity
SELECT user_id, 
       COUNT(*) AS total,       -- Includes NULL rows
       COUNT(rating) AS rated,  -- Excludes NULL ratings
       AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id
HAVING COUNT(rating) > 0       -- Ensure at least 1 non-NULL rating
   AND AVG(rating) > 4.0       -- Then check average

Problem: AVG ignores NULL, which affects HAVING conditions