HAVING: Filtering Group Aggregates

Core Concept

HAVING tests groups against aggregates. Unlike WHERE (which focuses on rows), it sees the big picture - counts, averages, sums.

HAVING: The Group Gatekeeper Stage 1: GROUP BY Form Groups with Aggregates Mickey (user_id=1) COUNT(*) = 3 AVG(rating) = 4.2 Minnie (user_id=2) COUNT(*) = 3 AVG(rating) = 4.4 Daffy (user_id=3) COUNT(*) = 3 AVG(rating) = 3.9 Stage 2: HAVING Filter HAVING AVG(rating) > 4.0 4.2 > 4.0 ✓ Mickey passes 4.4 > 4.0 ✓ Minnie passes 3.9 > 4.0 ✗ Daffy blocked Stage 3: Output Groups That Passed Mickey COUNT = 3 AVG = 4.2 Minnie COUNT = 3 AVG = 4.4 Daffy Filtered out Key Differences: WHERE: Filters rows (before grouping) HAVING: Filters groups (after grouping) 3 groups enter → HAVING checks each → 2 groups survive

Key Rules

  1. Execution Order: HAVING runs after GROUP BY.

  2. Aggregate Conditions: Can use aggregate functions like AVG(), COUNT()

  3. Group-Level Filter: Operates on entire groups, not individual rows

  4. Requires GROUP BY: HAVING only makes sense with grouped data

  5. Different from WHERE: WHERE filters rows, HAVING filters groups

SQL Execution Pipeline

Input:  FROM
Logic:  WHEREGROUP BYHAVING
        (no aggregates) → (form groups) → (aggregates exist!)
Output: SELECT/ORDER BY

Simple HAVING

-- Find users whose average rating exceeds 4.0
SELECT                         -- Last: Select columns to output
    user_id,
    COUNT(*) AS listen_count,
    AVG(rating) AS avg_rating
FROM Listens                   -- First: Get rows
GROUP BY user_id               -- Next: Create groups
HAVING AVG(rating) > 4.0       -- Then: Filter groups by aggregate condition
ORDER BY user_id               -- Last: Sort surviving groups

Step 1: GROUP BY

user_idCOUNT(*)AVG(rating)Notes
13(4.5+4.2+3.9)/3User 1's
23(4.7+4.6+3.9)/3User 2's
33(2.9+4.9+NULL)/2User 3's

Step 2: HAVING AVG() > 4

user_idAVG(rating)Notes: HAVING Check
14.24.2 > 4.0 ✓ ; Keep
24.44.4 > 4.0 ✓ ; Keep
33.93.9 > 4.0 ✗ ; Remove

Output

user_idlisten_countavg_rating
134.2
234.4

HAVING with COUNT

-- Find songs played at least twice
SELECT 
    song_id,
    COUNT(*) AS play_count,
    AVG(rating) AS avg_rating
FROM Listens
GROUP BY song_id               -- Group by each song
HAVING COUNT(*) >= 2           -- Keep only groups with 2+ rows
ORDER BY song_id               -- Sort output

Step 1: Form Groups by song_id

song_idCOUNT(*)AVG(rating)Notes:
listen_ids
12(4.5+2.9)/21,7
23(4.2+ 4.7+ 4.9)/ 32,4,8
623.93,9
714.65
813.96

Step 2: HAVING COUNT(*) >= 2

song_idCOUNT(*)Notes: HAVING Check
122 >= 2 ✓ ; Keep
233 >= 2 ✓ ; Keep
622 >= 2 ✓ ; Keep
711 >= 2 ✗ ; Remove
811 >= 2 ✗ ; Remove

Output

song_id play_count avg_rating
1 2 3.7
2 3 4.6
6 2 3.9

Key Points:


HAVING with Multiple Conditions

-- Find active users (3+ listens) with high average ratings (>4.0)
SELECT 
    user_id,
    COUNT(*) AS listen_count,
    AVG(rating) AS avg_rating,
    MIN(rating) AS min_rating
FROM Listens
GROUP BY user_id
HAVING COUNT(*) >= 3           -- Multiple HAVING conditions with AND
   AND AVG(rating) > 4.0       -- Both must be true for group to survive
ORDER BY user_id

Step 1: Calculate Group Aggregates

user_idCOUNT(*)AVG(rating)MIN(rating)Annotations
134.23.9All aggregates calculated
234.43.9All aggregates calculated
333.92.9NULL excluded from AVG/MIN

Step 2: Evaluate HAVING Conditions

user_idCondition 1Condition 2CombinedResultAnnotations
1COUNT(*)=3 >= 3 ✓AVG=4.2 > 4.0 ✓✓ AND ✓KeepBoth conditions met
2COUNT(*)=3 >= 3 ✓AVG=4.4 > 4.0 ✓✓ AND ✓KeepBoth conditions met
3COUNT(*)=3 >= 3 ✓AVG=3.9 > 4.0 ✗✓ AND ✗RemoveAverage too low

Output

user_idlisten_countavg_ratingmin_rating
134.23.9
234.43.9

Key Points:


WHERE and HAVING

-- Find users with 2+ ratings above 3.5

SELECT 
    user_id,
    COUNT(*) AS high_rating_count,
    AVG(rating) AS avg_high_rating
FROM Listens
WHERE rating > 3.5             -- 1st: WHERE filters individual rows
GROUP BY user_id               -- 2nd: GROUP BY forms groups from survivors
HAVING COUNT(*) >= 2           -- 3rd: HAVING filters entire groups
ORDER BY user_id               -- 4th: ORDER BY sorts final results

Step 1: WHERE Filters Rows

listen_iduser_idratingWHERE CheckAction
114.54.5 > 3.5 ✓Keep
214.24.2 > 3.5 ✓Keep
313.93.9 > 3.5 ✓Keep
424.74.7 > 3.5 ✓Keep
524.64.6 > 3.5 ✓Keep
623.93.9 > 3.5 ✓Keep
732.92.9 > 3.5 ✗Remove
834.94.9 > 3.5 ✓Keep
93NULLNULL > 3.5 ?Remove

Step 2: GROUP BY on Filtered Data

user_idFiltered RowsCOUNT(*)AVG(rating)
11,2,334.2
24,5,634.4
3814.9

Step 3: HAVING Filters Groups

user_idCOUNT(*)HAVING CheckResultAnnotations
133 >= 2 ✓KeepEnough high ratings
233 >= 2 ✓KeepEnough high ratings
311 >= 2 ✗RemoveToo few high ratings

Output

user_idhigh_rating_countavg_high_rating
134.2
234.4

Key Points:


Common Patterns

Common Mistakes

HAVING on rows

-- WRONG: HAVING does not work on individual rows
SELECT * FROM Listens
HAVING rating > 4.0

-- CORRECT: Use WHERE to filter individual rows
SELECT * FROM Listens
WHERE rating > 4.0

Problem: HAVING only works with grouped data

The HAVING Alias Problem

-- WRONG: Alias not yet created when HAVING runs
SELECT user_id, AVG(rating) AS avg_r
FROM Listens
GROUP BY user_id
HAVING avg_r > 4.0

-- CORRECT: Repeat the aggregate expression in HAVING
SELECT user_id, AVG(rating) AS avg_r
FROM Listens
GROUP BY user_id
HAVING AVG(rating) > 4.0

Problem: HAVING executes before SELECT creates aliases

Wrong Filter, Wrong Place

-- WRONG: HAVING can't access individual row values
SELECT user_id, COUNT(*)
FROM Listens
GROUP BY user_id
HAVING rating > 4.0            -- Error: Which rating? Group has many!

-- CORRECT: WHERE for row conditions, HAVING for group conditions
SELECT user_id, COUNT(*)
FROM Listens
WHERE rating > 4.0             -- Filter rows BEFORE grouping
GROUP BY user_id
HAVING COUNT(*) > 2            -- Then filter groups by aggregate

Problem: HAVING should use aggregate functions or GROUP BY columns

NULL Aggregate Confusion

-- Surprising: AVG ignores NULL ratings
SELECT user_id, AVG(rating)
FROM Listens
GROUP BY user_id
HAVING AVG(rating) > 4.0       -- AVG skips NULLs (may mislead!)

-- Explicit: Show total vs rated counts for clarity
SELECT user_id, 
       COUNT(*) AS total,       -- Includes NULL rows
       COUNT(rating) AS rated,  -- Excludes NULL ratings
       AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id
HAVING COUNT(rating) > 0       -- Ensure at least 1 non-NULL rating
   AND AVG(rating) > 4.0       -- Then check average

Problem: AVG ignores NULL, which affects HAVING conditions