HAVING: Filtering Group Aggregates
Concept. HAVING filters the groups produced by GROUP BY, using conditions on aggregate values that did not exist before grouping.
Intuition. After GROUP BY collapses the 9-row Listens table into 3 user rows, HAVING filters those rows, for example, keep only users whose AVG(rating) > 4. WHERE can't do this because the average doesn't exist until after the bucketing.
Example. Identify users with an average rating above 4.0. This requires a GROUP BY operation first, followed by a HAVING clause to filter the groups.
Simple HAVING
-- Find users whose average rating exceeds 4.0
SELECT -- Last: Select columns to output
user_id,
COUNT(*) AS listen_count,
AVG(rating) AS avg_rating
FROM Listens -- First: Get rows
GROUP BY user_id -- Next: Create groups
HAVING AVG(rating) > 4.0 -- Then: Filter groups by aggregate condition
ORDER BY user_id -- Last: Sort surviving groups
Step 1: GROUP BY
| user_id | COUNT(*) | AVG(rating) | Notes | |
|---|---|---|---|---|
| 1 | 3 | (4.5+4.2+3.9)/3 | User 1's | |
| 2 | 3 | (4.7+4.6+3.9)/3 | User 2's | |
| 3 | 3 | (2.9+4.9+NULL)/2 | User 3's |
Step 2: HAVING AVG() > 4
| user_id | AVG(rating) | Notes: HAVING Check | |
|---|---|---|---|
| 1 | 4.2 | 4.2 > 4.0 ✓ ; Keep | |
| 2 | 4.4 | 4.4 > 4.0 ✓ ; Keep | |
| 3 | 3.9 | 3.9 > 4.0 ✗ ; Remove |
Output
| user_id | listen_count | avg_rating | |
|---|---|---|---|
| 1 | 3 | 4.2 | |
| 2 | 3 | 4.4 |
-
GROUP BY first, HAVING second
-
FALSE = entire group gone
-
No partial elimination
HAVING with COUNT
-- Find songs played at least twice
SELECT
song_id,
COUNT(*) AS play_count,
AVG(rating) AS avg_rating
FROM Listens
GROUP BY song_id -- Group by each song
HAVING COUNT(*) >= 2 -- Keep only groups with 2+ rows
ORDER BY song_id -- Sort output
Step 1: Form Groups by song_id
| song_id | COUNT(*) | AVG(rating) | Notes: listen_ids |
|---|---|---|---|
| 1 | 2 | (4.5+2.9)/2 | 1,7 |
| 2 | 3 | (4.2+ 4.7+ 4.9)/ 3 | 2,4,8 |
| 6 | 2 | 3.9 | 3,9 |
| 7 | 1 | 4.6 | 5 |
| 8 | 1 | 3.9 | 6 |
Step 2: HAVING COUNT(*) >= 2
| song_id | COUNT(*) | Notes: HAVING Check |
|---|---|---|
| 1 | 2 | 2 >= 2 ✓ ; Keep |
| 2 | 3 | 3 >= 2 ✓ ; Keep |
| 6 | 2 | 2 >= 2 ✓ ; Keep |
| 7 | 1 | 1 >= 2 ✗ ; Remove |
| 8 | 1 | 1 >= 2 ✗ ; Remove |
Output
| song_id | play_count | avg_rating |
|---|---|---|
| 1 | 2 | 3.7 |
| 2 | 3 | 4.6 |
| 6 | 2 | 3.9 |
Key Points:
-
Unpopular songs eliminated
-
Threshold: 2+ plays
-
NULLs counted in COUNT(*)
HAVING with Multiple Conditions
-- Find active users (3+ listens) with high average ratings (>4.0)
SELECT
user_id,
COUNT(*) AS listen_count,
AVG(rating) AS avg_rating,
MIN(rating) AS min_rating
FROM Listens
GROUP BY user_id
HAVING COUNT(*) >= 3 -- Multiple HAVING conditions with AND
AND AVG(rating) > 4.0 -- Both must be true for group to survive
ORDER BY user_id
Step 1: Calculate Group Aggregates
| user_id | COUNT(*) | AVG(rating) | MIN(rating) | Annotations | |
|---|---|---|---|---|---|
| 1 | 3 | 4.2 | 3.9 | All aggregates calculated | |
| 2 | 3 | 4.4 | 3.9 | All aggregates calculated | |
| 3 | 3 | 3.9 | 2.9 | NULL excluded from AVG/MIN |
Step 2: Evaluate HAVING Conditions
| user_id | Condition 1 | Condition 2 | Combined | Result | Annotations | |
|---|---|---|---|---|---|---|
| 1 | COUNT(*)=3 >= 3 ✓ | AVG=4.2 > 4.0 ✓ | ✓ AND ✓ | Keep | Both conditions met | |
| 2 | COUNT(*)=3 >= 3 ✓ | AVG=4.4 > 4.0 ✓ | ✓ AND ✓ | Keep | Both conditions met | |
| 3 | COUNT(*)=3 >= 3 ✓ | AVG=3.9 > 4.0 ✗ | ✓ AND ✗ | Remove | Average too low |
Output
| user_id | listen_count | avg_rating | min_rating | |
|---|---|---|---|---|
| 1 | 3 | 4.2 | 3.9 | |
| 2 | 3 | 4.4 | 3.9 |
Key Points:
-
AND: both must pass
-
Failing one = eliminated
-
Mix any aggregates
WHERE and HAVING
-- Find users with 2+ ratings above 3.5
SELECT
user_id,
COUNT(*) AS high_rating_count,
AVG(rating) AS avg_high_rating
FROM Listens
WHERE rating > 3.5 -- 1st: WHERE filters individual rows
GROUP BY user_id -- 2nd: GROUP BY forms groups from survivors
HAVING COUNT(*) >= 2 -- 3rd: HAVING filters entire groups
ORDER BY user_id -- 4th: ORDER BY sorts final results
Step 1: WHERE Filters Rows
| listen_id | user_id | rating | WHERE Check | Action | |
|---|---|---|---|---|---|
| 1 | 1 | 4.5 | 4.5 > 3.5 ✓ | Keep | |
| 2 | 1 | 4.2 | 4.2 > 3.5 ✓ | Keep | |
| 3 | 1 | 3.9 | 3.9 > 3.5 ✓ | Keep | |
| 4 | 2 | 4.7 | 4.7 > 3.5 ✓ | Keep | |
| 5 | 2 | 4.6 | 4.6 > 3.5 ✓ | Keep | |
| 6 | 2 | 3.9 | 3.9 > 3.5 ✓ | Keep | |
| 7 | 3 | 2.9 | 2.9 > 3.5 ✗ | Remove | |
| 8 | 3 | 4.9 | 4.9 > 3.5 ✓ | Keep | |
| 9 | 3 | NULL | NULL > 3.5 ? | Remove |
Step 2: GROUP BY on Filtered Data
| user_id | Filtered Rows | COUNT(*) | AVG(rating) | |
|---|---|---|---|---|
| 1 | 1,2,3 | 3 | 4.2 | |
| 2 | 4,5,6 | 3 | 4.4 | |
| 3 | 8 | 1 | 4.9 |
Step 3: HAVING Filters Groups
| user_id | COUNT(*) | HAVING Check | Result | Annotations | |
|---|---|---|---|---|---|
| 1 | 3 | 3 >= 2 ✓ | Keep | Enough high ratings | |
| 2 | 3 | 3 >= 2 ✓ | Keep | Enough high ratings | |
| 3 | 1 | 1 >= 2 ✗ | Remove | Too few high ratings |
Output
| user_id | high_rating_count | avg_high_rating | |
|---|---|---|---|
| 1 | 3 | 4.2 | |
| 2 | 3 | 4.4 |
Key Points:
-
WHERE filters rows (pre-GROUP)
-
HAVING filters groups (post-GROUP)
Key Rules
-
Execution Order: HAVING runs after GROUP BY.
-
Aggregate Conditions: Can use aggregate functions like AVG(), COUNT()
-
Group-Level Filter: Operates on entire groups, not individual rows
-
Requires GROUP BY: HAVING only makes sense with grouped data
-
Different from WHERE: WHERE filters rows, HAVING filters groups
SQL Execution Pipeline
Input: FROM
Logic: WHERE → GROUP BY → HAVING
(no aggregates) → (form groups) → (aggregates exist!)
Output: SELECT/ORDER BY
Common Patterns
-
Aggregate Filters:
HAVING COUNT(*) > 2keeps groups with 3+ rows -
Average Checks:
HAVING AVG(rating) > 4.0filters by group average -
Multiple Conditions:
HAVING COUNT(*) > 1 AND AVG(rating) > 4.0 -
With WHERE: WHERE filters rows first, then HAVING filters groups
-
Column References: Can use GROUP BY columns or aggregates
Common Mistakes
HAVING on rows
-- WRONG: HAVING does not work on individual rows
SELECT * FROM Listens
HAVING rating > 4.0
-- CORRECT: Use WHERE to filter individual rows
SELECT * FROM Listens
WHERE rating > 4.0
Problem: HAVING only works with grouped data
The HAVING Alias Problem
-- WRONG: Alias not yet created when HAVING runs
SELECT user_id, AVG(rating) AS avg_r
FROM Listens
GROUP BY user_id
HAVING avg_r > 4.0
-- CORRECT: Repeat the aggregate expression in HAVING
SELECT user_id, AVG(rating) AS avg_r
FROM Listens
GROUP BY user_id
HAVING AVG(rating) > 4.0
Problem: HAVING executes before SELECT creates aliases
Wrong Filter, Wrong Place
-- WRONG: HAVING can't access individual row values
SELECT user_id, COUNT(*)
FROM Listens
GROUP BY user_id
HAVING rating > 4.0 -- Error: Which rating? Group has many!
-- CORRECT: WHERE for row conditions, HAVING for group conditions
SELECT user_id, COUNT(*)
FROM Listens
WHERE rating > 4.0 -- Filter rows BEFORE grouping
GROUP BY user_id
HAVING COUNT(*) > 2 -- Then filter groups by aggregate
Problem: HAVING should use aggregate functions or GROUP BY columns
NULL Aggregate Confusion
-- Surprising: AVG ignores NULL ratings
SELECT user_id, AVG(rating)
FROM Listens
GROUP BY user_id
HAVING AVG(rating) > 4.0 -- AVG skips NULLs (may mislead!)
-- Explicit: Show total vs rated counts for clarity
SELECT user_id,
COUNT(*) AS total, -- Includes NULL rows
COUNT(rating) AS rated, -- Excludes NULL ratings
AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id
HAVING COUNT(rating) > 0 -- Ensure at least 1 non-NULL rating
AND AVG(rating) > 4.0 -- Then check average
Problem: AVG ignores NULL, which affects HAVING conditions