HAVING: Filtering Group Aggregates
Core Concept
- Example: Find users whose average rating exceeds 4.0 (needs GROUP BY first, then HAVING filters those groups)
HAVING tests groups against aggregates. Unlike WHERE (which focuses on rows), it sees the big picture - counts, averages, sums.
Key Rules
-
Execution Order: HAVING runs after GROUP BY.
-
Aggregate Conditions: Can use aggregate functions like AVG(), COUNT()
-
Group-Level Filter: Operates on entire groups, not individual rows
-
Requires GROUP BY: HAVING only makes sense with grouped data
-
Different from WHERE: WHERE filters rows, HAVING filters groups
SQL Execution Pipeline
Input: FROM
Logic: WHERE → GROUP BY → HAVING
(no aggregates) → (form groups) → (aggregates exist!)
Output: SELECT/ORDER BY
Simple HAVING
-- Find users whose average rating exceeds 4.0
SELECT -- Last: Select columns to output
user_id,
COUNT(*) AS listen_count,
AVG(rating) AS avg_rating
FROM Listens -- First: Get rows
GROUP BY user_id -- Next: Create groups
HAVING AVG(rating) > 4.0 -- Then: Filter groups by aggregate condition
ORDER BY user_id -- Last: Sort surviving groups
Step 1: GROUP BY
user_id | COUNT(*) | AVG(rating) | Notes | |
---|---|---|---|---|
1 | 3 | (4.5+4.2+3.9)/3 | User 1's | |
2 | 3 | (4.7+4.6+3.9)/3 | User 2's | |
3 | 3 | (2.9+4.9+NULL)/2 | User 3's |
Step 2: HAVING AVG() > 4
user_id | AVG(rating) | Notes: HAVING Check | |
---|---|---|---|
1 | 4.2 | 4.2 > 4.0 ✓ ; Keep | |
2 | 4.4 | 4.4 > 4.0 ✓ ; Keep | |
3 | 3.9 | 3.9 > 4.0 ✗ ; Remove |
Output
user_id | listen_count | avg_rating | |
---|---|---|---|
1 | 3 | 4.2 | |
2 | 3 | 4.4 |
-
GROUP BY first, HAVING second
-
FALSE = entire group gone
-
No partial elimination
HAVING with COUNT
-- Find songs played at least twice
SELECT
song_id,
COUNT(*) AS play_count,
AVG(rating) AS avg_rating
FROM Listens
GROUP BY song_id -- Group by each song
HAVING COUNT(*) >= 2 -- Keep only groups with 2+ rows
ORDER BY song_id -- Sort output
Step 1: Form Groups by song_id
song_id | COUNT(*) | AVG(rating) | Notes: listen_ids |
---|---|---|---|
1 | 2 | (4.5+2.9)/2 | 1,7 |
2 | 3 | (4.2+ 4.7+ 4.9)/ 3 | 2,4,8 |
6 | 2 | 3.9 | 3,9 |
7 | 1 | 4.6 | 5 |
8 | 1 | 3.9 | 6 |
Step 2: HAVING COUNT(*) >= 2
song_id | COUNT(*) | Notes: HAVING Check |
---|---|---|
1 | 2 | 2 >= 2 ✓ ; Keep |
2 | 3 | 3 >= 2 ✓ ; Keep |
6 | 2 | 2 >= 2 ✓ ; Keep |
7 | 1 | 1 >= 2 ✗ ; Remove |
8 | 1 | 1 >= 2 ✗ ; Remove |
Output
song_id | play_count | avg_rating |
---|---|---|
1 | 2 | 3.7 |
2 | 3 | 4.6 |
6 | 2 | 3.9 |
Key Points:
-
Unpopular songs eliminated
-
Threshold: 2+ plays
-
NULLs counted in COUNT(*)
HAVING with Multiple Conditions
-- Find active users (3+ listens) with high average ratings (>4.0)
SELECT
user_id,
COUNT(*) AS listen_count,
AVG(rating) AS avg_rating,
MIN(rating) AS min_rating
FROM Listens
GROUP BY user_id
HAVING COUNT(*) >= 3 -- Multiple HAVING conditions with AND
AND AVG(rating) > 4.0 -- Both must be true for group to survive
ORDER BY user_id
Step 1: Calculate Group Aggregates
user_id | COUNT(*) | AVG(rating) | MIN(rating) | Annotations | |
---|---|---|---|---|---|
1 | 3 | 4.2 | 3.9 | All aggregates calculated | |
2 | 3 | 4.4 | 3.9 | All aggregates calculated | |
3 | 3 | 3.9 | 2.9 | NULL excluded from AVG/MIN |
Step 2: Evaluate HAVING Conditions
user_id | Condition 1 | Condition 2 | Combined | Result | Annotations | |
---|---|---|---|---|---|---|
1 | COUNT(*)=3 >= 3 ✓ | AVG=4.2 > 4.0 ✓ | ✓ AND ✓ | Keep | Both conditions met | |
2 | COUNT(*)=3 >= 3 ✓ | AVG=4.4 > 4.0 ✓ | ✓ AND ✓ | Keep | Both conditions met | |
3 | COUNT(*)=3 >= 3 ✓ | AVG=3.9 > 4.0 ✗ | ✓ AND ✗ | Remove | Average too low |
Output
user_id | listen_count | avg_rating | min_rating | |
---|---|---|---|---|
1 | 3 | 4.2 | 3.9 | |
2 | 3 | 4.4 | 3.9 |
Key Points:
-
AND: both must pass
-
Failing one = eliminated
-
Mix any aggregates
WHERE and HAVING
-- Find users with 2+ ratings above 3.5
SELECT
user_id,
COUNT(*) AS high_rating_count,
AVG(rating) AS avg_high_rating
FROM Listens
WHERE rating > 3.5 -- 1st: WHERE filters individual rows
GROUP BY user_id -- 2nd: GROUP BY forms groups from survivors
HAVING COUNT(*) >= 2 -- 3rd: HAVING filters entire groups
ORDER BY user_id -- 4th: ORDER BY sorts final results
Step 1: WHERE Filters Rows
listen_id | user_id | rating | WHERE Check | Action | |
---|---|---|---|---|---|
1 | 1 | 4.5 | 4.5 > 3.5 ✓ | Keep | |
2 | 1 | 4.2 | 4.2 > 3.5 ✓ | Keep | |
3 | 1 | 3.9 | 3.9 > 3.5 ✓ | Keep | |
4 | 2 | 4.7 | 4.7 > 3.5 ✓ | Keep | |
5 | 2 | 4.6 | 4.6 > 3.5 ✓ | Keep | |
6 | 2 | 3.9 | 3.9 > 3.5 ✓ | Keep | |
7 | 3 | 2.9 | 2.9 > 3.5 ✗ | Remove | |
8 | 3 | 4.9 | 4.9 > 3.5 ✓ | Keep | |
9 | 3 | NULL | NULL > 3.5 ? | Remove |
Step 2: GROUP BY on Filtered Data
user_id | Filtered Rows | COUNT(*) | AVG(rating) | |
---|---|---|---|---|
1 | 1,2,3 | 3 | 4.2 | |
2 | 4,5,6 | 3 | 4.4 | |
3 | 8 | 1 | 4.9 |
Step 3: HAVING Filters Groups
user_id | COUNT(*) | HAVING Check | Result | Annotations | |
---|---|---|---|---|---|
1 | 3 | 3 >= 2 ✓ | Keep | Enough high ratings | |
2 | 3 | 3 >= 2 ✓ | Keep | Enough high ratings | |
3 | 1 | 1 >= 2 ✗ | Remove | Too few high ratings |
Output
user_id | high_rating_count | avg_high_rating | |
---|---|---|---|
1 | 3 | 4.2 | |
2 | 3 | 4.4 |
Key Points:
-
WHERE filters rows (pre-GROUP)
-
HAVING filters groups (post-GROUP)
Common Patterns
-
Aggregate Filters:
HAVING COUNT(*) > 2
keeps groups with 3+ rows -
Average Checks:
HAVING AVG(rating) > 4.0
filters by group average -
Multiple Conditions:
HAVING COUNT(*) > 1 AND AVG(rating) > 4.0
-
With WHERE: WHERE filters rows first, then HAVING filters groups
-
Column References: Can use GROUP BY columns or aggregates
Common Mistakes
HAVING on rows
-- WRONG: HAVING does not work on individual rows
SELECT * FROM Listens
HAVING rating > 4.0
-- CORRECT: Use WHERE to filter individual rows
SELECT * FROM Listens
WHERE rating > 4.0
Problem: HAVING only works with grouped data
The HAVING Alias Problem
-- WRONG: Alias not yet created when HAVING runs
SELECT user_id, AVG(rating) AS avg_r
FROM Listens
GROUP BY user_id
HAVING avg_r > 4.0
-- CORRECT: Repeat the aggregate expression in HAVING
SELECT user_id, AVG(rating) AS avg_r
FROM Listens
GROUP BY user_id
HAVING AVG(rating) > 4.0
Problem: HAVING executes before SELECT creates aliases
Wrong Filter, Wrong Place
-- WRONG: HAVING can't access individual row values
SELECT user_id, COUNT(*)
FROM Listens
GROUP BY user_id
HAVING rating > 4.0 -- Error: Which rating? Group has many!
-- CORRECT: WHERE for row conditions, HAVING for group conditions
SELECT user_id, COUNT(*)
FROM Listens
WHERE rating > 4.0 -- Filter rows BEFORE grouping
GROUP BY user_id
HAVING COUNT(*) > 2 -- Then filter groups by aggregate
Problem: HAVING should use aggregate functions or GROUP BY columns
NULL Aggregate Confusion
-- Surprising: AVG ignores NULL ratings
SELECT user_id, AVG(rating)
FROM Listens
GROUP BY user_id
HAVING AVG(rating) > 4.0 -- AVG skips NULLs (may mislead!)
-- Explicit: Show total vs rated counts for clarity
SELECT user_id,
COUNT(*) AS total, -- Includes NULL rows
COUNT(rating) AS rated, -- Excludes NULL ratings
AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id
HAVING COUNT(rating) > 0 -- Ensure at least 1 non-NULL rating
AND AVG(rating) > 4.0 -- Then check average
Problem: AVG ignores NULL, which affects HAVING conditions