SQL Sets and Set Operations
Sets vs Multi-sets: A Critical Distinction
⚠️ Key Concept: SQL Tables are Multi-sets (Bags)
Mathematical Set
- No duplicates allowed
- {1, 2, 3} - each element once
- Order doesn't matter
SQL Multi-set (Bag)
- Duplicates ARE allowed
- {1, 1, 2, 3, 3, 3} - elements can repeat
- Each duplicate counts separately
SQL Set Operations Overview
UNION: Combine and Remove Duplicates
Creates a true set from two queries - duplicate rows appear only once.
-- Artists from Pop songs
SELECT DISTINCT artist -- Query A: Pop artists
FROM songs
WHERE genre = 'Pop'
UNION -- UNION: Combines and removes duplicates
-- Artists from Rock songs
SELECT DISTINCT artist -- Query B: Rock artists
FROM songs
WHERE genre = 'Rock' -- Result: All artists from EITHER genre
Result: Taylor Swift, Ed Sheeran (each artist appears once)
UNION ALL: Combine and Keep Duplicates
Preserves the multi-set nature - duplicate rows are kept.
-- All Pop listens
SELECT user_id, song_id
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE genre = 'Pop' -- First result set
UNION ALL -- UNION ALL: Keeps ALL rows (duplicates too!)
-- All Classic listens
SELECT user_id, song_id
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE genre = 'Classic' -- If user listened to both, appears twice
Operation | Duplicates | Use When |
---|---|---|
UNION | Removed | Want unique results |
UNION ALL | Kept | Need all rows for counting/aggregation |
Performance: UNION ALL is faster (no duplicate removal)
INTERSECT: Find Common Rows
Returns only rows that appear in BOTH queries.
-- Users who listened to Pop
SELECT DISTINCT user_id
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE genre = 'Pop' -- Query A: Pop listeners
INTERSECT -- INTERSECT: Only rows in BOTH results
-- Users who listened to Classic
SELECT DISTINCT user_id
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE genre = 'Classic' -- Result: Users who like BOTH genres
Result: Users 1, 2, and 3 (Mickey, Minnie, and Daffy all listened to both Pop and Classic)
EXCEPT: Find Difference
Returns rows from first query that are NOT in second query.
-- All songs
SELECT song_id, title FROM songs -- Query A: All songs in database
EXCEPT -- EXCEPT: A minus B (set difference)
-- Songs that have been listened to
SELECT DISTINCT s.song_id, s.title
FROM songs s
JOIN listens l ON s.song_id = l.song_id -- Query B: Songs with listens
-- Result: Songs NEVER listened to
Result: Songs 3, 4, 5, 9, 10 (Shape of You, Photograph, Shivers, Bad Blood, DJ Mix) - never played
Quick Reference
Operation | Removes Duplicates? | SQL Example | Use Case |
---|---|---|---|
UNION | ✅ Yes | SELECT ... UNION SELECT ... |
Combine unique results |
UNION ALL | ❌ No | SELECT ... UNION ALL SELECT ... |
Keep all rows for counting |
INTERSECT | ✅ Yes | SELECT ... INTERSECT SELECT ... |
Find common elements |
EXCEPT | ✅ Yes | SELECT ... EXCEPT SELECT ... |
Find differences |
Performance Tips
-
UNION ALL is faster than UNION - no sorting/deduplication needed
-
Index columns used in set operations for better performance
-
Consider JOIN alternatives - sometimes faster than set operations
Practice: Multi-set vs Set Behavior
-- This table has duplicates (multi-set)
SELECT genre FROM songs; -- Raw data: Duplicates preserved
-- Result: Pop, Pop, Rock, Rock, Rock, Classic, Classic, Classic, Rock, NULL
-- Remove duplicates to get a set
SELECT DISTINCT genre FROM songs; -- DISTINCT: Each value once only
-- Result: Pop, Rock, Classic, NULL -- NULL counts as a distinct value!
-- UNION removes duplicates (set operation)
SELECT genre FROM songs WHERE artist = 'Taylor Swift'
UNION -- Set operation: Removes duplicates
SELECT genre FROM songs WHERE artist = 'Ed Sheeran';
-- Result: Pop, Rock (no duplicates)
-- UNION ALL keeps duplicates (multi-set operation)
SELECT genre FROM songs WHERE artist = 'Taylor Swift'
UNION ALL -- Multi-set: Preserves ALL duplicates
SELECT genre FROM songs WHERE artist = 'Ed Sheeran';
-- Result: Pop, Pop, Rock, Rock, Rock, Rock (all occurrences)