SQL Sets and Set Operations

Sets vs Multi-sets: A Critical Distinction

⚠️ Key Concept: SQL Tables are Multi-sets (Bags)

Mathematical Set

  • No duplicates allowed
  • {1, 2, 3} - each element once
  • Order doesn't matter

SQL Multi-set (Bag)

  • Duplicates ARE allowed
  • {1, 1, 2, 3, 3, 3} - elements can repeat
  • Each duplicate counts separately

SQL Set Operations Overview

SQL Set Operations: Combining Query Results Query A: Users who like Pop = {Mickey, Minnie, Daffy} | Query B: Users who like Rock = {Minnie, Daffy, Pluto} UNION Combine & Remove Duplicates A B Result: {Mickey, Minnie, Daffy, Pluto} All unique users from both queries Minnie and Daffy appear only once 4 distinct users UNION ALL Combine & Keep All Duplicates A (3) + B (3) Result: {Mickey, Minnie, Daffy, Minnie, Daffy, Pluto} All rows preserved 6 total rows (with duplicates) INTERSECT Common Elements Only A B Result: {Minnie, Daffy} Users in BOTH Pop AND Rock Mickey only likes Pop, Pluto only Rock 2 users like both genres EXCEPT In A but NOT in B A B Result: {Mickey} Only in Pop, not in Rock Minnie & Daffy removed 1 Pop-only user

UNION: Combine and Remove Duplicates

Creates a true set from two queries - duplicate rows appear only once.

-- Artists from Pop songs
SELECT DISTINCT artist                -- Query A: Pop artists
FROM songs 
WHERE genre = 'Pop'

UNION                                 -- UNION: Combines and removes duplicates

-- Artists from Rock songs
SELECT DISTINCT artist                -- Query B: Rock artists
FROM songs 
WHERE genre = 'Rock'                  -- Result: All artists from EITHER genre
Pop Rock UNION All unique artists (no duplicates)

Result: Taylor Swift, Ed Sheeran (each artist appears once)


UNION ALL: Combine and Keep Duplicates

Preserves the multi-set nature - duplicate rows are kept.

-- All Pop listens
SELECT user_id, song_id 
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE genre = 'Pop'                   -- First result set

UNION ALL                             -- UNION ALL: Keeps ALL rows (duplicates too!)

-- All Classic listens
SELECT user_id, song_id 
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE genre = 'Classic'               -- If user listened to both, appears twice
Operation Duplicates Use When
UNION Removed Want unique results
UNION ALL Kept Need all rows for counting/aggregation

Performance: UNION ALL is faster (no duplicate removal)


INTERSECT: Find Common Rows

Returns only rows that appear in BOTH queries.

-- Users who listened to Pop
SELECT DISTINCT user_id 
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE genre = 'Pop'                   -- Query A: Pop listeners

INTERSECT                             -- INTERSECT: Only rows in BOTH results

-- Users who listened to Classic
SELECT DISTINCT user_id 
FROM listens l
JOIN songs s ON l.song_id = s.song_id
WHERE genre = 'Classic'               -- Result: Users who like BOTH genres
Pop Classic INTERSECT Users who like BOTH

Result: Users 1, 2, and 3 (Mickey, Minnie, and Daffy all listened to both Pop and Classic)


EXCEPT: Find Difference

Returns rows from first query that are NOT in second query.

-- All songs
SELECT song_id, title FROM songs       -- Query A: All songs in database

EXCEPT                                 -- EXCEPT: A minus B (set difference)

-- Songs that have been listened to
SELECT DISTINCT s.song_id, s.title 
FROM songs s
JOIN listens l ON s.song_id = l.song_id  -- Query B: Songs with listens
                                         -- Result: Songs NEVER listened to
All Songs Played EXCEPT Songs never played

Result: Songs 3, 4, 5, 9, 10 (Shape of You, Photograph, Shivers, Bad Blood, DJ Mix) - never played


Quick Reference

Operation Removes Duplicates? SQL Example Use Case
UNION ✅ Yes SELECT ... UNION SELECT ... Combine unique results
UNION ALL ❌ No SELECT ... UNION ALL SELECT ... Keep all rows for counting
INTERSECT ✅ Yes SELECT ... INTERSECT SELECT ... Find common elements
EXCEPT ✅ Yes SELECT ... EXCEPT SELECT ... Find differences

Performance Tips

  1. UNION ALL is faster than UNION - no sorting/deduplication needed

  2. Index columns used in set operations for better performance

  3. Consider JOIN alternatives - sometimes faster than set operations


Practice: Multi-set vs Set Behavior

-- This table has duplicates (multi-set)
SELECT genre FROM songs;              -- Raw data: Duplicates preserved
-- Result: Pop, Pop, Rock, Rock, Rock, Classic, Classic, Classic, Rock, NULL

-- Remove duplicates to get a set
SELECT DISTINCT genre FROM songs;     -- DISTINCT: Each value once only
-- Result: Pop, Rock, Classic, NULL   -- NULL counts as a distinct value!

-- UNION removes duplicates (set operation)
SELECT genre FROM songs WHERE artist = 'Taylor Swift'
UNION                                 -- Set operation: Removes duplicates
SELECT genre FROM songs WHERE artist = 'Ed Sheeran';
-- Result: Pop, Rock (no duplicates)

-- UNION ALL keeps duplicates (multi-set operation)  
SELECT genre FROM songs WHERE artist = 'Taylor Swift'
UNION ALL                             -- Multi-set: Preserves ALL duplicates
SELECT genre FROM songs WHERE artist = 'Ed Sheeran';
-- Result: Pop, Pop, Rock, Rock, Rock, Rock (all occurrences)