JOINs: Combining Tables
Core Concept
- Example: Match users with their listening history (combines Users and Listens tables on user_id)
JOINs combine rows from two or more tables based on a related column between them.
Key Rules
-
Join Condition: The ON clause specifies how tables relate (e.g.,
Users.user_id = Listens.user_id
) -
Join Order: Multiple JOINs are evaluated left-to-right, building intermediate results
-
NULL Handling: Unmatched rows in outer joins get NULL values for missing columns
-
Cartesian (or Cross) Product: Missing ON clause creates every possible combination (usually wrong!)
-
Column Ambiguity: Use table aliases to clarify which table a column comes from
-
Result Ordering: Without ORDER BY, rows return in random/undefined order (DB-dependent). Use ORDER BY if you need ascending (or descending) consistent ordering. (As we will see later in Section 3, the extra ordering will require a SORT and that incurs a performance penalty.)
INNER JOIN
-- Get all users with their listening history (excludes users without listens)
SELECT u.user_id, u.name, l.listen_id, l.song_id, l.rating -- Last: Select columns to output
FROM Users u -- Start with Users table (aliased as 'u')
INNER JOIN Listens l -- Only keep rows with matches in BOTH tables
ON u.user_id = l.user_id -- Join condition: matching user_ids
ORDER BY u.user_id, l.listen_id -- Sort by user first, then listen
Users Table (4 rows)
user_id | name | ||
---|---|---|---|
1 | Mickey | mickey@example.com | |
2 | Minnie | minnie@example.com | |
3 | Daffy | daffy@example.com | |
4 | Pluto | pluto@example.com |
Listens Table (9 rows)
listen_id | user_id | song_id | rating | listen_time | |
---|---|---|---|---|---|
1 | 1 | 1 | 4.5 | 2024-08-30 14:35:00 | |
2 | 1 | 2 | 4.2 | NULL | |
3 | 1 | 6 | 3.9 | 2024-08-29 10:15:00 | |
4 | 2 | 2 | 4.7 | NULL | |
5 | 2 | 7 | 4.6 | 2024-08-28 09:20:00 | |
6 | 2 | 8 | 3.9 | 2024-08-27 16:45:00 | |
7 | 3 | 1 | 2.9 | NULL | |
8 | 3 | 2 | 4.9 | 2024-08-26 12:30:00 | |
9 | 3 | 6 | NULL | NULL |
Output (9 rows)
user_id | name | listen_id | song_id | rating | Annotations | |
---|---|---|---|---|---|---|
1 | Mickey | 1 | 1 | 4.5 | Mickey → listen 1 | |
1 | Mickey | 2 | 2 | 4.2 | Mickey → listen 2 | |
1 | Mickey | 3 | 6 | 3.9 | Mickey → listen 3 | |
2 | Minnie | 4 | 2 | 4.7 | Minnie → listen 4 | |
2 | Minnie | 5 | 7 | 4.6 | Minnie → listen 5 | |
2 | Minnie | 6 | 8 | 3.9 | Minnie → listen 6 | |
3 | Daffy | 7 | 1 | 2.9 | Daffy → listen 7 | |
3 | Daffy | 8 | 2 | 4.9 | Daffy → listen 8 | |
3 | Daffy | 9 | 6 | NULL | Daffy → listen 9 |
-
Only matching rows are included
-
Pluto (user_id=4) excluded - no listens
-
Each match creates a result row
-
Cartesian product filtered by ON
Conceptually, INNER JOIN is like a double for loop (Python):
# Simplified conceptual model of INNER JOIN
users = [(1, 'Mickey'), (2, 'Minnie'), (3, 'Daffy'), (4, 'Pluto')]
listens = [(1, 1, 4.5), (2, 1, 4.2), (4, 2, 4.7), (7, 3, 2.9)] # (listen_id, user_id, rating)
result = []
for user IN users:
for listen IN listens:
if user[0] == listen[1]: # ON u.user_id = l.user_id
result.append(user + listen) # Combine matching rows
# Result: Only rows WHERE user_id matches IN both tables
# Pluto (user_id=4) NOT IN result - no matching listens
LEFT JOIN
-- Get all users, including those without any listening history
SELECT u.user_id, u.name, l.listen_id, l.song_id, l.rating
FROM Users u -- LEFT table: ALL users preserved
LEFT JOIN Listens l -- Add matching listens, NULLs if no match
ON u.user_id = l.user_id -- Join condition
ORDER BY u.user_id, l.listen_id -- NULLs sort last (or first, DB-dependent)
user_id | name | listen_id | song_id | rating | Annotations | |
---|---|---|---|---|---|---|
1 | Mickey | 1 | 1 | 4.5 | Mickey → listen 1 | |
1 | Mickey | 2 | 2 | 4.2 | Mickey → listen 2 | |
1 | Mickey | 3 | 6 | 3.9 | Mickey → listen 3 | |
2 | Minnie | 4 | 2 | 4.7 | Minnie → listen 4 | |
2 | Minnie | 5 | 7 | 4.6 | Minnie → listen 5 | |
2 | Minnie | 6 | 8 | 3.9 | Minnie → listen 6 | |
3 | Daffy | 7 | 1 | 2.9 | Daffy → listen 7 | |
3 | Daffy | 8 | 2 | 4.9 | Daffy → listen 8 | |
3 | Daffy | 9 | 6 | NULL | Daffy → listen 9 | |
4 | Pluto | NULL | NULL | NULL | Pluto preserved (no listens) |
-
All users included (even Pluto)
-
Non-matching rows get NULL values
-
Preserves left table completely
-
More inclusive than INNER JOIN
RIGHT JOIN
-- Get all listens with user info (all listens preserved)
SELECT u.user_id, u.name, l.listen_id, l.song_id, l.rating
FROM Users u -- LEFT table (may have NULLs)
RIGHT JOIN Listens l -- RIGHT table: ALL listens preserved
ON u.user_id = l.user_id -- NULLs for orphaned listens
ORDER BY l.listen_id -- Sort by listen_id
user_id | name | listen_id | song_id | rating | Annotations | |
---|---|---|---|---|---|---|
1 | Mickey | 1 | 1 | 4.5 | Listen 1 → Mickey | |
1 | Mickey | 2 | 2 | 4.2 | Listen 2 → Mickey | |
1 | Mickey | 3 | 6 | 3.9 | Listen 3 → Mickey | |
2 | Minnie | 4 | 2 | 4.7 | Listen 4 → Minnie | |
2 | Minnie | 5 | 7 | 4.6 | Listen 5 → Minnie | |
2 | Minnie | 6 | 8 | 3.9 | Listen 6 → Minnie | |
3 | Daffy | 7 | 1 | 2.9 | Listen 7 → Daffy | |
3 | Daffy | 8 | 2 | 4.9 | Listen 8 → Daffy | |
3 | Daffy | 9 | 6 | NULL | Listen 9 → Daffy |
-
All right table rows kept
-
No match = NULLs on left
-
RIGHT table drives results
-
Flip of LEFT JOIN
FULL OUTER JOIN
-- Get all users and all listens, preserving both sides
SELECT u.user_id, u.name, l.listen_id, l.song_id, l.rating
FROM Users u
FULL OUTER JOIN Listens l -- Preserve ALL rows from BOTH tables
ON u.user_id = l.user_id -- NULLs for non-matching sides
user_id | name | listen_id | song_id | rating | Annotations | |
---|---|---|---|---|---|---|
1 | Mickey | 1 | 1 | 4.5 | Mickey → listen 1 | |
1 | Mickey | 2 | 2 | 4.2 | Mickey → listen 2 | |
1 | Mickey | 3 | 6 | 3.9 | Mickey → listen 3 | |
2 | Minnie | 4 | 2 | 4.7 | Minnie → listen 4 | |
2 | Minnie | 5 | 7 | 4.6 | Minnie → listen 5 | |
2 | Minnie | 6 | 8 | 3.9 | Minnie → listen 6 | |
3 | Daffy | 7 | 1 | 2.9 | Daffy → listen 7 | |
3 | Daffy | 8 | 2 | 4.9 | Daffy → listen 8 | |
3 | Daffy | 9 | 6 | NULL | Daffy → listen 9 | |
4 | Pluto | NULL | NULL | NULL | Pluto preserved |
-
Everything from both tables
-
NULLs fill the gaps
-
LEFT + RIGHT combined
-
SQLLite does not support FULL OUTER. UNION simulates.
Multiple JOINs
-- Get user names with song details for high-rated listens (rating > 4.0)
SELECT u.name, s.title, s.artist, l.rating
FROM Users u
INNER JOIN Listens l -- First join: Users ⟷ Listens
ON u.user_id = l.user_id -- Match users to their listens
INNER JOIN Songs s -- Second join: Result ⟷ Songs
ON l.song_id = s.song_id -- Match listens to song details
WHERE l.rating > 4.0 -- Filter AFTER all joins complete
ORDER BY u.name, l.rating DESC -- Sort by user, then rating (high to low)
name | title | artist | rating | Annotations | |
---|---|---|---|---|---|
Daffy | Willow | Taylor Swift | 4.9 | U3 → L8 → S2 | |
Mickey | Evermore | Taylor Swift | 4.5 | U1 → L1 → S1 | |
Mickey | Willow | Taylor Swift | 4.2 | U1 → L2 → S2 | |
Minnie | Willow | Taylor Swift | 4.7 | U2 → L4 → S2 | |
Minnie | Yellow Sub | The Beatles | 4.6 | U2 → L5 → S7 |
-
Connect Users → Listens → Songs
-
Apply WHERE after all JOINs
-
Each JOIN evaluated sequentially
-
Powerful for complex relationships
JOIN ON: Beyond Simple Equality
The ON clause accepts ANY logical condition, not just =
-- Inequality: Users who gave high ratings
SELECT u.name, l.song_id, l.rating
FROM Users u
JOIN Listens l ON u.user_id = l.user_id AND l.rating > 4.0
-- Range: Songs released in 2023 with listens
SELECT s.title, l.listen_time
FROM Songs s
JOIN Listens l ON s.song_id = l.song_id
AND s.release_date BETWEEN '2023-01-01' AND '2023-12-31'
-- Pattern matching: Gmail users and their listens
SELECT u.email, l.song_id
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
AND u.email LIKE '%@gmail.com'
-- Complex: Listens where user played > 50% of song
SELECT u.name, s.title, l.listen_duration, s.duration
FROM Listens l
JOIN Users u ON l.user_id = u.user_id
JOIN Songs s ON l.song_id = s.song_id
AND l.listen_duration > s.duration * 0.5
Key Point: Use AND/OR to combine conditions. The ON clause is evaluated for each potential row pair.
Common Patterns
-
One-to-Many: One user has many listens (Users → Listens)
-
Many-to-Many: Users listen to songs through a junction table (Users → Listens → Songs)
-
Optional Relationships: LEFT JOIN preserves all customers even without orders
-
Chaining: Connect 3+ tables by joining results with next table
Common Mistakes
4 users × 9 listens = 36 rows (see example below)
Forgetting JOIN conditions
-- Usually WRONG: Missing ON clause creates cartesian product
SELECT * FROM Users, Listens -- Creates 4×9 = 36 rows!
-- CORRECT: Explicit JOIN with ON condition
SELECT * FROM Users
JOIN Listens ON Users.user_id = Listens.user_id -- Only matching pairs
Problem: Missing ON clause creates cartesian product
Wrong JOIN type for requirement
-- WRONG: INNER JOIN excludes users without listens
SELECT * FROM Users u
INNER JOIN Listens l ON u.user_id = l.user_id -- Pluto disappears!
-- CORRECT: LEFT JOIN preserves all users even without listens
SELECT * FROM Users u
LEFT JOIN Listens l ON u.user_id = l.user_id -- Pluto gets NULL columns
Problem: INNER JOIN excludes non-matching rows
Ambiguous column names
-- WRONG: Ambiguous user_id column exists in both tables
SELECT user_id FROM Users JOIN Listens ON ... -- Error: Which user_id?
-- CORRECT: Specify table name to resolve ambiguity
SELECT Users.user_id FROM Users JOIN Listens ON ... -- Or u.user_id with alias
Problem: Both tables have user_id column