JOINs: Combining Tables

Core Concept

JOINs combine rows from two or more tables based on a related column between them.

SQL JOINs: Basic Operations Tables A B INNER JOINs SELECT A.color, B.color FROM A JOIN B ON A.color = B.color Returns only rows where colors match in both tables. 5 rows: 4 green + 1 blue Cross Product SELECT A.color, B.color FROM A, B Every row from A paired with every row from B. 16 rows total (4 × 4)
SQL JOINs: Outer Joins & Conditions LEFT JOIN Null SELECT A.color, B.color FROM A LEFT JOIN B ON A.color = B.color RIGHT JOIN Null SELECT A.color, B.color FROM A RIGHT JOIN B ON A.color = B.color OUTER JOIN Null Null SELECT A.color, B.color FROM A FULL OUTER JOIN B ON A.color = B.color JOINs on any condition SELECT A.color, B.color FROM A JOIN B ON A.color not in ('Green', 'Blue')
⚠️ Note on Ordering: The visual examples show one possible ordering. Without ORDER BY, your results may appear in any order - even different each time you run the query!

Key Rules

  1. Join Condition: The ON clause specifies how tables relate (e.g., Users.user_id = Listens.user_id)

  2. Join Order: Multiple JOINs are evaluated left-to-right, building intermediate results

  3. NULL Handling: Unmatched rows in outer joins get NULL values for missing columns

  4. Cartesian (or Cross) Product: Missing ON clause creates every possible combination (usually wrong!)

  5. Column Ambiguity: Use table aliases to clarify which table a column comes from

  6. Result Ordering: Without ORDER BY, rows return in random/undefined order (DB-dependent). Use ORDER BY if you need ascending (or descending) consistent ordering. (As we will see later in Section 3, the extra ordering will require a SORT and that incurs a performance penalty.)

INNER JOIN

-- Get all users with their listening history (excludes users without listens)
SELECT u.user_id, u.name, l.listen_id, l.song_id, l.rating  -- Last: Select columns to output
FROM Users u                          -- Start with Users table (aliased as 'u')
INNER JOIN Listens l                  -- Only keep rows with matches in BOTH tables
  ON u.user_id = l.user_id            -- Join condition: matching user_ids
ORDER BY u.user_id, l.listen_id      -- Sort by user first, then listen

Users Table (4 rows)

user_idnameemail
1Mickeymickey@example.com
2Minnieminnie@example.com
3Daffydaffy@example.com
4Plutopluto@example.com

Listens Table (9 rows)

listen_iduser_idsong_idratinglisten_time
1114.52024-08-30 14:35:00
2124.2NULL
3163.92024-08-29 10:15:00
4224.7NULL
5274.62024-08-28 09:20:00
6283.92024-08-27 16:45:00
7312.9NULL
8324.92024-08-26 12:30:00
936NULLNULL

Output (9 rows)

user_idnamelisten_idsong_idratingAnnotations
1Mickey114.5Mickey → listen 1
1Mickey224.2Mickey → listen 2
1Mickey363.9Mickey → listen 3
2Minnie424.7Minnie → listen 4
2Minnie574.6Minnie → listen 5
2Minnie683.9Minnie → listen 6
3Daffy712.9Daffy → listen 7
3Daffy824.9Daffy → listen 8
3Daffy96NULLDaffy → listen 9

Conceptually, INNER JOIN is like a double for loop (Python):

# Simplified conceptual model of INNER JOIN
users = [(1, 'Mickey'), (2, 'Minnie'), (3, 'Daffy'), (4, 'Pluto')]
listens = [(1, 1, 4.5), (2, 1, 4.2), (4, 2, 4.7), (7, 3, 2.9)]  # (listen_id, user_id, rating)

result = []
for user IN users:
    for listen IN listens:
        if user[0] == listen[1]:  # ON u.user_id = l.user_id
            result.append(user + listen)  # Combine matching rows

# Result: Only rows WHERE user_id matches IN both tables
# Pluto (user_id=4) NOT IN result - no matching listens

LEFT JOIN

-- Get all users, including those without any listening history
SELECT u.user_id, u.name, l.listen_id, l.song_id, l.rating
FROM Users u                          -- LEFT table: ALL users preserved
LEFT JOIN Listens l                   -- Add matching listens, NULLs if no match
  ON u.user_id = l.user_id            -- Join condition
ORDER BY u.user_id, l.listen_id      -- NULLs sort last (or first, DB-dependent)
user_idnamelisten_idsong_idratingAnnotations
1Mickey114.5Mickey → listen 1
1Mickey224.2Mickey → listen 2
1Mickey363.9Mickey → listen 3
2Minnie424.7Minnie → listen 4
2Minnie574.6Minnie → listen 5
2Minnie683.9Minnie → listen 6
3Daffy712.9Daffy → listen 7
3Daffy824.9Daffy → listen 8
3Daffy96NULLDaffy → listen 9
4PlutoNULLNULLNULLPluto preserved (no listens)

RIGHT JOIN

-- Get all listens with user info (all listens preserved)
SELECT u.user_id, u.name, l.listen_id, l.song_id, l.rating
FROM Users u                          -- LEFT table (may have NULLs)
RIGHT JOIN Listens l                  -- RIGHT table: ALL listens preserved
  ON u.user_id = l.user_id            -- NULLs for orphaned listens
ORDER BY l.listen_id                  -- Sort by listen_id
user_idnamelisten_idsong_idratingAnnotations
1Mickey114.5Listen 1 → Mickey
1Mickey224.2Listen 2 → Mickey
1Mickey363.9Listen 3 → Mickey
2Minnie424.7Listen 4 → Minnie
2Minnie574.6Listen 5 → Minnie
2Minnie683.9Listen 6 → Minnie
3Daffy712.9Listen 7 → Daffy
3Daffy824.9Listen 8 → Daffy
3Daffy96NULLListen 9 → Daffy

FULL OUTER JOIN

-- Get all users and all listens, preserving both sides
SELECT u.user_id, u.name, l.listen_id, l.song_id, l.rating
FROM Users u
FULL OUTER JOIN Listens l            -- Preserve ALL rows from BOTH tables
ON u.user_id = l.user_id              -- NULLs for non-matching sides
user_idnamelisten_idsong_idratingAnnotations
1Mickey114.5Mickey → listen 1
1Mickey224.2Mickey → listen 2
1Mickey363.9Mickey → listen 3
2Minnie424.7Minnie → listen 4
2Minnie574.6Minnie → listen 5
2Minnie683.9Minnie → listen 6
3Daffy712.9Daffy → listen 7
3Daffy824.9Daffy → listen 8
3Daffy96NULLDaffy → listen 9
4PlutoNULLNULLNULLPluto preserved

Multiple JOINs

-- Get user names with song details for high-rated listens (rating > 4.0)
SELECT u.name, s.title, s.artist, l.rating
FROM Users u
INNER JOIN Listens l                  -- First join: Users ⟷ Listens
  ON u.user_id = l.user_id            -- Match users to their listens
INNER JOIN Songs s                    -- Second join: Result ⟷ Songs
  ON l.song_id = s.song_id            -- Match listens to song details
WHERE l.rating > 4.0                  -- Filter AFTER all joins complete
ORDER BY u.name, l.rating DESC        -- Sort by user, then rating (high to low)
nametitleartistratingAnnotations
DaffyWillowTaylor Swift4.9U3 → L8 → S2
MickeyEvermoreTaylor Swift4.5U1 → L1 → S1
MickeyWillowTaylor Swift4.2U1 → L2 → S2
MinnieWillowTaylor Swift4.7U2 → L4 → S2
MinnieYellow SubThe Beatles4.6U2 → L5 → S7

JOIN ON: Beyond Simple Equality

The ON clause accepts ANY logical condition, not just =

-- Inequality: Users who gave high ratings
SELECT u.name, l.song_id, l.rating
FROM Users u 
JOIN Listens l ON u.user_id = l.user_id AND l.rating > 4.0

-- Range: Songs released in 2023 with listens
SELECT s.title, l.listen_time
FROM Songs s
JOIN Listens l ON s.song_id = l.song_id 
               AND s.release_date BETWEEN '2023-01-01' AND '2023-12-31'

-- Pattern matching: Gmail users and their listens
SELECT u.email, l.song_id
FROM Users u
JOIN Listens l ON u.user_id = l.user_id 
               AND u.email LIKE '%@gmail.com'

-- Complex: Listens where user played > 50% of song
SELECT u.name, s.title, l.listen_duration, s.duration
FROM Listens l
JOIN Users u ON l.user_id = u.user_id
JOIN Songs s ON l.song_id = s.song_id 
            AND l.listen_duration > s.duration * 0.5

Key Point: Use AND/OR to combine conditions. The ON clause is evaluated for each potential row pair.


Common Patterns

Common Mistakes

⚠️ Cartesian Product Warning: Missing ON clause multiplies rows!
4 users × 9 listens = 36 rows (see example below)

Forgetting JOIN conditions

-- Usually WRONG: Missing ON clause creates cartesian product
SELECT * FROM Users, Listens          -- Creates 4×9 = 36 rows!

-- CORRECT: Explicit JOIN with ON condition
SELECT * FROM Users 
JOIN Listens ON Users.user_id = Listens.user_id  -- Only matching pairs

Problem: Missing ON clause creates cartesian product

Wrong JOIN type for requirement

-- WRONG: INNER JOIN excludes users without listens
SELECT * FROM Users u 
INNER JOIN Listens l ON u.user_id = l.user_id    -- Pluto disappears!

-- CORRECT: LEFT JOIN preserves all users even without listens
SELECT * FROM Users u 
LEFT JOIN Listens l ON u.user_id = l.user_id     -- Pluto gets NULL columns

Problem: INNER JOIN excludes non-matching rows

Ambiguous column names

-- WRONG: Ambiguous user_id column exists in both tables
SELECT user_id FROM Users JOIN Listens ON ...     -- Error: Which user_id?

-- CORRECT: Specify table name to resolve ambiguity
SELECT Users.user_id FROM Users JOIN Listens ON ...  -- Or u.user_id with alias

Problem: Both tables have user_id column