Hash Partition Join: Joining Tables Bigger Than RAM

When Both Tables Don't Fit in Memory

Hash Partition Join: Same Hash → Same Partition → Join

Users table (100 pages, 6.4GB):

user_id | name    | country
--------|---------|--------
42      | Alice   | USA
89      | Bob     | UK
17      | Charlie | Canada
...

Listens table (1000 pages, 64GB):

user_id | song_id | plays
--------|---------|------
42      | 101     | 5
89      | 202     | 3
42      | 303     | 7
...

Phase 1: Partition both tables with the same hash function, h(user_id) % 10:

- Users   → partitions U0, U1, U2, ..., U9
- Listens → partitions L0, L1, L2, ..., L9

Key: user_id=42 → h(42)%10 = 2 → goes to partition 2 in BOTH tables.
All rows with the same user_id end up in matching partition numbers!

Phase 2: Join matching partitions:

- Join(U0, L0): load U0 (~640MB) into an in-RAM hash table, stream L0 (~6.4GB) past it
- Join(U2, L2): user_id=42 matches found here! Alice's listening history assembled
- ... join all 10 partition pairs independently ...

Final result:

user_id | name  | song_id | plays
--------|-------|---------|------
42      | Alice | 101     | 5
42      | Alice | 303     | 7

Hash Partition Join algorithm:

Phase 1: Partition (2N + 2M IOs)
1. Read the Users table, partition by h(user_id) % B → B files
2. Read the Listens table, partition by h(user_id) % B → B files

Phase 2: Join (N + M IOs)
3. For each partition i from 0 to B-1:
   - Build an in-RAM hash table on Users[i]
   - Stream Listens[i] through it (in-memory hash join)

The Join Problem: Tables Too Big for RAM

- Users: 6.4GB table
- Listens: 64GB table
- Available RAM: 6.4GB
- Join? Won't fit!

The Query We Want

-- Get user details with their listening history
SELECT u.name, u.country, l.song_id, l.plays
FROM Users u
JOIN Listens l ON u.user_id = l.user_id;

-- Problem: Can't load both tables in RAM!
-- Users: 6.4GB + Listens: 64GB = 70.4GB > 6.4GB RAM

Hash Partition Join: The Solution

Core Insight

If we partition both tables using the same hash function on the join key, then every pair of rows that can match lands in the same-numbered partition. Each partition pair can then be joined independently, and each one is small enough to process in RAM.

📋 PSEUDOCODE: Hash Partition Join

HashPartitionJoin(T1, T2, key):    // Assume |T1| < |T2|
    B = ceil(|T2| / RAM)    // partition count: even the larger table's partitions fit in RAM

    // Phase 1: Partition both tables (needs B output buffers + 1 input buffer)
    for row in T1: write to T1_p[h(row.key) % B]
    for row in T2: write to T2_p[h(row.key) % B]    // same hash!

    // Phase 2: Join matching partitions
    for i in 0..B-1:
        HT = hash_table(T1_p[i])    // build on smaller
        for row in T2_p[i]:          // probe with larger
            if HT[row.key]: output matches

    Approximate Cost: 3(N + M) + OUT IOs    // N, M = pages in T1, T2; OUT = pages of output
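The pseudocode above can be sketched as runnable Python. This is a minimal in-memory model: a real engine writes each partition to a disk file, and the table names and rows here are illustrative.

```python
from collections import defaultdict

def hash_partition_join(t1, t2, key, B=4):
    """Join two lists of dict rows on `key`; t1 is the smaller (build) side."""
    # Phase 1: partition both inputs with the SAME hash function.
    t1_parts, t2_parts = defaultdict(list), defaultdict(list)
    for row in t1:
        t1_parts[hash(row[key]) % B].append(row)
    for row in t2:
        t2_parts[hash(row[key]) % B].append(row)   # same hash!

    # Phase 2: join each matching partition pair independently.
    out = []
    for i in range(B):
        ht = defaultdict(list)                     # build on the smaller side
        for row in t1_parts[i]:
            ht[row[key]].append(row)
        for row in t2_parts[i]:                    # probe with the larger side
            for match in ht.get(row[key], []):
                out.append({**match, **row})
    return out

users = [{"user_id": 42, "name": "Alice"}, {"user_id": 89, "name": "Bob"}]
listens = [{"user_id": 42, "song_id": 101, "plays": 5},
           {"user_id": 89, "song_id": 202, "plays": 3},
           {"user_id": 42, "song_id": 303, "plays": 7}]
rows = hash_partition_join(users, listens, "user_id")
```

In a real engine each `t*_parts[i]` would be a file on disk; only the build side's partition is materialized in RAM during Phase 2.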

Two-Table Join Example

Joining Users and Listens

SELECT u.*, l.* 
FROM Users u 
JOIN Listens l ON u.user_id = l.user_id
WHERE u.user_id = 42;

Step-by-step execution:

Given:

- Users: 100 pages (6.4GB)

- Listens: 1000 pages (64GB)  

- RAM: 100 pages (6.4GB)

- B = 10 partitions

Phase 1 - Partition:

1. Read each Users page

2. Hash each user_id: h(42)%10=2, h(89)%10=9, etc.

3. Write to appropriate partition file

4. Repeat for Listens table with SAME hash function

Result after Phase 1:

- 10 Users partition files (~640MB each)

- 10 Listens partition files (~6.4GB each)

- User 42 is in Users_partition_2

- ALL of User 42's listens are in Listens_partition_2

Phase 2 - Join:
For partition 0: Join(Users_p0, Listens_p0)
For partition 1: Join(Users_p1, Listens_p1)
...
For partition 9: Join(Users_p9, Listens_p9)

Each join fits: the ~640MB Users partition becomes an in-RAM hash table, and the ~6.4GB Listens partition is streamed past it, never loaded whole.

Key Point: user_id=42 will be in partition 2 in BOTH tables!
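The partition math above can be checked directly, using CPython's built-in `hash()` (which maps small non-negative ints to themselves) as a stand-in for h:

```python
B = 10

def h(key):
    # Stand-in hash function; CPython hashes small non-negative ints to
    # themselves, so h(42) == 2 and h(89) == 9, matching the walkthrough.
    return hash(key) % B

# Same key -> same partition number, no matter which table the row came from.
assert h(42) == 2          # user 42's row in Users goes to partition 2
assert h(42) == 2          # ...and every listen with user_id=42 also goes to 2
assert h(89) == 9

# Sizing: each Users partition (6.4GB / 10 = 640MB) fits in RAM as the build
# side; the matching ~6.4GB Listens partition is streamed, never fully loaded.
USERS_GB, RAM_GB = 6.4, 6.4
assert USERS_GB / B < RAM_GB
```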


Multi-Table Joins

Joining Three Tables: Users, Listens, Songs

SELECT u.name, s.title, l.plays
FROM Users u
JOIN Listens l ON u.user_id = l.user_id
JOIN Songs s ON l.song_id = s.song_id;

📋 PSEUDOCODE: Multi-Way Join

ThreeWayJoin(U, L, S):    // Assume |U| < |L| < |S|
    temp = HPJ(U, L, "user_id")    // U ⋈ L
    result = HPJ(temp, S, "song_id")    // temp ⋈ S
    return result

Approximate Cost: 3(|U|+|L|) + |temp| + 3(|temp|+|S|) + OUT
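Chaining two partition joins, as ThreeWayJoin does, can be sketched in Python. A compact in-memory `hpj` stands in for the disk-based HPJ, and the sample rows are illustrative:

```python
from collections import defaultdict

def hpj(t1, t2, key, B=4):
    """Minimal in-memory hash partition join (build on t1, probe with t2)."""
    p1, p2 = defaultdict(list), defaultdict(list)
    for r in t1:
        p1[hash(r[key]) % B].append(r)
    for r in t2:
        p2[hash(r[key]) % B].append(r)          # same hash function
    out = []
    for i in range(B):
        ht = defaultdict(list)
        for r in p1[i]:
            ht[r[key]].append(r)
        for r in p2[i]:
            out += [{**m, **r} for m in ht[r[key]]]
    return out

users   = [{"user_id": 42, "name": "Alice"}]
songs   = [{"song_id": 101, "title": "Song A"}, {"song_id": 303, "title": "Song B"}]
listens = [{"user_id": 42, "song_id": 101, "plays": 5},
           {"user_id": 42, "song_id": 303, "plays": 7}]

temp   = hpj(users, listens, "user_id")    # U ⋈ L, partitioned on user_id
result = hpj(songs, temp, "song_id")       # (U ⋈ L) ⋈ S, re-partitioned on song_id
```

Note that the second join must re-partition on song_id: the first join's output was partitioned on user_id, which says nothing about where matching songs live.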

Handling Skew in HPJ

Problem: Hot Keys

What if user_id=1 (system account) has 10M listens?

# Partition sizes become uneven:
# Partition 0: 100MB (normal)
# Partition 1: 10GB (hot key!)  <-- Won't fit in RAM
# Partition 2: 120MB (normal)

Solution: Handle Skew

When a partition is too large (hot key):

  1. Detect: Monitor partition sizes during Phase 1

  2. Split: Re-partition the large partition with a different hash

  3. Recurse: Apply HPJ recursively to the sub-partitions
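The detect/split/recurse loop can be sketched as follows, with a toy RAM_LIMIT and the recursion level used to salt the hash. One caveat the sketch makes explicit: rows sharing a single hot key can never be separated by re-hashing, so real systems need a fallback for that case.

```python
from collections import defaultdict

RAM_LIMIT = 4   # toy stand-in for "rows that fit in RAM"

def partition(rows, key, B, level=0):
    """Partition rows by hash; re-partition any oversized bucket with a
    level-salted hash so a hot bucket splits differently on the next pass."""
    parts = defaultdict(list)
    for r in rows:
        parts[hash((r[key], level)) % B].append(r)
    out = []
    for bucket in parts.values():
        if len(bucket) > RAM_LIMIT and level < 3:
            out += partition(bucket, key, B, level + 1)   # recurse on hot bucket
        else:
            # Either it fits, or it is a single hot key that hashing cannot
            # split; real systems fall back (e.g. block nested loop) for that.
            out.append(bucket)
    return out

hot = [{"user_id": 1, "song_id": i} for i in range(10)]   # one hot key
buckets = partition(hot, "user_id", B=2)                  # stays one big bucket
```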


Why Hash Partition Join Works

Correctness Guarantee

  1. Same hash function on both tables

  2. Same join key for partitioning

  3. Result: Every matching pair ends up in same partition

Example Trace

User Alice (id=42) in Users table:
  hash(42) % 10 = 2  → Goes to Users_partition_2

Alice's listens in Listens table:
  Row 1: (user_id=42, song_id=101)
    hash(42) % 10 = 2  → Goes to Listens_partition_2

  Row 2: (user_id=42, song_id=303)
    hash(42) % 10 = 2  → Goes to Listens_partition_2

When we join partition 2:
  - Alice is in Users_partition_2
  - ALL her listens are in Listens_partition_2
  - They meet during the join!

Real-World Usage

PostgreSQL

EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM huge1 JOIN huge2 USING (id);

-- Plan shape (illustrative):
-- Hash Join
--   Hash Cond: (huge1.id = huge2.id)
--   ->  Seq Scan on huge1
--   ->  Hash
--         Buckets: ...  Batches: 16  Memory Usage: ...
--         ->  Seq Scan on huge2
--
-- "Batches" > 1 means the hash table exceeded work_mem, so PostgreSQL
-- split both inputs into that many disk partitions: the HPJ idea.

Apache Spark

# Spark shuffles both sides by the join key (the same partition-by-hash idea);
# each partition pair is then joined locally (sort-merge by default, hash join when hinted)
df1 = spark.read.parquet("users")
df2 = spark.read.parquet("listens")

# This triggers hash partition join
result = df1.join(df2, "user_id")

# You can control partitions
result = df1.repartition(200, "user_id") \
           .join(df2.repartition(200, "user_id"), "user_id")

Optimization Tips

  1. Partition count: Choose B ≈ larger_table_size/RAM so every partition fits in RAM (B is also capped by the buffer pages available during partitioning)

  2. Build on smaller: Always build hash table on smaller relation

  3. Broadcast small tables: If one table fits in RAM, broadcast it

  4. Monitor skew: Track partition sizes, handle hot keys
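Tip 3 as a sketch: when one side fits in RAM, skip the partitioning pass entirely and stream the big side past a single hash table, for roughly N + M IOs instead of 3(N + M). Names and rows are illustrative:

```python
from collections import defaultdict

def broadcast_join(small, big, key):
    """Build one hash table on the RAM-sized side, stream the big side."""
    ht = defaultdict(list)
    for r in small:
        ht[r[key]].append(r)
    for r in big:                  # big side is streamed, never materialized
        for m in ht[r[key]]:
            yield {**m, **r}

users = [{"user_id": 42, "name": "Alice"}]
listens = [{"user_id": 42, "song_id": 101, "plays": 5},
           {"user_id": 7,  "song_id": 9,   "plays": 1}]
rows = list(broadcast_join(users, listens, "user_id"))
```

This is what Spark's broadcast hash join does: ship the small table to every worker and avoid shuffling the large one.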


Key Takeaways

  1. HPJ enables joining tables >> RAM

  2. Same hash → Same partition ensures correctness

  3. Two phases: Partition then join

  4. Extends to multiple tables via sequential joins

  5. Handle skew with recursive partitioning

  6. Foundation of distributed joins in Spark/Hadoop

The Punchline: Hash once, join anywhere! Partitioning turns big joins into parallel local joins.