Hash Partitioning: Divide and Conquer
When Your Table is 10× Bigger Than RAM
The Hash Partitioning Algorithm
Core Idea
Split the data into B smaller chunks where each chunk contains all rows for a subset of keys.
📋 ALGORITHM: Hash Partitioning
HashPartition(file, B, key):
    output_buffers[B]                      // B output buffers + 1 input buffer
    for row in file:
        p = hash(row.key) % B
        output_buffers[p].add(row)
        if output_buffers[p].full:
            write(output_buffers[p] to dbfile_p)
            output_buffers[p].clear()
    flush_all(output_buffers)
Result: B partitions, each ≈ N/B pages
Cost: 2N IOs (read + write)
Key Property: Same key always goes to same partition
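To make the pseudocode concrete, here is a minimal Python sketch of the same loop, assuming rows arrive as dicts and each partition is spilled to a JSON-lines file; the file layout (`part_<p>.jsonl`), buffer size, and helper names are illustrative choices, not part of the algorithm.

```python
import json
from pathlib import Path

def hash_partition(rows, B, key, out_dir="partitions", buffer_size=1000):
    """Stream rows into B partition files; the same key always lands in the same file."""
    Path(out_dir).mkdir(exist_ok=True)
    buffers = [[] for _ in range(B)]          # B output buffers; `rows` plays the role of the input buffer

    def flush(p):
        if buffers[p]:
            with open(f"{out_dir}/part_{p}.jsonl", "a") as f:
                f.writelines(json.dumps(r) + "\n" for r in buffers[p])
            buffers[p].clear()

    for row in rows:                          # one pass over the input (read N pages)
        p = hash(row[key]) % B                # same key -> same partition
        buffers[p].append(row)
        if len(buffers[p]) >= buffer_size:    # buffer full: spill it to its partition file
            flush(p)

    for p in range(B):                        # flush_all: write out whatever remains
        flush(p)
```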
Example Problem: Breaking a Big Problem Into Small Problems
Strategy: When a problem is too large to solve at once, divide it into smaller, manageable pieces that can each be solved independently. Hash partitioning is the textbook example of this divide-and-conquer approach.
Why Can't We Just Process It All?
-- This query needs to group all data by user_id
SELECT user_id, COUNT(*) as play_count
FROM Listens
GROUP BY user_id;
-- Problem: Need all rows for each user in RAM at once!
-- With 500M rows, this won't fit
Two-Phase Execution: Applying the Algorithm
Now let's see how hash partitioning solves our example problem:
-- PROBLEM: This query requires all user data together
SELECT user_id, COUNT(*) as play_count
FROM Listens GROUP BY user_id;
-- But 64GB data won't fit in 6.4GB RAM!
SOLUTION: Two-Phase Approach
Phase 1: Partition → Hash data into 100 manageable files
- Apply hash(user_id) % 100 to each row
- Result: 100 DBFiles, each ~640MB (fits in RAM!)
- All data for each user stays in the same partition
Phase 2: Process → Aggregate each file independently
-- Process each partition file separately:
for each_partition_file:
    load_file_into_ram()          -- Now it fits!
    SELECT user_id, COUNT(*)
    FROM current_partition
    GROUP BY user_id;             -- Fast in-memory processing
-- Combine results from all 100 partitions = final answer
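Continuing the sketch above (and assuming the same `part_*.jsonl` layout it produced), Phase 2 might look like this: each partition is counted entirely in memory, and because no user_id ever spans two partitions, the per-partition results can simply be unioned into the final answer.

```python
import json
from collections import Counter
from pathlib import Path

def group_by_count(out_dir="partitions", key="user_id"):
    """Phase 2: aggregate each partition independently, then union the results."""
    results = {}
    for part in sorted(Path(out_dir).glob("part_*.jsonl")):
        counts = Counter()                 # fits in RAM: one partition at a time
        with open(part) as f:
            for line in f:
                counts[json.loads(line)[key]] += 1
        results.update(counts)             # no key appears in two partitions,
                                           # so a plain union is already the final answer
    return results
```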
Cost Analysis
Hash Partitioning Cost: C_r×N + C_w×N
- Read N pages from the original file → C_r×N cost
- Write N pages to B partitioned DBFiles → C_w×N cost
- Total: C_r×N + C_w×N IOs for the partitioning step
- For C_r = C_w = 1: 2N IOs (typical case)
Additional Processing on Partitioned Files
After partitioning, each smaller file can be processed independently:
- GROUP BY aggregation: Read each DBFile once → +C_r×N cost
- Sorting within partitions: Read + write each DBFile → +(C_r×N + C_w×N) cost
- Hash joins: Build hash tables from each DBFile → +C_r×N cost
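Putting numbers on these formulas for the running example (64GB of data, followed by a GROUP BY pass), and assuming 16KB pages since no page size is given above:

```python
PAGE_SIZE = 16 * 1024                  # assumed page size (not specified in the notes)
DATA_SIZE = 64 * 1024**3               # 64GB table
N = DATA_SIZE // PAGE_SIZE             # = 4,194,304 pages
C_r = C_w = 1                          # unit cost per page read / write

partition_cost = C_r * N + C_w * N     # read + write every page once: 2N
group_by_cost  = C_r * N               # one more read of every partition: +N
print(partition_cost + group_by_cost)  # 3N = 12,582,912 page IOs total
```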
Memory Requirement
- Minimum: B+1 pages (B output buffers + 1 input buffer)
- Optimal: 2B pages (double buffering)
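As a rough sanity check on how B interacts with memory (a sketch, assuming both memory M and the file size N are measured in pages): partitioning needs B+1 ≤ M buffer pages, and each resulting partition only fits in RAM afterwards if N/B ≤ M, so roughly N/M ≤ B ≤ M−1.

```python
import math

def choose_B(N_pages, M_pages):
    """Pick a partition count B (a sketch, not a tuned heuristic)."""
    min_B = math.ceil(N_pages / M_pages)   # each partition must fit in memory: N/B <= M
    max_B = M_pages - 1                    # need B output buffers + 1 input buffer: B+1 <= M
    if min_B > max_B:
        raise ValueError("one pass is not enough; partition recursively")
    return min_B                           # smallest B whose partitions still fit;
                                           # real systems add slack for skew

# Running example: 64GB file, 6.4GB RAM => N/M = 10, so B >= 10 (this section uses B = 100)
```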
Comparison
| Approach | Memory Needed | Hash Partition Cost | Additional Processing | Feasible? |
|---|---|---|---|---|
| Load everything | N pages (64GB) | N/A | C_r×N | ❌ Won't fit |
| Process page-by-page | 2 pages | N/A | Multiple passes | ❌ Complex |
| Hash partition | B+1 buffers (fits in 6.4GB) | C_r×N + C_w×N | +C_r×N to +(C_r×N + C_w×N) | ✅ Perfect! |
Use Cases & Real-World Systems
GROUP BY Operations
SELECT user_id, COUNT(*), AVG(rating)
FROM Listens
GROUP BY user_id;
Apache Spark automatically hash partitions by user_id:
df = spark.read.parquet("listens")
counts = df.groupBy("user_id").count()   # Auto-partitioned: hash shuffle on user_id
df = df.repartition(200, "user_id")      # Control the partition count explicitly
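Both `groupBy` and `repartition` trigger a hash shuffle: rows with the same `user_id` are routed to the same task, which is exactly the partitioning step above carried out across the cluster. Note that `repartition` returns a new DataFrame rather than modifying `df` in place, so its result must be assigned.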
Large Joins
PostgreSQL applies the same partitioning idea when a hash join is too big for memory:
EXPLAIN SELECT * FROM huge_table1 JOIN huge_table2 USING (key);
-- Hash Join: partition both tables, join matching partitions
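When the build side exceeds `work_mem`, PostgreSQL spills both inputs into hashed batch files on disk and joins them batch by batch, the same read-partition-process pattern described above.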
Distributed Processing
- BigQuery: 1000s of machines, each processes partitions in parallel
- MapReduce/Hadoop: the shuffle between map and reduce is hash partitioning by key
- Spark: each partition file (DBFile) is handled by a different executor
Handling Data Skew
# Problem: User 42 has 10M listens, User 99 has 10
# Solution: Multi-level partitioning or pre-splitting hot keys
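One common mitigation is to "salt" hot keys so their rows spread over several sub-partitions and are re-combined afterwards. A hedged sketch, where the hot-key set and salt factor are illustrative values rather than anything prescribed above:

```python
import random

HOT_KEYS = {42}          # keys known (or sampled) to be heavily skewed -- illustrative
SALT_FACTOR = 8          # spread each hot key across up to 8 sub-partitions

def salted_partition(key, B):
    """Route a row: hot keys get a random salt so no single partition receives all 10M rows."""
    if key in HOT_KEYS:
        return hash((key, random.randrange(SALT_FACTOR))) % B
    return hash(key) % B

# Caveat: aggregates over a salted key need a second combine step,
# because its rows now land in up to SALT_FACTOR different partitions.
```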
Key Takeaways
- Divide and Conquer: Split big problems into small ones
- Hash Ensures Completeness: All rows with the same key land in the same partition
- Linear IO Cost: 3N IOs (2N to partition + N to process each partition once) is much better than N²
- Foundation of Big Data: This is how PostgreSQL, BigQuery, Spark, and Hadoop work
- Memory Efficient: Process TBs of data with GBs of RAM