Block Nested Loop Join: The Simple Double Loop
The Algorithm: Load Blocks, Scan Everything
The Block Nested Loop Join (BNLJ) is straightforward: it's a double-for-loop in disguise. Here's the play-by-play:
-
Outer loop: Load blocks of B pages from the smaller table.
-
Inner loop: For each block, scan the entire larger table.
-
Join: Evaluate the join condition for every pair.
Cost Analysis
BNLJ(Songs, Listens):
P(Songs) + ⌈P(Songs)/B⌉ × P(Listens) + OUT
- P(Songs): Read outer table once
- ⌈P(Songs)/B⌉: Number of blocks
- P(Listens): Scan inner for each block
Choose Outer Wisely:
Always put the smaller table as outer!
Songs: 100 pages
Listens: 10,000 pages
→ Use BNLJ(Songs, Listens)
Example: Songs ⋈ Listens
Find all listens for each song
Given:
- P(Songs) = 100 pages
- P(Listens) = 10,000 pages
- Buffer B = 10 pages
BNLJ Cost:
- Read Songs: 100
- Blocks: ⌈100/10⌉ = 10
- Scan Listens: 10 × 10,000 = 100,000
- Total: 100,100 IOs
Wrong Choice: Listens as Outer
If we used BNLJ(Listens, Songs):
- Blocks: ⌈10,000/10⌉ = 1,000
- Scan Songs: 1,000 × 100 = 100,000
- Total: 110,000 IOs (10% worse!)
Non-Equality Joins: Where BNLJ Shines
BNLJ handles ANY join condition!
Unlike other JOIN algorithms, BNLJ supports:
- Similarity:
similarity(s.features, a.style) > 0.8 - Complex conditions:
s.duration > a.avg_length * 1.5 AND s.genre = a.genre
Real Example: Find Songs Similar to Artist's Style
-- Find songs similar to what the artist usually makes
SELECT s.song_id, a.artist_id
FROM Songs s
JOIN Artists a ON distance(s.audio_vector, a.style_vector) < 0.2
AND s.duration BETWEEN a.min_duration AND a.max_duration;
-- Only BNLJ can handle this complex join!
When BNLJ Wins
Use BNLJ When:
- Non-equality joins
- One table is small
- Large buffer available
- No indexes exist
Avoid BNLJ When:
- Both tables are huge
- Simple equality join
- Tables pre-sorted
- Small buffer