Block Nested Loop Join: The Simple Double Loop
The Algorithm: Load Blocks, Scan Everything
BNLJ uses a double-loop approach:
-
Outer loop: Read blocks of B pages from smaller table
-
Inner loop: Scan entire larger table for each block
-
Join: Check join condition for all pairs
def BNLJ(Songs, Listens, B):
for block_start in range(0, P(Songs), B):
buffer = load_pages(Songs, block_start, min(B, P(Songs) - block_start))
for page in Listens:
listens_page = load_page(Listens, page)
for song in buffer:
for listen in listens_page:
if join_condition(song, listen):
output(song ⋈ listen)
Cost Analysis
BNLJ(Songs, Listens):
P(Songs) + ⌈P(Songs)/B⌉ × P(Listens) + OUT
- P(Songs): Read outer table once
- ⌈P(Songs)/B⌉: Number of blocks
- P(Listens): Scan inner for each block
Choose Outer Wisely:
Always put smaller table as outer!
Songs: 100 pages
Listens: 10,000 pages
→ Use BNLJ(Songs, Listens)
Example: Songs ⋈ Listens
Find all listens for each song
Given:
- P(Songs) = 100 pages
- P(Listens) = 10,000 pages
- Buffer B = 10 pages
BNLJ Cost:
- Read Songs: 100
- Blocks: ⌈100/10⌉ = 10
- Scan Listens: 10 × 10,000 = 100,000
- Total: 100,100 IOs
Wrong Choice: Listens as Outer
If we used BNLJ(Listens, Songs):
- Blocks: ⌈10,000/10⌉ = 1,000
- Scan Songs: 1,000 × 100 = 100,000
- Total: 110,000 IOs (10% worse!)
Non-Equality Joins: Where BNLJ Shines
BNLJ handles ANY join condition!
Unlike Hash Join (equality only) or Sort-Merge (equality/inequality), BNLJ supports:
- Range joins:
l.timestamp BETWEEN s.release_date AND s.end_date - Similarity:
similarity(s.features, a.style) > 0.8 - Complex conditions:
s.duration > a.avg_length * 1.5 AND s.genre = a.genre
Real Example: Find Songs Similar to Artist's Style
-- Find songs similar to what the artist usually makes
SELECT s.song_id, a.artist_id
FROM Songs s
JOIN Artists a ON distance(s.audio_vector, a.style_vector) < 0.2
AND s.duration BETWEEN a.min_duration AND a.max_duration;
-- Only BNLJ can handle this complex join!
When BNLJ Wins
Small Tables with Complex Joins
✅ Use BNLJ When:
- Non-equality joins
- One table is small
- Large buffer available
- No indexes exist
❌ Avoid BNLJ When:
- Both tables are huge
- Simple equality join
- Tables pre-sorted
- Small buffer
Buffer Impact
The key to BNLJ performance is buffer utilization:
Buffer Size Blocks Scans of Listens Total IO
B = 1 100 100 × 10,000 1,000,100
B = 10 10 10 × 10,000 100,100
B = 100 1 1 × 10,000 10,100
Rule: If B ≥ P(smaller table), BNLJ needs only ONE scan!
Key Takeaways
The Double Loop
- Outer: ⌈P(R)/B⌉ blocks
- Inner: P(S) full scans
- Cost grows multiplicatively!
Asymmetry Matters
- BNLJ(Small, Large) << BNLJ(Large, Small)
- Always put smaller table as outer
Unique Strength: Non-Equality Joins
- Only BNLJ handles arbitrary conditions
- Essential for similarity, range, and complex joins
- No preprocessing = works with any join predicate
When to Use
- Non-equality joins (no other choice!)
- Small outer table + large buffer
- No indexes or pre-sorting available