Block Nested Loop Join: The Simple Double Loop
The Algorithm: Load Blocks, Scan Everything
BNLJ uses a double-for-loop:
-
Outer loop: Read blocks of B pages from smaller table
-
Inner loop: Scan entire larger table for each block
-
Join: Check join condition for all pairs
def BNLJ(Songs, Listens, B):
for block_start in range(0, P(Songs), B):
buffer = load_pages(Songs, block_start, min(B, P(Songs) - block_start))
for page in Listens:
listens_page = load_page(Listens, page)
for song in buffer:
for listen in listens_page:
if join_condition(song, listen):
output(song ⋈ listen)
Cost Analysis
BNLJ(Songs, Listens):
P(Songs) + ⌈P(Songs)/B⌉ × P(Listens) + OUT
- P(Songs): Read outer table once
- ⌈P(Songs)/B⌉: Number of blocks
- P(Listens): Scan inner for each block
Choose Outer Wisely:
Always put smaller table as outer!
Songs: 100 pages
Listens: 10,000 pages
→ Use BNLJ(Songs, Listens)
Example: Songs ⋈ Listens
Find all listens for each song
Given:
- P(Songs) = 100 pages
- P(Listens) = 10,000 pages
- Buffer B = 10 pages
BNLJ Cost:
- Read Songs: 100
- Blocks: ⌈100/10⌉ = 10
- Scan Listens: 10 × 10,000 = 100,000
- Total: 100,100 IOs
Wrong Choice: Listens as Outer
If we used BNLJ(Listens, Songs):
- Blocks: ⌈10,000/10⌉ = 1,000
- Scan Songs: 1,000 × 100 = 100,000
- Total: 110,000 IOs (10% worse!)
Non-Equality Joins: Where BNLJ Shines
BNLJ handles ANY join condition!
Unlike other JOIN algorithms we'll see later, BNLJ supports:
- Similarity:
similarity(s.features, a.style) > 0.8 - Complex conditions:
s.duration > a.avg_length * 1.5 AND s.genre = a.genre
Real Example: Find Songs Similar to Artist's Style
-- Find songs similar to what the artist usually makes
SELECT s.song_id, a.artist_id
FROM Songs s
JOIN Artists a ON distance(s.audio_vector, a.style_vector) < 0.2
AND s.duration BETWEEN a.min_duration AND a.max_duration;
-- Only BNLJ can handle this complex join!
When BNLJ Wins
Use BNLJ When:
- Non-equality joins
- One table is small
- Large buffer available
- No indexes exist
Avoid BNLJ When:
- Both tables are huge
- Simple equality join
- Tables pre-sorted
- Small buffer