Block Nested Loop Join: The Simple Double Loop

Concept. Block nested loop join reads the inner table once into a RAM-sized block, then scans the outer table once for each block, matching rows with the predicate as it goes.

Intuition. When you need to join a small Songs table to a big Listens table, load Songs into RAM in one block, then scan Listens once and check each listen against every song in the block. Cost: read Songs once, read Listens ⌈P(Songs)/B⌉ times.

The Algorithm: Load Blocks, Scan Everything

A double-for-loop in disguise. Outer loop loads blocks of B pages from the smaller table; inner loop scans the entire larger table for each block; the join predicate fires on every pair.

Block Nested Loop Join: the smaller table R is loaded into RAM once. The larger table S is then streamed page-by-page; each S page is probed against the R block in RAM. Cost when R fits in 1 block: P(R) + P(S) page reads.

Example: Songs ⋈ Listens

Find all listens for each song

Given:

P(Songs) = 100 pages
P(Listens) = 10,000 pages
Buffer B = 10 pages

BNLJ Cost:

Read Songs: 100
Blocks: ⌈100/10⌉ = 10
Scan Listens: 10 × 10,000 = 100,000
Total: 100,100 IOs

Wrong Choice: Listens as Outer

If we used BNLJ(Listens, Songs):

Blocks: ⌈10,000/10⌉ = 1,000
Scan Songs: 1,000 × 100 = 100,000
Total: 110,000 IOs (10% worse!)

Non-Equality Joins: Where BNLJ Shines

BNLJ handles ANY join condition!

Unlike other JOIN algorithms, BNLJ supports:

Similarity: similarity(s.features, a.style) > 0.8
Complex conditions: s.duration > a.avg_length * 1.5 AND s.genre = a.genre

Real Example: Find Songs Similar to Artist's Style

-- Find songs similar to what the artist usually makes
SELECT s.song_id, a.artist_id
FROM Songs s
JOIN Artists a ON distance(s.audio_vector, a.style_vector) < 0.2
                  AND s.duration BETWEEN a.min_duration AND a.max_duration;

-- Only BNLJ can handle this complex join!

When BNLJ Wins

Use BNLJ When:

Non-equality joins
One table is small
Large buffer available
No indexes exist

Avoid BNLJ When:

Both tables are huge
Simple equality join
Tables pre-sorted
Small buffer

Hash Partition Join → When both tables are too big for BNLJ and the join is an equality, hash partitioning turns a quadratic problem into a linear one.