Block Nested Loop Join: The Simple Double Loop

BNLJ: For Each Block of Songs, Scan All Listens Songs (Outer) P(Songs) = 100 pages Block 1: Pages 1-10 Block 2: Pages 11-20 ... (10 blocks total) Buffer (B = 10 pages) Current: Block 1 of Songs Listens (Inner) P(Listens) = 10,000 pages Scan all 10,000 pages for each block of Songs 30% Join Result Matches from Block 1 × All Listens The Double Loop for each block of B pages from Songs: // Outer: ⌈100/10⌉ = 10 iterations for each page in Listens: // Inner: 10,000 iterations for each (song, listen) pair in (buffer, current_page): if join_condition(song, listen): output(song ⋈ listen)

The Algorithm: Load Blocks, Scan Everything

BNLJ uses a double-loop approach:

  1. Outer loop: Read blocks of B pages from smaller table

  2. Inner loop: Scan entire larger table for each block

  3. Join: Check join condition for all pairs

def BNLJ(Songs, Listens, B):
    for block_start in range(0, P(Songs), B):
        buffer = load_pages(Songs, block_start, min(B, P(Songs) - block_start))

        for page in Listens:
            listens_page = load_page(Listens, page)

            for song in buffer:
                for listen in listens_page:
                    if join_condition(song, listen):
                        output(song ⋈ listen)

Cost Analysis

BNLJ(Songs, Listens):

P(Songs) + ⌈P(Songs)/B⌉ × P(Listens) + OUT

  • P(Songs): Read outer table once
  • ⌈P(Songs)/B⌉: Number of blocks
  • P(Listens): Scan inner for each block

Choose Outer Wisely:

Always put smaller table as outer!
Songs: 100 pages
Listens: 10,000 pages
→ Use BNLJ(Songs, Listens)


Example: Songs ⋈ Listens

Find all listens for each song

Given:

  • P(Songs) = 100 pages
  • P(Listens) = 10,000 pages
  • Buffer B = 10 pages

BNLJ Cost:

  • Read Songs: 100
  • Blocks: ⌈100/10⌉ = 10
  • Scan Listens: 10 × 10,000 = 100,000
  • Total: 100,100 IOs

Wrong Choice: Listens as Outer

If we used BNLJ(Listens, Songs):

  • Blocks: ⌈10,000/10⌉ = 1,000
  • Scan Songs: 1,000 × 100 = 100,000
  • Total: 110,000 IOs (10% worse!)

Non-Equality Joins: Where BNLJ Shines

BNLJ handles ANY join condition!

Unlike Hash Join (equality only) or Sort-Merge (equality/inequality), BNLJ supports:

  • Range joins: l.timestamp BETWEEN s.release_date AND s.end_date
  • Similarity: similarity(s.features, a.style) > 0.8
  • Complex conditions: s.duration > a.avg_length * 1.5 AND s.genre = a.genre

Real Example: Find Songs Similar to Artist's Style

-- Find songs similar to what the artist usually makes
SELECT s.song_id, a.artist_id
FROM Songs s
JOIN Artists a ON distance(s.audio_vector, a.style_vector) < 0.2
                  AND s.duration BETWEEN a.min_duration AND a.max_duration;

-- Only BNLJ can handle this complex join!

When BNLJ Wins

Small Tables with Complex Joins

✅ Use BNLJ When:

  • Non-equality joins
  • One table is small
  • Large buffer available
  • No indexes exist

❌ Avoid BNLJ When:

  • Both tables are huge
  • Simple equality join
  • Tables pre-sorted
  • Small buffer


Buffer Impact

The key to BNLJ performance is buffer utilization:

Buffer Size    Blocks    Scans of Listens    Total IO
B = 1          100       100 × 10,000        1,000,100
B = 10         10        10 × 10,000         100,100  
B = 100        1         1 × 10,000          10,100

Rule: If B ≥ P(smaller table), BNLJ needs only ONE scan!


Key Takeaways

The Double Loop

  • Outer: ⌈P(R)/B⌉ blocks
  • Inner: P(S) full scans
  • Cost grows multiplicatively!

Asymmetry Matters

  • BNLJ(Small, Large) << BNLJ(Large, Small)
  • Always put smaller table as outer

Unique Strength: Non-Equality Joins

  • Only BNLJ handles arbitrary conditions
  • Essential for similarity, range, and complex joins
  • No preprocessing = works with any join predicate

When to Use

  1. Non-equality joins (no other choice!)
  2. Small outer table + large buffer
  3. No indexes or pre-sorting available