Bloom Filters

Your Secret Weapon for 30% of Interview Questions

One line: A space-efficient probabilistic data structure that tells you "definitely not in set" or "maybe in set".

10x
Space savings
O(1)
Insert & lookup
1%
Typical false positive rate
0%
False negatives

Visual: How Bloom Filters Work

Bloom Filter: Just a Bit Array + Hash Functions 16-bit array (real ones use millions of bits) 0 [0] 0 [1] 0 [2] 1 [3] 0 [4] 0 [5] 0 [6] 1 [7] 0 [8] 1 [9] 0 [10] 1 [11] 0 [12] 0 [13] 1 [14] 0 [15] INSERT "alice" hash1("alice") % 16 = 3 → bit[3] = 1 hash2("alice") % 16 = 7 → bit[7] = 1 hash3("alice") % 16 = 14 → bit[14] = 1 Result: Bits 3, 7, 14 are now set to 1 LOOKUP "bob" hash1("bob") % 16 = 3 → bit[3] = 1 ✓ hash2("bob") % 16 = 9 → bit[9] = 1 ✓ hash3("bob") % 16 = 2 → bit[2] = 0 ✗ NOT IN SET (guaranteed) Key Points • If ANY bit is 0 → Item definitely NOT in set (100% certain) • If ALL bits are 1 → Item MAYBE in set (could be false positive) • More bits = lower false positive rate. Formula: (1-e^(-kn/m))^k Real world: 10 bits/item = 1% false positive 1 billion items = 1.2 GB Bloom filter vs 32+ GB for HashSet

Problem-Solving Applications

For detailed examples of how Bloom filters solve real-world problems at scale, see the Data Structures Problem Solving guide, which covers:


Pseudocode

// Bloom Filter - 20 lines that save 10x space
class BloomFilter:
    bits = BitArray(size=m)      // m bits, all 0
    k = 3                         // number of hash functions

    Add(item):
        for i in 0..k-1:
            index = hash_i(item) % m
            bits[index] = 1

    Contains(item):
        for i in 0..k-1:
            index = hash_i(item) % m
            if bits[index] == 0:
                return False      // Definitely not in set
        return True               // Maybe in set

// Optimal parameters (math magic)
m = -n * ln(p) / (ln(2)^2)      // bits needed
k = m/n * ln(2)                  // hash functions

Example: 1M items, 1% false positive
→ m = 9.6M bits = 1.2MB
→ k = 7 hash functions

The Trade-off

Space per Item False Positive Rate Hash Functions (k)
4 bits 10% 3
10 bits 1% 7
14 bits 0.1% 10

Rule of thumb: 10 bits per item = 1% false positives = sweet spot


Key Takeaway

Bloom filters trade a tiny false positive rate for massive space savings. They're perfect when:

Interview tip: Mention Bloom filters for any "have we seen this?" problem at scale!