Bloom Filters

Your Secret Weapon for 30% of Interview Questions

One line: A space-efficient probabilistic data structure that tells you "definitely not in set" or "maybe in set".

10x

Space savings

O(1)

Insert & lookup

Typical false positive rate

False negatives

Visual: How Bloom Filters Work

Problem-Solving Applications

For detailed examples of how Bloom filters solve real-world problems at scale, see the Data Structures Problem Solving guide, which covers:

CDN Cache Optimization: How Akamai and Cloudflare use Bloom filters to skip disk checks
Web Crawler Efficiency: How Google and Bing prevent duplicate page crawling
Real-time Username Validation: How social platforms provide instant availability feedback

Pseudocode

// Bloom Filter - 20 lines that save 10x space
class BloomFilter:
    bits = BitArray(size=m)      // m bits, all 0
    k = 3                         // number of hash functions

    Add(item):
        for i in 0..k-1:
            index = hash_i(item) % m
            bits[index] = 1

    Contains(item):
        for i in 0..k-1:
            index = hash_i(item) % m
            if bits[index] == 0:
                return False      // Definitely not in set
        return True               // Maybe in set

// Optimal parameters (math magic)
m = -n * ln(p) / (ln(2)^2)      // bits needed
k = m/n * ln(2)                  // hash functions

Example: 1M items, 1% false positive
→ m = 9.6M bits = 1.2MB
→ k = 7 hash functions

The Trade-off

Space per Item	False Positive Rate	Hash Functions (k)
4 bits	10%	3
10 bits	1%	7
14 bits	0.1%	10

Rule of thumb: 10 bits per item = 1% false positives = sweet spot

Key Takeaway

Bloom filters trade a tiny false positive rate for massive space savings. They're perfect when:

"Definitely no" is useful (cache misses, crawled URLs)
Dataset is huge (billions of items)
Some false positives are acceptable

Interview tip: Mention Bloom filters for any "have we seen this?" problem at scale!