Bloom Filters
Your Secret Weapon for 30% of Interview Questions
One line: A space-efficient probabilistic data structure that tells you "definitely not in set" or "maybe in set".
Visual: How Bloom Filters Work
Problem-Solving Applications
For detailed examples of how Bloom filters solve real-world problems at scale, see the Data Structures Problem Solving guide, which covers:
-
CDN Cache Optimization: How Akamai and Cloudflare use Bloom filters to skip disk checks
-
Web Crawler Efficiency: How Google and Bing prevent duplicate page crawling
-
Real-time Username Validation: How social platforms provide instant availability feedback
Pseudocode
// Bloom Filter - 20 lines that save 10x space
class BloomFilter:
bits = BitArray(size=m) // m bits, all 0
k = 3 // number of hash functions
Add(item):
for i in 0..k-1:
index = hash_i(item) % m
bits[index] = 1
Contains(item):
for i in 0..k-1:
index = hash_i(item) % m
if bits[index] == 0:
return False // Definitely not in set
return True // Maybe in set
// Optimal parameters (math magic)
m = -n * ln(p) / (ln(2)^2) // bits needed
k = m/n * ln(2) // hash functions
Example: 1M items, 1% false positive
→ m = 9.6M bits = 1.2MB
→ k = 7 hash functions
The Trade-off
| Space per Item | False Positive Rate | Hash Functions (k) |
|---|---|---|
| 4 bits | 10% | 3 |
| 10 bits | 1% | 7 |
| 14 bits | 0.1% | 10 |
Rule of thumb: 10 bits per item = 1% false positives = sweet spot
Key Takeaway
Bloom filters trade a tiny false positive rate for massive space savings. They're perfect when:
-
"Definitely no" is useful (cache misses, crawled URLs)
-
Dataset is huge (billions of items)
-
Some false positives are acceptable
Interview tip: Mention Bloom filters for any "have we seen this?" problem at scale!