Case Study 2.3: How does Google Chrome protect 3 billion users using Bloom Filters?


Objective: Achieve 100x data reduction for secure, privacy-conscious lookups.

Google's Safe Browsing service flags over 100 million malicious URLs, ranging from phishing to malware. The mission: shield every Chrome user from these threats. But three hurdles stand in the way: scale, privacy, and infrastructure.


The Challenges: Scale, Privacy, and Infrastructure

  1. Scale: Storing 100 million URLs, each averaging 80 characters, demands roughly 8 GB of storage. Expecting every device to download this blacklist is impractical.

  2. Privacy: Constantly pinging Google's servers with every URL you visit raises privacy red flags. Nobody wants a tech giant tracking their every click.

  3. Infrastructure: With 3 billion users online, querying Google's servers for each URL would flood the system with trillions of requests daily, straining bandwidth and risking server overload. A single point of failure isn't an option.
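These numbers are easy to sanity-check. The back-of-envelope sketch below uses the article's figures plus one hypothetical assumption (how many URL checks a single user generates per day, counting pages and subresources):

```python
USERS = 3_000_000_000          # ~3 billion Chrome users
URLS = 100_000_000             # ~100 million blacklisted URLs
AVG_URL_BYTES = 80             # average URL length from the text
CHECKS_PER_USER_DAY = 500      # hypothetical: page loads plus subresources

blacklist_gb = URLS * AVG_URL_BYTES / 1e9
daily_lookups = USERS * CHECKS_PER_USER_DAY

print(f"raw blacklist: {blacklist_gb:.0f} GB")   # 8 GB
print(f"daily lookups: {daily_lookups:,}")       # 1,500,000,000,000 (trillions)
```

Both hurdles fall out of simple multiplication: an 8 GB download per device, and on the order of trillions of server round-trips per day if every check went to the cloud.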


The Solution: The 80MB Local Filter

Chrome opts for a probabilistic method. Instead of the full list, it downloads a compressed Bloom filter of URL hashes.

The "100x" Compression Advantage: the raw 8 GB blacklist shrinks to an ~80 MB filter, because a Bloom filter stores only a handful of bits per entry instead of the full URL string, trading exactness for a small, tunable false-positive rate.
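The sizing behind that claim follows from the standard Bloom filter formulas. A minimal sketch (the exact on-disk size Chrome shipped has varied by release, so treat the 80 MB figure as approximate):

```python
import math

def bloom_size_bits(n: int, p: float) -> float:
    """Optimal filter size in bits: m = -n * ln(p) / (ln 2)^2."""
    return -n * math.log(p) / (math.log(2) ** 2)

def optimal_hash_count(n: int, m: float) -> float:
    """Optimal number of hash functions: k = (m / n) * ln 2."""
    return (m / n) * math.log(2)

n = 100_000_000  # 100M malicious URLs
p = 0.01         # 1% false-positive rate
m = bloom_size_bits(n, p)

print(f"bits per item:  {m / n:.2f}")                     # ~9.58 (vs ~640 bits for an 80-char URL)
print(f"filter size:    {m / 8 / 1e6:.0f} MB")            # ~120 MB at 1% FPR
print(f"hash functions: {optimal_hash_count(n, m):.1f}")  # ~6.6, rounded up to 7
```

Note that 100M entries at a strict 1% FPR come out closer to 120 MB; an ~80 MB filter corresponds to a somewhat looser error rate or fewer entries. Either way, the reduction versus the 8 GB raw list is roughly two orders of magnitude.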


Technical Workflow: The Privacy-Preserving Check

The Bloom filter lets Chrome answer "is this site bad?" directly on your device.

  1. Local Check: Chrome hashes the URL and checks against its local 80MB Bloom filter.

  2. Confidential Path: If the filter returns "Not in set," the URL is definitely not on the blacklist (Bloom filters never produce false negatives). Chrome doesn't contact Google at all, keeping your browsing history private.

  3. "K-Anonymity" Trick: If the filter indicates "Maybe in set" (potential threat), Chrome sends only a prefix (e.g., first 32 bits) of the hash. Google responds with all malicious hashes sharing that prefix, and Chrome completes the check locally.
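Step 3 can be sketched end-to-end. Everything below is illustrative — `server_lookup` and `SERVER_INDEX` are hypothetical stand-ins for the Safe Browsing backend — but the shape of the exchange (32-bit prefix out, full hashes back, final comparison on-device) matches the workflow described above:

```python
import hashlib

# Mock server-side index: full SHA-256 digests of known-bad URLs,
# keyed by their first 4 bytes (32 bits). Hypothetical stand-in for
# the real Safe Browsing backend.
BAD_URLS = {"phish-bank.ru/login"}
SERVER_INDEX: dict[bytes, set[bytes]] = {}
for bad in BAD_URLS:
    digest = hashlib.sha256(bad.encode()).digest()
    SERVER_INDEX.setdefault(digest[:4], set()).add(digest)

def server_lookup(prefix: bytes) -> set[bytes]:
    """The server sees only a 32-bit prefix -- in a real deployment,
    thousands of unrelated URLs share any given prefix."""
    return SERVER_INDEX.get(prefix, set())

def resolve_maybe_threat(url: str) -> bool:
    """Client-side step 3: confirm a Bloom-filter 'maybe' without
    revealing the URL. Only the 4-byte prefix leaves the device."""
    digest = hashlib.sha256(url.encode()).digest()
    candidates = server_lookup(digest[:4])
    return digest in candidates  # final comparison happens locally

print(resolve_maybe_threat("phish-bank.ru/login"))  # True  -> show warning page
print(resolve_maybe_threat("example.com"))          # False -> false positive, proceed
```

Because many URLs share any 32-bit prefix, the server learns only that the user visited *one of thousands* of possible pages — the k-anonymity property the article refers to.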


Python Implementation: Malicious URL Filter

import math
import mmh3                    # MurmurHash3 (pip install mmh3)
from bitarray import bitarray  # pip install bitarray

class SafeBrowsingFilter:
    def __init__(self, expected_items, fpr):
        # Sizing rule: bits_per_item = -1.44 * log2(fpr)
        # For a 1% FPR (0.01) that works out to ~10 bits per item
        self.size = int(expected_items * -1.44 * math.log2(fpr))
        self.bit_array = bitarray(self.size)
        self.bit_array.setall(0)
        # Optimal hash count: k = -log2(fpr), ~7 for a 1% error rate
        self.hash_count = max(1, round(-math.log2(fpr)))

    def add_malicious_url(self, url):
        for seed in range(self.hash_count):
            index = mmh3.hash(url, seed) % self.size
            self.bit_array[index] = 1

    def check_url(self, url):
        for seed in range(self.hash_count):
            index = mmh3.hash(url, seed) % self.size
            if self.bit_array[index] == 0:
                return "SAFE"  # guaranteed: Bloom filters have no false negatives
        return "MAYBE_THREAT"  # confirm via the prefix-check protocol

# Set up a filter for 1M malicious URLs at a 1% false-positive rate
sbf = SafeBrowsingFilter(1_000_000, 0.01)
sbf.add_malicious_url("phish-bank.ru")

# Instant local lookups
print(f"'google.com': {sbf.check_url('google.com')}")        # SAFE (barring a ~1% false positive)
print(f"'phish-bank.ru': {sbf.check_url('phish-bank.ru')}")  # MAYBE_THREAT

Comparison: Why not just use a Database?

Method             Space (100M URLs)   Privacy Level             User Experience
Full DB Download   ~8 GB               High (local)              Slow / data-heavy
Cloud Lookup       0 MB                None (Google sees all)    Network lag
Bloom Filter       ~80 MB              High (Google sees <1%)    Instant

Conclusion: The Bloom filter serves as a "Privacy Shield," allowing the browser to handle 99.9% of browsing locally. It consults the cloud for a prefix check only when the filter reports a possible match.