Case Study 2.3: How does Google Chrome protect 3 billion users using Bloom Filters?


Goal: 100x data reduction for high-security, privacy-sensitive lookups.

Google's Safe Browsing service identifies over 100 million malicious URLs (phishing, malware, etc.). Google wants to protect every Chrome user from these sites, but it faces three engineering and product challenges: Scale, Privacy, and Infrastructure.


The Challenges: Scale, Privacy, and Infrastructure

  1. The Scale Problem: 100M malicious URLs with an average length of 80 characters require ~8 GB of raw storage. You can't ask every smartphone and laptop user to download and store an 8 GB "blacklist".

  2. The Privacy Problem: Sending every URL you visit to Google's servers to check whether it's "safe" is a serious privacy problem: Google would learn your entire browsing history.

  3. The Infrastructure Problem: With 3 billion users clicking links simultaneously, checking Google's servers for every single URL would generate trillions of requests per day, driving massive bandwidth costs and server load. It would also make Google's servers a single point of failure for every page load in the browser.


The Solution: The 80MB Local Filter

Chrome uses a probabilistic approach. Instead of downloading the raw list, it downloads a highly compressed Bloom filter of URL hashes.

The "100x" Compression Win:


Technical Workflow: The Privacy-Preserving Check

The magic of the Bloom filter is that Chrome can check "is this site bad?" locally on your device.

  1. Step 1 (Local Check): Chrome hashes the URL and checks the bits in its local 80MB Bloom filter.

  2. Step 2 (The Confidential Path): If the filter says "Not in set", the URL is guaranteed not to be on the malicious list (Bloom filters have no false negatives). Chrome never tells Google you visited this site; your browsing history stays private on your machine.

  3. Step 3 (The "K-Anonymity" Trick): If the filter says "Maybe in set" (a possible threat), Chrome still doesn't send the full URL. It sends only a prefix (e.g., the first 32 bits) of the hash. Google sends back all malicious hashes sharing that prefix, and Chrome performs the final check locally (a simplified sketch of this exchange follows below).
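
The prefix exchange in Step 3 can be sketched roughly as follows. This is a simplified illustration, not Google's actual protocol: SHA-256 is assumed as the hash, and fetch_hashes_for_prefix / fake_server are hypothetical stand-ins for the real Safe Browsing API call.

import hashlib

PREFIX_BYTES = 4  # a 32-bit prefix, as described in Step 3

def url_hash(url):
    # Assumed hash for illustration; the real protocol canonicalizes the URL first
    return hashlib.sha256(url.encode()).digest()

def check_with_server(url, fetch_hashes_for_prefix):
    """Send only a 4-byte hash prefix to the server; finish the match locally."""
    full_hash = url_hash(url)
    prefix = full_hash[:PREFIX_BYTES]              # this prefix is all the server sees
    candidates = fetch_hashes_for_prefix(prefix)   # all malicious full hashes sharing it
    return full_hash in candidates                 # final comparison happens on-device

# Toy stand-in for the Safe Browsing server: one known-bad URL, indexed by prefix
_bad_hash = url_hash("phish-bank.ru")
def fake_server(prefix):
    return {_bad_hash} if prefix == _bad_hash[:PREFIX_BYTES] else set()

print(check_with_server("phish-bank.ru", fake_server))  # True  -> block the page
print(check_with_server("example.com", fake_server))    # False -> filter false positive, load normally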


Python Implementation: Malicious URL Filter

import math

import mmh3                    # MurmurHash3 (pip install mmh3)
from bitarray import bitarray  # pip install bitarray

class SafeBrowsingFilter:
    def __init__(self, expected_items, fpr):
        # Math: bits_per_item = -1.44 * log2(fpr)
        # For 1% FPR (0.01), that is ~10 bits per item
        self.size = math.ceil(-expected_items * math.log(fpr) / (math.log(2) ** 2))
        self.bit_array = bitarray(self.size)
        self.bit_array.setall(0)
        # Optimal number of hash functions: k = (m/n) * ln(2)  (~7 for 1% FPR)
        self.hash_count = max(1, round((self.size / expected_items) * math.log(2)))

    def add_malicious_url(self, url):
        for seed in range(self.hash_count):
            index = mmh3.hash(url, seed) % self.size
            self.bit_array[index] = 1

    def check_url(self, url):
        for seed in range(self.hash_count):
            index = mmh3.hash(url, seed) % self.size
            if self.bit_array[index] == 0:
                return "SAFE" # 100% Guaranteed 
        return "MAYBE_THREAT" # Perform prefix-check validation

# Setup filter for 1M malicious URLs
sbf = SafeBrowsingFilter(1_000_000, 0.01)
sbf.add_malicious_url("phish-bank.ru")

# Instant local lookup
print(f"'google.com': {sbf.check_url('google.com')}") # SAFE
print(f"'phish-bank.ru/is-this-is-a-malicious-link.html': {sbf.check_url('phish-bank.ru/is-this-a-malicious-link.html')}") # MAYBE_THREAT

Comparison: Why not just use a Database?

Method           | Space (100M URLs) | Privacy Level          | User Experience
Full DB Download | ~8 GB             | High (Local)           | Slow / Data Heavy
Cloud Lookup     | 0 MB              | Zero (Google sees all) | Network Lag
Bloom Filter     | ~80 MB            | High (Google sees <1%) | Instant

Takeaway: The Bloom filter acts as a "Privacy Shield." It allows the browser to handle more than 99% of your browsing entirely on-device, only reaching out to the cloud for a prefix check when the filter reports a possible match.