SSTables: Sorted String Tables

Immutable Files That Power Modern Databases

Core idea: Don't fiddle with files once they're written. Instead, create new sorted files and handle the merging later.

Immutable

Never modified after write

Sorted by Key

Enables efficient merging

Sequential Write

Created by memory flush

Self-Contained

Index + data in one file

Visual: Anatomy of an SSTable

Pseudocode

// Create SSTable
Flush(memtable):
    sorted = sort(memtable)        // Sort by key
    write(bloom_filter)             // For quick "not exists"
    write(sparse_index)             // Every 100th key → offset
    write(sorted_data)              // The actual key-values
    write(footer)                   // min_key, max_key, size

// Read from SSTable
Read(sstable, key):
    if not bloom_filter.contains(key):
        return None                 // Quick reject

    offset = binary_search(index, key)
    block = read_block(offset)
    return linear_search(block, key)

// Merge SSTables (Compaction)
Compact(sstables):
    // K-way merge, keep newest version
    merged = k_way_merge(sstables)
    return write_new_sstable(merged)

Key Takeaway

SSTables turn random writes into sequential writes by batching updates in memory and flushing sorted, immutable files to disk.
This simple idea powers Google's Spanner, Google's Bigtable, Amazon's DynamoDB, and Apache Cassandra.