Compression Basics: Making Data Smaller

The Magic of Patterns: 10× Smaller, Same Information


When to Use What?

| Technique | Best For | Compression Ratio | Speed |
|---|---|---|---|
| RLE | Repeated values, sorted data (e.g., user_id after GROUP BY) | 10-100× | Very Fast |
| Dictionary | Low cardinality strings (e.g., country, genre) | 5-20× | Fast |
| Delta | Timestamps (always increasing), sequences (e.g., IDs) | 5-10× | Fast |
| Bit Packing | Small integers (e.g., ratings 1-5, boolean flags) | 4-8× | Very Fast |
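
To make the four techniques concrete, here is a minimal sketch of each encoder in Python. The function names (`rle_encode`, `dictionary_encode`, `delta_encode`, `bit_pack`) and the sample inputs are made up for illustration; real columnar formats use more compact layouts than Python lists and tuples.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encoding: collapse each run of equal values into (value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def dictionary_encode(values):
    """Dictionary encoding: keep each distinct value once and refer to it by a small integer code."""
    codes = {}
    encoded = [codes.setdefault(value, len(codes)) for value in values]
    return list(codes), encoded  # dictionary (in code order), then one code per row


def delta_encode(values):
    """Delta encoding: keep the first value, then only the difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def bit_pack(values, bits):
    """Bit packing: store each small non-negative integer in `bits` bits instead of a full word."""
    packed = 0
    for i, value in enumerate(values):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        packed |= value << (i * bits)
    n_bytes = (len(values) * bits + 7) // 8  # round up to whole bytes
    return packed.to_bytes(n_bytes, "little")


print(rle_encode([7, 7, 7, 7, 9, 9, 3]))                 # [(7, 4), (9, 2), (3, 1)]
print(dictionary_encode(["US", "DE", "US", "US", "FR"]))  # (['US', 'DE', 'FR'], [0, 1, 0, 0, 2])
print(delta_encode([1000, 1003, 1004, 1010]))             # [1000, 3, 1, 6]
print(len(bit_pack([1, 5, 3, 4, 2, 5, 1, 3], bits=3)))    # 3
```

Eight ratings in the range 1-5 fit in 3 bits each, so they occupy 3 bytes here instead of the 32 bytes that eight 4-byte integers would take.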

Combining Techniques

The real magic is in the mix. Stack these techniques, with each stage working on the already-shrunk output of the one before it, and the savings compound:

Example: 1M User Listening Sessions
Original user_id column:     4MB (1M × 4 bytes)
↓
After RLE (sorted):          200KB (runs of same user)
↓
After Dictionary:            100KB (run values stored as compact codes)
↓
After zstd compression:       50KB (general compression)

Final: 4MB → 50KB = 98.75% reduction (80× smaller)!
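
The stacking idea can be tried out with a rough sketch like the one below. It runs on synthetic data (100,000 random IDs drawn from 1,000 users, an assumption chosen only to keep the run fast), skips the dictionary step, and uses `zlib` from the Python standard library as a stand-in for zstd. The `stage_sizes` helper is hypothetical, and the sizes it prints will not match the figures in the example above.

```python
import random
import struct
import zlib
from itertools import groupby


def stage_sizes(user_ids):
    """Return the byte size of a user_id column after each stage of the pipeline."""
    # Raw column: one 4-byte unsigned integer per row.
    raw = struct.pack(f"<{len(user_ids)}I", *user_ids)

    # Stage 1: sort, then run-length encode as (value, run_length) pairs of 4 bytes each.
    runs = [(value, sum(1 for _ in run)) for value, run in groupby(sorted(user_ids))]
    rle = b"".join(struct.pack("<II", value, length) for value, length in runs)

    # Stage 2: general-purpose compression of the RLE output (zlib standing in for zstd).
    compressed = zlib.compress(rle, 9)

    return {"raw": len(raw), "rle": len(rle), "compressed": len(compressed)}


random.seed(0)  # fixed seed so repeated runs see the same synthetic data
ids = [random.randrange(1_000) for _ in range(100_000)]
sizes = stage_sizes(ids)

for stage, size in sizes.items():
    reduction = 1 - size / sizes["raw"]
    print(f"{stage:<10} {size:>8} bytes  ({reduction:.2%} reduction)")
```

The reduction percentage is computed the same way as the final line of the example: 1 - 50KB / 4000KB = 98.75%.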