Compression Basics: Making Data Smaller

The Magic of Patterns: 10× Smaller, Same Information

[Overview figure: "Four Ways to Compress Spotify Data". Panels: Run-Length Encoding (72 → 27 bytes, 63% smaller), Dictionary Encoding (54 → 20 bytes, 63% smaller), Delta Encoding (32 → 12 bytes, 62% smaller), Bit Packing (36 → 4 bytes, 89% smaller). Also shown: real Spotify columns (user_id via RLE: 4MB → 200KB; song_id via dictionary: 4MB → 1MB; listen_time via delta: 8MB → 2MB) and combined techniques taking a 4MB ratings column down to 75KB, a 98% reduction.]

Run-Length Encoding (RLE)

The Pattern: Repeated Values

When you see the same value multiple times in a row, just count them!

What is Categorical Data? Data that falls into distinct categories or groups, like user IDs, song titles, genres, or star ratings.

Spotify Example: User Listening Sessions

Original:  [42, 42, 42, 42, 89, 89, 89, 17, 17]  // 72 bytes (9 × 8 bytes)
           (user 42 listens to 4 songs, then user 89 listens to 3, etc.)

Encoded:   [(42, 4), (89, 3), (17, 2)]           // 27 bytes
           3 pairs × (8 bytes for user_id + 1 byte for count)
           (user_id stays 8 bytes, count only needs 1 byte for small values)

Savings:   63% smaller (45 bytes saved)

Best for: long runs of repeated values, especially sorted or grouped columns.

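The encode/decode round trip is only a few lines. Here is a minimal Python sketch; the function names rle_encode and rle_decode are illustrative, not from any particular library:

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same value as the previous run: extend its count.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # New value: start a fresh run.
            runs.append((v, 1))
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    return [v for v, count in runs for _ in range(count)]
```

For the session example above, rle_encode([42, 42, 42, 42, 89, 89, 89, 17, 17]) returns [(42, 4), (89, 3), (17, 2)], and rle_decode reverses it exactly.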

Dictionary Encoding

The Pattern: Limited Unique Values

Replace repeated strings/values with small integer lookups.

Spotify Example: Song Titles

Original:  ["Flowers", "Flowers", "Anti-Hero", "Flowers", "Anti-Hero"]  // 45 bytes

Dictionary: {0: "Flowers", 1: "Anti-Hero"}                              // 20 bytes
Indices:    [0, 0, 1, 0, 1]                                            // 5 bytes
Total:      25 bytes (44% smaller)

Best for: low-cardinality columns, where a few unique strings repeat many times.

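A minimal Python sketch of the same idea (the helper names dict_encode and dict_decode are made up for this example):

```python
def dict_encode(values):
    """Assign each distinct value a small integer; return (lookup, indices)."""
    dictionary = {}
    indices = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # next free index
        indices.append(dictionary[v])
    # Invert for decoding: index -> value.
    lookup = {i: v for v, i in dictionary.items()}
    return lookup, indices

def dict_decode(lookup, indices):
    """Replace each index with its original value."""
    return [lookup[i] for i in indices]
```

Running dict_encode on the song titles above yields the dictionary {0: "Flowers", 1: "Anti-Hero"} and the indices [0, 0, 1, 0, 1].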

Delta Encoding

The Pattern: Sequential Values

Store the first value, then only differences.

Spotify Example: Timestamps

Original times:     2024-01-01 14:35:00  (1704119700)
                    2024-01-01 14:35:03  (1704119703)
                    2024-01-01 14:35:07  (1704119707)

Delta encoded:      Base: 1704119700
                    Deltas: [+3, +4]  (each delta is the gap from the previous value)

Savings:            12 bytes → 6 bytes (50% smaller)
                    (three 4-byte timestamps → one 4-byte base + two 1-byte deltas)

Best for: time series and other sequential values where consecutive differences are small.

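A Python sketch of the common variant that stores each value's difference from the previous one (function names are illustrative):

```python
def delta_encode(values):
    """Return (base, deltas) where each delta is value[i] - value[i-1]."""
    if not values:
        return None, []
    base = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    return base, deltas

def delta_decode(base, deltas):
    """Rebuild the original sequence by running-summing the deltas."""
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out
```

For the timestamps above, delta_encode returns base 1704119700 with deltas [3, 4], and delta_decode restores the original three timestamps.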

Bit Packing

The Pattern: Small Range Numbers

Don't use 32 bits when 3 will do!

Spotify Example: Ratings (1-5 stars)

Original:   [4, 5, 3, 4, 5]  // 5 × 32 bits = 20 bytes

Bit packed: 100 101 011 100 101  // 5 × 3 bits = 15 bits ≈ 2 bytes

Savings:    90% smaller

Best for: small integers with a known, limited range (ratings, flags, enum codes).

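The bit-level bookkeeping can be done with shifts and masks. A minimal Python sketch, assuming the values already fit in the given bit width (bitpack and bitunpack are illustrative names):

```python
def bitpack(values, bits):
    """Pack integers into bytes, using `bits` bits per value."""
    buf, nbits, out = 0, 0, bytearray()
    for v in values:
        buf = (buf << bits) | v          # append value's bits to the buffer
        nbits += bits
        while nbits >= 8:                # emit every complete byte
            nbits -= 8
            out.append((buf >> nbits) & 0xFF)
            buf &= (1 << nbits) - 1      # keep only the unwritten low bits
    if nbits:                            # flush a final partial byte, left-aligned
        out.append((buf << (8 - nbits)) & 0xFF)
    return bytes(out)

def bitunpack(data, bits, count):
    """Read `count` values of `bits` bits each back out of the bytes."""
    buf, nbits, values = 0, 0, []
    it = iter(data)
    for _ in range(count):
        while nbits < bits:              # pull in bytes until we have enough bits
            buf = (buf << 8) | next(it)
            nbits += 8
        nbits -= bits
        values.append((buf >> nbits) & ((1 << bits) - 1))
    return values
```

With the ratings above, bitpack([4, 5, 3, 4, 5], 3) produces just 2 bytes, and bitunpack recovers the original five values.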

Combining Techniques

The real magic happens when you chain compressions:

Example: 1M User Listening Sessions
Original user_id column:     4MB (1M × 4 bytes)
↓
After RLE (sorted):          200KB (runs of same user)
↓  
After Dictionary:            100KB (500K unique users fit in ~19-bit indices instead of 32)
↓
After zstd compression:       50KB (general compression)

Final: 4MB → 50KB = 98.75% reduction!
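
Chaining can be sketched end to end in Python. This is a minimal illustration, not production code: it bit-packs 3-bit ratings by hand, and the standard library's zlib stands in for zstd (which ships as a separate package). The function name compress_ratings is made up for this example:

```python
import zlib

def compress_ratings(ratings):
    """Chain two techniques: bit packing, then general-purpose compression."""
    # Step 1: bit pack -- each rating 1-5 fits in 3 bits instead of 32.
    buf, nbits, packed = 0, 0, bytearray()
    for r in ratings:
        buf = (buf << 3) | r
        nbits += 3
        while nbits >= 8:
            nbits -= 8
            packed.append((buf >> nbits) & 0xFF)
            buf &= (1 << nbits) - 1      # keep only the unwritten low bits
    if nbits:                            # flush any leftover bits
        packed.append((buf << (8 - nbits)) & 0xFF)
    # Step 2: a general-purpose compressor squeezes out remaining repetition
    # (its LZ77 matching also covers what an explicit RLE pass would find here).
    return zlib.compress(bytes(packed), level=9)

ratings = [4, 5, 3] * 100_000      # 300K ratings with a repeating pattern
raw_size = len(ratings) * 4        # 1.2MB as plain 32-bit integers
compressed = compress_ratings(ratings)
print(f"{raw_size} bytes -> {len(compressed)} bytes")
```

The exact ratio depends on the data and the compressor, but on repetitive input like this the chained result is a small fraction of both the raw column and the bit-packed intermediate.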


When to Use What?

Technique     Best For                       Compression Ratio   Speed
RLE           Repeated values, sorted data   10-100×             Very Fast
Dictionary    Low-cardinality strings        5-20×               Fast
Delta         Time series, sequences         5-10×               Fast
Bit Packing   Small integers                 4-8×                Very Fast

Key Insights

Patterns:   Data has patterns worth exploiting
10-100×:    Typical compression ratios in practice
Columnar:   Columnar layout groups similar values, creating better patterns
Chain:      Combining techniques multiplies the savings

The Big Idea: Compression finds and eliminates redundancy. Columnar storage creates more redundancy to eliminate!