Storage Layout: Row vs Columnar

How Tables Live on Disk: From OS Files to DbFiles

OS Files: Think of these as the OS's basic building blocks—unstructured bytes with no rules.

DbFiles: These are the database's tailored files, neatly packed into fixed 64MB pages. Every table you care about exists as a DbFile.

Why DbFiles Matter: Each 64MB page is your I/O workhorse. When you query, the database hauls in whole 64MB chunks, not piecemeal records.

Why 64MB pages? I/O devices are sloths. Bulk operations—like one 64MB read versus 2000 separate 32KB reads—keep costs in check.

Inside a 64MB Page: The Storage Layout Question

  1. Row Storage

  2. Columnar Storage

  3. Hybrid Storage

1. Row-Oriented Storage: The Slotted Page

In a 64MB page, the database uses a Slots design to juggle variable-length rows and fragmented space.

2. Columnar Storage: Segments & Compression

Here, a page typically holds values for one specific column across many rows.

Compression Magic for Columnar: The Scale Effect

More data means better compression. Take Spotify's Listens table:

Column Small Scale (T(R) = 9) Massive Scale (T(R) = 100M) Compression Technique & Scaling
user_id 36B → 6B (83%) 400MB → 8MB (98%) RLE: [1×3, 2×3, 3×3] → [1×500, 2×300, ...]
song_name 36B → 14B (61%) 400MB → 50MB (88%) Dictionary: 5 unique → 10M unique, top 1K = 80% of listens
rating 36B → 18B (50%) 400MB → 40MB (90%) Quantization: NULL bitmap → 0.1 precision = 1 byte
listen_time 180B → 60B (67%) 2GB → 200MB (90%) Delta: Random timestamps → binge sessions compress 10:1
Total 432B → 98B (77%) 4.8GB → 298MB (94%) More data = more repetition patterns

The Scaling Law: Small datasets (77% compression) → Massive datasets (94%+ compression)

Why? Repetition patterns emerge at scale: Popular songs played millions of times, active users with thousands of listens, predictable timestamp sequences. Rows can compress, but not nearly as much (mix of different data types inside a row).

This is why BigQuery, Snowflake, and Redshift can scan petabytes efficiently—they're reading highly compressed columnar data, not just columnar data.

The Core Trade-off: Query Patterns Drive Everything

Different query patterns demand different storage layouts. Here's the fundamental divide:

OLTP: Row Storage Wins

Why Row Storage?

OLAP: Columnar Wins

Why Columnar?

Key Takeaways

  1. Layout Matters: How you organize data in 64MB pages.

  2. Query Pattern = Storage Choice: OLTP (many small updates) → Row storage (PostgreSQL, MySQL). OLAP (few large scans) → Columnar storage (BigQuery, Snowflake).

  3. Compression Magic: Columnar can get 77% → 99%+ compression as data scales.

Modern Analytics: Columnar + compression + distribution = scan petabytes in seconds.