Storage Layout: Row vs Columnar

Concept. A storage layout is the on-disk arrangement of a table. Row-oriented stores all columns of one row together; column-oriented stores all values of one column together.

Intuition. A row-oriented Listens table packs each listen (listen_id, user_id, song_id, rating, listen_time) as one contiguous chunk on disk. A column-oriented layout stores all the user_id values together, then all the rating values, then all the listen_time values. "Average rating across a billion listens" reads one column instead of every row.

How Tables Live on Disk

Two layers sit between your table and the operating system:

  • OS Files. The OS's basic building blocks: unstructured bytes with no rules.

  • DbFiles. The database's tailored files, divided into fixed-size 64 MB pages. Every table you query lives as a DbFile.

  • Why pages? When you query, the database hauls in whole 64 MB chunks, not piecemeal records. I/O devices are slow, so bulk operations (one 64 MB read vs. 2,048 separate 32 KB reads) keep cost in check; the sketch below puts rough numbers on that gap.
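A back-of-envelope cost model makes the point. The ~10 ms per-request seek overhead and ~200 MB/s transfer rate are illustrative assumptions (roughly HDD-class numbers), not properties of any specific system:

```python
# Back-of-envelope I/O cost model. SEEK_MS and TRANSFER_MB_PER_S are
# assumed, HDD-ish numbers, not measurements of any specific device.
SEEK_MS = 10.0             # fixed overhead paid per I/O request
TRANSFER_MB_PER_S = 200.0  # sequential transfer rate

def io_time_ms(num_requests: int, mb_per_request: float) -> float:
    """Total time = per-request seek overhead + raw transfer time."""
    transfer_ms = num_requests * mb_per_request / TRANSFER_MB_PER_S * 1000
    return num_requests * SEEK_MS + transfer_ms

# One 64 MB bulk read vs. 2,048 separate 32 KB reads of the same bytes.
print(io_time_ms(1, 64))            # ~330 ms (10 ms seek + 320 ms transfer)
print(io_time_ms(2048, 32 / 1024))  # ~20,800 ms: seek overhead dominates
```

The transfer time is identical in both cases; the 60x difference is pure per-request overhead, which is why the database reads in big pages.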

Inside a 64 MB Page: The Storage Layout Question

Three options for how to lay out the bytes inside a page:

  1. Row storage. Pack all columns of one row together.

  2. Columnar storage. Pack all values of one column together.

  3. Hybrid storage. Column-grouped within row-groups (covered separately).

The first two are the fundamental contrast. Same data, two layouts, very different I/O for the same query:

Same logical Listens table stored two ways: row-oriented packs whole rows together into pages; columnar puts each column in its own file.
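To see the contrast in miniature, here is a sketch that serializes the same toy Listens rows both ways; the schema, field widths, and struct format are illustrative assumptions:

```python
import struct

# Toy Listens rows: (listen_id, user_id, song_id, rating, listen_time).
listens = [
    (1, 101, 5001, 4.5, 1700000000),
    (2, 101, 5002, 3.0, 1700000042),
    (3, 202, 5001, 5.0, 1700000099),
]
ROW_FMT = "<iiifq"  # three int32s, one float32, one int64 per row

# Row-oriented: all fields of one row sit next to each other on disk.
row_page = b"".join(struct.pack(ROW_FMT, *row) for row in listens)

# Column-oriented: all values of one column sit next to each other.
n = len(listens)
cols = list(zip(*listens))  # transpose rows into columns
col_page = b"".join(
    struct.pack(f"<{n}{code}", *col)
    for col, code in zip(cols, "iiifq")
)

# Same total bytes either way, but "average rating" only needs the
# rating values, which are contiguous only in the columnar layout.
assert len(row_page) == len(col_page)
```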

Row-Oriented: The Slotted Page

A 64 MB row-oriented page uses a slotted page design to handle variable-length rows and the fragmentation that updates create.

Slotted page layout: a single 64MB row-oriented page. Data records grow from the top down; the slot array grows from the bottom up; free space sits in the middle and shrinks as both grow toward each other.
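A minimal sketch of that design, assuming a hypothetical 4 KB page (rather than 64 MB, to keep the numbers readable) and a plain Python list standing in for the packed on-disk slot array:

```python
PAGE_SIZE = 4096  # illustrative; the pages in this chapter are 64 MB
SLOT_SIZE = 4     # assume each slot holds a 2-byte offset + 2-byte length

class SlottedPage:
    """Toy slotted page: records grow top-down, the slot array grows
    bottom-up, and free space is whatever remains between them."""

    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.slots = []      # (offset, length) per record
        self.free_start = 0  # next record goes here, growing downward

    def free_space(self) -> int:
        # Free space sits between the record area and the slot array.
        return PAGE_SIZE - self.free_start - len(self.slots) * SLOT_SIZE

    def insert(self, record: bytes) -> int:
        # Variable-length records are fine: the slot stores each length.
        if self.free_space() < len(record) + SLOT_SIZE:
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        # The slot id, not the byte offset, is the record's stable address.
        return len(self.slots) - 1

    def read(self, slot_id: int) -> bytes:
        offset, length = self.slots[slot_id]
        return bytes(self.data[offset:offset + length])
```

Because readers address records by slot id rather than raw offset, the page can slide surviving records together after deletes (compaction) without breaking any references.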

Columnar: Compression at Scale

Same idea inverted: a page holds many values for a single column. Sequential bytes are all the same type, and that's exactly what compression algorithms exploit aggressively.

More data means better compression. Take Spotify's Listens table:

| Column      | Small scale (T(R) = 9) | Massive scale (T(R) = 100M) | Compression technique |
|-------------|------------------------|-----------------------------|-----------------------|
| user_id     | 36 B → 6 B (83%)       | 400 MB → 8 MB (98%)         | RLE. Runs of repeated user_ids. |
| song_name   | 36 B → 14 B (61%)      | 400 MB → 50 MB (88%)        | Dictionary. Top 1K songs cover 80% of plays. |
| rating      | 36 B → 18 B (50%)      | 400 MB → 40 MB (90%)        | Quantization. 0.1 precision fits in 1 byte plus a NULL bitmap. |
| listen_time | 180 B → 60 B (67%)     | 2 GB → 200 MB (90%)         | Delta. Binge sessions compress 10:1. |
| Total       | 432 B → 98 B (77%)     | 4.8 GB → 298 MB (94%)       | More data means more repetition. |
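Here are minimal sketches of three of the encoders named above. These are logical versions operating on Python lists; a real columnar engine bit-packs the outputs, but the repetition each one exploits is the same:

```python
from itertools import groupby

# RLE: collapse runs of repeated values, e.g. a user_id column where
# each active user contributes a long run of consecutive listens.
def rle_encode(values):
    return [(value, len(list(run))) for value, run in groupby(values)]

# Dictionary: map each distinct string to a small integer code, e.g.
# song names where a few popular titles cover most plays.
def dict_encode(values):
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, list(codes)  # integer codes + lookup table

# Delta: store differences between consecutive values, e.g. timestamps
# within a binge session, where the gaps are small and regular.
def delta_encode(values):
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

print(rle_encode([101, 101, 101, 101, 202, 202]))  # [(101, 4), (202, 2)]
print(dict_encode(["Yes", "Yes", "No", "Yes"]))    # ([0, 0, 1, 0], ['Yes', 'No'])
print(delta_encode([1700000000, 1700000180, 1700000360]))
# [1700000000, 180, 180] -- small deltas pack into far fewer bytes
```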

The Scaling Law: small datasets compress by ~77%; massive datasets by 94%+.

Why: repetition patterns emerge at scale. Popular songs are played millions of times, active users rack up thousands of listens, and timestamps follow predictable sequences. Rows can compress too, but not nearly as much: the mix of types within each row defeats most encoders.

This is why BigQuery, Snowflake, and Redshift can scan petabytes efficiently. They're reading highly compressed columnar data, not just columnar data.

Key Takeaways

  1. Layout matters. How you organize bytes inside a 64 MB page dictates I/O cost.

  2. Query pattern picks the layout. OLTP (point lookups, updates, inserts) favors row (PostgreSQL, MySQL). OLAP (aggregates, scans, append-only) favors columnar (BigQuery, Snowflake).

  3. Columnar plus compression plus distribution equals scanning petabytes in seconds. This is the foundation of modern analytics.


Next

Case Study 3.1: BigQuery Scaling → The canonical columnar-analytics system that powers Google and is the blueprint for modern cloud data warehouses.