Hybrid Storage: The Best of Both Worlds

Concept. Hybrid storage runs row-oriented and column-oriented layouts side by side: fresh writes land in row format for speed, then a background process converts aged partitions to columnar for cheap analytics. One system, two layouts, picked automatically by data age.

Intuition. Spotify's last hour of listens needs blazing inserts (1M/sec); the past year of listens needs blazing aggregates (100ms dashboards). Pure row gives you the inserts but slow analytics; pure columnar gives you the analytics but slow inserts. ClickHouse's MergeTree keeps recent data as row-shaped "parts" and merges them into columnar segments in the background. Same SQL, both worlds.

The Challenge: Two Workloads, One Table

-- OLTP: high-frequency real-time inserts
INSERT INTO Listens (listen_id, user_id, song_id, rating, listen_time)
VALUES (75382914, 12345, 456, 4.5, NOW());
-- happening 1M times per second

-- OLAP: complex analytics on the same data
SELECT song_id, COUNT(*) AS plays, AVG(rating)
FROM Listens
WHERE listen_time > NOW() - INTERVAL '1 hour'
GROUP BY song_id
ORDER BY plays DESC;
-- must complete in <100 ms for dashboards

Pick row and you get the inserts, but the analytics take 30 s. Pick columnar and you get 50 ms analytics, but insert throughput collapses to 10K/sec. Hybrid says: don't pick.

The Trade-off

Layout                    Inserts      Analytics
Pure row                  1M/sec  ✓    30 s    ✗
Pure columnar             10K/sec ✗    50 ms   ✓
Hybrid (row + columnar)   1M/sec  ✓    100 ms  ✓
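A toy in-memory model shows where these numbers come from (the schema and row counts here are illustrative, not measurements): appends are cheap in row layout because one record lands in one place, while an aggregate over a single column is cheap in columnar layout because only that column is touched.

```python
# Toy model: the same 1,000 listens stored two ways.
rows = [  # row layout: one record per listen
    {"listen_id": i, "user_id": i % 100, "song_id": i % 50, "rating": 4.0}
    for i in range(1000)
]
columns = {  # columnar layout: one array per attribute
    "listen_id": list(range(1000)),
    "user_id": [i % 100 for i in range(1000)],
    "song_id": [i % 50 for i in range(1000)],
    "rating": [4.0] * 1000,
}

# Appending a listen touches one record in row layout...
rows.append({"listen_id": 1000, "user_id": 0, "song_id": 0, "rating": 5.0})
# ...but one append per array in columnar layout.
for name, value in [("listen_id", 1000), ("user_id", 0),
                    ("song_id", 0), ("rating", 5.0)]:
    columns[name].append(value)

# AVG(rating) must touch all 4 fields of every record in row layout,
# but only the rating array in columnar layout.
fields_scanned_row = len(rows) * 4
fields_scanned_col = len(columns["rating"])
print(fields_scanned_row, fields_scanned_col)  # 4004 1001
```

With four columns the columnar scan touches a quarter of the data; on a wide production schema the gap is far larger, which is the 30 s vs 50 ms row in the table above.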

How It Works: Three Pieces

Adaptive layout. Fresh listens (< 1 hour) live in row format for fast inserts. Aged data is converted to columnar by a background job.

Time-based partitioning. Partitions are sliced by listen_time. Background merges combine row "parts" into larger columnar segments once activity slows.

Unified query engine. A single SQL interface. The planner reads from both row and columnar partitions and merges results transparently.

The key insight: listen_time is the natural partitioning dimension. Recent data needs fast writes; old data needs fast scans. Hybrid systems optimize the storage layout by data age, automatically.
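The three pieces can be sketched together in a toy in-memory store. This is not any real engine's API; the class, the one-hour threshold, and the tuple schema are all illustrative assumptions. Inserts append to a row buffer, a background step moves aged listens into columnar arrays, and one query method scans both layouts.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(hours=1)  # illustrative age threshold

class HybridTable:
    """Toy hybrid store: fresh writes in row format, aged data columnar."""

    def __init__(self):
        self.row_parts = []  # hot data: (listen_time, song_id, rating) tuples
        self.columnar = {"listen_time": [], "song_id": [], "rating": []}

    def insert(self, listen_time, song_id, rating):
        # Piece 1 (adaptive layout): fresh writes are O(1) row appends.
        self.row_parts.append((listen_time, song_id, rating))

    def background_convert(self, now):
        # Piece 2 (time-based partitioning): listens older than the hot
        # window are pivoted into columnar arrays by a background job.
        cutoff = now - HOT_WINDOW
        hot, cold = [], []
        for rec in self.row_parts:
            (hot if rec[0] >= cutoff else cold).append(rec)
        for t, s, r in cold:
            self.columnar["listen_time"].append(t)
            self.columnar["song_id"].append(s)
            self.columnar["rating"].append(r)
        self.row_parts = hot

    def avg_rating(self):
        # Piece 3 (unified query engine): scan both layouts, merge results.
        ratings = [r for _, _, r in self.row_parts] + self.columnar["rating"]
        return sum(ratings) / len(ratings)

now = datetime(2024, 1, 1, 12, 0)
t = HybridTable()
t.insert(now - timedelta(hours=3), 456, 4.0)  # aged listen
t.insert(now, 456, 5.0)                        # fresh listen
t.background_convert(now)
print(len(t.row_parts), len(t.columnar["rating"]), t.avg_rating())
```

After the background pass, the old listen lives in the columnar arrays and the fresh one stays in the row buffer, yet `avg_rating()` sees both.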

Real-World Implementation: ClickHouse MergeTree

ClickHouse's MergeTree engine is the canonical example, in production at Cloudflare, Uber, and Bloomberg.

  1. Incoming listens land in small row-oriented "parts."

  2. Background merging combines parts into larger columnar-compressed segments.

  3. Query execution reads from both formats simultaneously.

  4. Automatic optimization based on listen_time age and size thresholds.
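Step 2 above, the row-to-columnar merge, can be sketched as a k-way merge followed by a pivot. This is a simplified stand-in, not ClickHouse code: the parts here are plain Python lists already sorted by their key, roughly as MergeTree parts are sorted by the table's ORDER BY key.

```python
import heapq

# Two small row-oriented parts, each sorted by (song_id, listen_time).
part_a = [(456, "12:00", 4.5), (789, "12:01", 3.0)]  # (song_id, time, rating)
part_b = [(123, "12:02", 5.0), (456, "12:03", 4.0)]

def merge_to_columnar(parts):
    # The LSM-style step: k-way merge of already-sorted row parts...
    merged_rows = list(heapq.merge(*parts))
    # ...followed by the hybrid twist: pivot the merged rows so the
    # output segment is stored column by column.
    song_id, listen_time, rating = zip(*merged_rows)
    return {"song_id": list(song_id),
            "listen_time": list(listen_time),
            "rating": list(rating)}

segment = merge_to_columnar([part_a, part_b])
print(segment["song_id"])  # [123, 456, 456, 789]
```

The merge preserves key order, so the resulting columnar segment stays sorted and cheap to range-scan; a real engine would also compress each column at this point.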

Performance numbers:

  • Insert throughput: 1M+ listens/sec sustained

  • Query latency: sub-second analytics on billions of listening events

  • Storage: 10:1 compression on real-world listening data

  • Operational simplicity: one system, not separate OLTP + OLAP

Storage engine connection. ClickHouse's "parts + background merging" pattern is a variation of LSM trees. The twist: traditional LSM trees merge row-to-row; ClickHouse merges row-to-columnar. We cover LSM tree fundamentals in LSM Trees.

Adaptive Storage Research

Active research areas:

  • Workload-adaptive storage. Systems that learn access patterns and re-tune layout.

  • Online reorganization. Converting between formats without stopping queries.

  • Hybrid indexing. Row-based indexes over columnar-compressed data.

  • Multi-format query processing. Execution across heterogeneous storage layouts.


Why Hybrid Wins

Operationally:

  • One system, no complex ETL pipelines between OLTP and OLAP

  • Real-time analytics on data that arrived seconds ago

  • Cost: one cluster instead of two

  • Consistency: no sync lag between transactional and analytical stores

Technically:

  • Storage tunes itself to actual workload

  • Each data age uses its best-fit format

  • Best compression for cold data, fast writes for hot data

  • Single SQL surface regardless of underlying layout

Key Takeaways

  1. The OLTP/OLAP choice isn't binary. Hybrid storage gives you both layouts in one system, picked by data age.

  2. Partition by time, convert in the background. That's the universal pattern: ClickHouse, Apache Kudu, Parquet+Delta are all variations of the same idea.

  3. One system replaces two. Cuts ops complexity and gives you fresher analytics than any ETL pipeline ever could.


Next

Hash Partitioning → Once the data is laid out on disk, the next question is how to split a too-big-for-RAM table into RAM-sized chunks.