Hybrid Storage: The Best of Both Worlds

Concept. Hybrid storage runs row-oriented and column-oriented layouts side by side: fresh writes land in row format for speed, then a background process converts aged partitions to columnar for cheap analytics. One system, two layouts, picked automatically by data age.

Intuition. Spotify's last hour of listens needs blazing inserts (1M/sec); the past year of listens needs blazing aggregates (100ms dashboards). Pure row gives you the inserts but slow analytics; pure columnar gives you the analytics but slow inserts. ClickHouse's MergeTree keeps recent data as row-shaped "parts" and merges them into columnar segments in the background. Same SQL, both worlds.

The Challenge: Two Workloads, One Table

-- OLTP: high-frequency real-time inserts
INSERT INTO Listens (listen_id, user_id, song_id, rating, listen_time)
VALUES (75382914, 12345, 456, 4.5, NOW());
-- happening 1M times per second

-- OLAP: complex analytics on the same data
SELECT song_id, COUNT(*) AS plays, AVG(rating)
FROM Listens
WHERE listen_time > NOW() - INTERVAL '1 hour'
GROUP BY song_id
ORDER BY plays DESC;
-- must complete in <100 ms for dashboards

Pick row and you get the inserts, but the analytics take 30 s. Pick columnar and you get 50 ms analytics, but insert throughput collapses to 10K/sec. Hybrid says: don't pick.

The Trade-off

Layout                    Inserts      Analytics
Pure row                  1M/sec  ✓    30 s    ✗
Pure columnar             10K/sec ✗    50 ms   ✓
Hybrid (row + columnar)   1M/sec  ✓    100 ms  ✓
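A toy in-memory model shows where these numbers come from (the schema and row counts here are illustrative, not measurements): appends are cheap in row layout because one record lands in one place, while an aggregate over a single column is cheap in columnar layout because only that column is touched.

```python
# Toy model: the same 1,000 listens stored two ways.
rows = [  # row layout: one record per listen
    {"listen_id": i, "user_id": i % 100, "song_id": i % 50, "rating": 4.0}
    for i in range(1000)
]
columns = {  # columnar layout: one array per attribute
    "listen_id": list(range(1000)),
    "user_id": [i % 100 for i in range(1000)],
    "song_id": [i % 50 for i in range(1000)],
    "rating": [4.0] * 1000,
}

# Appending a listen touches one record in row layout...
rows.append({"listen_id": 1000, "user_id": 0, "song_id": 0, "rating": 5.0})
# ...but one append per array in columnar layout.
for name, value in [("listen_id", 1000), ("user_id", 0),
                    ("song_id", 0), ("rating", 5.0)]:
    columns[name].append(value)

# AVG(rating) must touch all 4 fields of every record in row layout,
# but only the rating array in columnar layout.
fields_scanned_row = len(rows) * 4
fields_scanned_col = len(columns["rating"])
print(fields_scanned_row, fields_scanned_col)  # 4004 1001
```

With four columns the columnar scan touches a quarter of the data; on a wide production schema the gap is far larger, which is the 30 s vs 50 ms row in the table above.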

How It Works: Three Pieces

Adaptive layout. Fresh listens (< 1 hour) live in row format for fast inserts. Aged data is converted to columnar by a background job.

Time-based partitioning. Partitions are sliced by listen_time. Background merges combine row "parts" into larger columnar segments once activity slows.

Unified query engine. A single SQL interface. The planner reads from both row and columnar partitions and merges results transparently.

The key insight: listen_time is the natural partitioning dimension. Recent data needs fast writes; old data needs fast scans. Hybrid systems optimize the storage layout by data age, automatically.
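The three pieces can be sketched together in a toy in-memory store. This is not any real engine's API; the class, the one-hour threshold, and the tuple schema are all illustrative assumptions. Inserts append to a row buffer, a background step moves aged listens into columnar arrays, and one query method scans both layouts.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(hours=1)  # illustrative age threshold

class HybridTable:
    """Toy hybrid store: fresh writes in row format, aged data columnar."""

    def __init__(self):
        self.row_parts = []  # hot data: (listen_time, song_id, rating) tuples
        self.columnar = {"listen_time": [], "song_id": [], "rating": []}

    def insert(self, listen_time, song_id, rating):
        # Piece 1 (adaptive layout): fresh writes are O(1) row appends.
        self.row_parts.append((listen_time, song_id, rating))

    def background_convert(self, now):
        # Piece 2 (time-based partitioning): listens older than the hot
        # window are pivoted into columnar arrays by a background job.
        cutoff = now - HOT_WINDOW
        hot, cold = [], []
        for rec in self.row_parts:
            (hot if rec[0] >= cutoff else cold).append(rec)
        for t, s, r in cold:
            self.columnar["listen_time"].append(t)
            self.columnar["song_id"].append(s)
            self.columnar["rating"].append(r)
        self.row_parts = hot

    def avg_rating(self):
        # Piece 3 (unified query engine): scan both layouts, merge results.
        ratings = [r for _, _, r in self.row_parts] + self.columnar["rating"]
        return sum(ratings) / len(ratings)

now = datetime(2024, 1, 1, 12, 0)
t = HybridTable()
t.insert(now - timedelta(hours=3), 456, 4.0)  # aged listen
t.insert(now, 456, 5.0)                        # fresh listen
t.background_convert(now)
print(len(t.row_parts), len(t.columnar["rating"]), t.avg_rating())
```

After the background pass, the old listen lives in the columnar arrays and the fresh one stays in the row buffer, yet `avg_rating()` sees both.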

Real-World Implementation: ClickHouse MergeTree

ClickHouse's MergeTree engine is the canonical example, in production at Cloudflare, Uber, and Bloomberg.

  1. Incoming listens land in small row-oriented "parts."

  2. Background merging combines parts into larger columnar-compressed segments.

  3. Query execution reads from both formats simultaneously.

  4. Automatic optimization based on listen_time age and size thresholds.
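Step 2 above, the row-to-columnar merge, can be sketched as a k-way merge followed by a pivot. This is a simplified stand-in, not ClickHouse code: the parts here are plain Python lists already sorted by their key, roughly as MergeTree parts are sorted by the table's ORDER BY key.

```python
import heapq

# Two small row-oriented parts, each sorted by (song_id, listen_time).
part_a = [(456, "12:00", 4.5), (789, "12:01", 3.0)]  # (song_id, time, rating)
part_b = [(123, "12:02", 5.0), (456, "12:03", 4.0)]

def merge_to_columnar(parts):
    # The LSM-style step: k-way merge of already-sorted row parts...
    merged_rows = list(heapq.merge(*parts))
    # ...followed by the hybrid twist: pivot the merged rows so the
    # output segment is stored column by column.
    song_id, listen_time, rating = zip(*merged_rows)
    return {"song_id": list(song_id),
            "listen_time": list(listen_time),
            "rating": list(rating)}

segment = merge_to_columnar([part_a, part_b])
print(segment["song_id"])  # [123, 456, 456, 789]
```

The merge preserves key order, so the resulting columnar segment stays sorted and cheap to range-scan; a real engine would also compress each column at this point.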

Performance numbers:

  • Insert throughput: 1M+ listens/sec sustained

  • Query latency: sub-second analytics on billions of listening events

  • Storage: 10:1 compression on real-world listening data

  • Operational simplicity: one system, not separate OLTP + OLAP

Storage engine connection. ClickHouse's "parts + background merging" pattern is a variation of LSM trees. The twist: traditional LSM trees merge row-to-row; ClickHouse merges row-to-columnar. We cover LSM tree fundamentals in LSM Trees.

Adaptive Storage Research

Active research areas:

  • Workload-adaptive storage. Systems that learn access patterns and re-tune layout.

  • Online reorganization. Converting between formats without stopping queries.

  • Hybrid indexing. Row-based indexes over columnar-compressed data.

  • Multi-format query processing. Execution across heterogeneous storage layouts.


Why Hybrid Wins

Operationally:

  • One system, no complex ETL pipelines between OLTP and OLAP

  • Real-time analytics on data that arrived seconds ago

  • Cost: one cluster instead of two

  • Consistency: no sync lag between transactional and analytical stores

Technically:

  • Storage tunes itself to actual workload

  • Each data age uses its best-fit format

  • Best compression for cold data, fast writes for hot data

  • Single SQL surface regardless of underlying layout

Key Takeaways

  1. The OLTP/OLAP choice isn't binary. Hybrid storage gives you both layouts in one system, picked by data age.

  2. Partition by time, convert in the background. That's the universal pattern: ClickHouse, Apache Kudu, Parquet+Delta are all variations of the same idea.

  3. One system replaces two. Cuts ops complexity and gives you fresher analytics than any ETL pipeline ever could.


Next

Hash Partitioning → Once the data is laid out on disk, the next question is how to split a too-big-for-RAM table into RAM-sized chunks.