Hybrid Storage: The Best of Both Worlds
Concept. Hybrid storage runs row-oriented and column-oriented layouts side by side: fresh writes land in row format for speed, then a background process converts aged partitions to columnar for cheap analytics. One system, two layouts, picked automatically by data age.
Intuition. Spotify's last hour of listens needs blazing inserts (1M/sec); the past year of listens needs blazing aggregates (100ms dashboards). Pure row gives you the inserts but slow analytics; pure columnar gives you the analytics but slow inserts. ClickHouse's MergeTree keeps recent data as row-shaped "parts" and merges them into columnar segments in the background. Same SQL, both worlds.
The Challenge: Two Workloads, One Table
```sql
-- OLTP: high-frequency real-time inserts
INSERT INTO Listens (listen_id, user_id, song_id, rating, listen_time)
VALUES (75382914, 12345, 456, 4.5, NOW());
-- happening 1M times per second
```

```sql
-- OLAP: complex analytics on the same data
SELECT song_id, COUNT(*) AS plays, AVG(rating)
FROM Listens
WHERE listen_time > NOW() - INTERVAL '1 hour'
GROUP BY song_id
ORDER BY plays DESC;
-- must complete in <100 ms for dashboards
```
Pick row and you get the inserts, but analytics take 30 s. Pick columnar and you get 50 ms analytics, but inserts collapse to 10K/sec. Hybrid says: don't pick.
The Trade-off
| Layout | Inserts | Analytics |
|---|---|---|
| Pure row | 1M/sec ✓ | 30 s ✗ |
| Pure columnar | 10K/sec ✗ | 50 ms ✓ |
| Hybrid (row + columnar) | 1M/sec ✓ | 100 ms ✓ |
How It Works: Three Pieces
Adaptive layout. Fresh listens (< 1 hour) live in row format for fast inserts. Aged data is converted to columnar by a background job.
Time-based partitioning. Partitions are sliced by listen_time. Background merges combine row "parts" into larger columnar segments once activity slows.
Unified query engine. A single SQL interface. The planner reads from both row and columnar partitions and merges results transparently.
The key insight: listen_time is the natural partitioning dimension. Recent data needs fast writes; old data needs fast scans. Hybrid systems optimize storage layout based on age, automatically.
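In ClickHouse-flavored SQL, the time-based layout falls out of the table definition itself. A minimal sketch (table and column names are illustrative, matching the earlier examples):

```sql
CREATE TABLE Listens (
    listen_id   UInt64,
    user_id     UInt32,
    song_id     UInt32,
    rating      Float32,
    listen_time DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(listen_time)  -- slice data by time: one partition per month
ORDER BY (song_id, listen_time);    -- sort key applied within each part
```

Nothing here says "row" or "columnar" explicitly: the engine decides per part, based on the part's age and size, which physical format to use.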
Real-World Implementation: ClickHouse MergeTree
ClickHouse's MergeTree engine is the canonical example, in production at Cloudflare, Uber, and Bloomberg.
- Incoming listens land in small row-oriented "parts."
- Background merging combines parts into larger columnar-compressed segments.
- Query execution reads from both formats simultaneously.
- Automatic optimization based on listen_time age and size thresholds.
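You can watch this lifecycle directly: ClickHouse exposes per-part metadata in the system.parts table, where small fresh parts typically use the Compact format (all columns in one file) and large merged parts use the Wide format (one file per column). A sketch, assuming the Listens table above:

```sql
-- List the live parts of the table, newest first
SELECT partition, name, part_type, rows
FROM system.parts
WHERE table = 'Listens' AND active
ORDER BY modification_time DESC;
```

On a freshly loaded table you would expect many small Compact parts; after the background merges catch up, fewer and larger Wide parts.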
Performance numbers:
- Insert throughput: 1M+ listens/sec sustained
- Query latency: sub-second analytics on billions of listening events
- Storage: 10:1 compression on real-world listening data
- Operational simplicity: one system, not separate OLTP + OLAP
Storage engine connection. ClickHouse's "parts + background merging" pattern is a variation of LSM trees. The twist: traditional LSM trees merge row-to-row; ClickHouse merges row-to-columnar. We cover LSM tree fundamentals in LSM Trees.
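Merging normally runs in the background on the engine's own schedule, just like LSM compaction; for testing or one-off compaction, ClickHouse also lets you trigger it manually (FINAL merges all parts in each partition into one, so use it sparingly in production):

```sql
-- Force an immediate merge of the table's parts
OPTIMIZE TABLE Listens FINAL;
```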
Adaptive Storage Research
Active research areas:
- Workload-adaptive storage. Systems that learn access patterns and re-tune layout.
- Online reorganization. Converting between formats without stopping queries.
- Hybrid indexing. Row-based indexes over columnar-compressed data.
- Multi-format query processing. Execution across heterogeneous storage layouts.
Why Hybrid Wins
Operationally:
- One system, no complex ETL pipelines between OLTP and OLAP
- Real-time analytics on data that arrived seconds ago
- Cost: one cluster instead of two
- Consistency: no sync lag between transactional and analytical stores
Technically:
- Storage tunes itself to the actual workload
- Each data age uses its best-fit format
- Best compression for cold data, fast writes for hot data
- Single SQL surface regardless of underlying layout
Key Takeaways
- The OLTP/OLAP choice isn't binary. Hybrid storage gives you both layouts in one system, picked by data age.
- Partition by time, convert in the background. That's the universal pattern: ClickHouse, Apache Kudu, Parquet+Delta are all variations of the same idea.
- One system replaces two. Cuts ops complexity and gives you fresher analytics than any ETL pipeline ever could.
Next
Hash Partitioning → Once the data is laid out on disk, the next question is how to split a too-big-for-RAM table into RAM-sized chunks.