Hybrid Storage: The Best of Both Worlds

HS

Real System: ClickHouse Hybrid Storage

1M inserts/sec + 100ms analytics on same data

The Challenge

-- OLTP: High-frequency real-time inserts
INSERT INTO Listens (listen_id, user_id, song_id, rating, listen_time) 
VALUES (75382914, 12345, 456, 4.5, NOW());
-- Happening 1M times per second

-- OLAP: Complex analytics on the same data  
SELECT song_id, COUNT(*) as play_count, AVG(rating)
FROM Listens 
WHERE listen_time > NOW() - INTERVAL '1 hour'
GROUP BY song_id
ORDER BY play_count DESC;
-- Must complete in <100ms for dashboards

The Impossible Triangle: Pick All Three?

Row Storage

1M/sec
Fast inserts
Single row access
Analytics: 30 seconds

Columnar Storage

50ms
Fast analytics
Compressed scans
Inserts: 10K/sec

Hybrid System

BOTH
1M inserts/sec
100ms analytics
How?

How It Works: The Three Pillars

Adaptive Layout

Fresh Listens (< 1 hour) stored in row format for fast inserts. Aged listening data automatically converted to columnar.

Smart Partitioning

Time-based partitions by listen_time with automatic merging. Background process converts row parts to columnar when listening activity slows.

Unified Query Engine

Single SQL interface. Query planner automatically reads from both row and columnar parts, merges results transparently.

Key Insight: listen_time is the natural partitioning dimension

Recent listening data needs fast writes, historical listens need fast analytics. Hybrid systems automatically optimize storage layout based on listen_time and access patterns.


Real-World Implementation: ClickHouse MergeTree

ClickHouse's MergeTree engine demonstrates hybrid storage in production at companies like Cloudflare, Uber, and Bloomberg.

The Magic Process:

  1. Incoming Listens land in small row-oriented "parts"

  2. Background merging combines parts into larger, columnar-compressed segments

  3. Query execution reads from both formats simultaneously

  4. Automatic optimization based on listen_time age and size thresholds

Performance Results:

Storage Engine Deep Dive

ClickHouse's "parts" + background merging pattern is a variation of LSM trees - the key innovation being that traditional LSM trees merge row-to-row files, while ClickHouse merges row-to-columnar. We'll explore LSM tree fundamentals in LSM Trees.

Adaptive Storage Research

Key Research Areas:

The paper below explores the theoretical foundations and practical implementations of adaptive storage systems:


More Examples -- Apache Kudu, Parquet, Delta Lake, etc.

Why Hybrid Storage Wins

Operational Benefits

  • Single system: No complex ETL pipelines
  • Real-time analytics: Query fresh data immediately
  • Cost efficiency: One system instead of two
  • Consistency: No sync lag between OLTP/OLAP

Technical Advantages

  • Workload adaptivity: Storage optimizes for actual usage
  • Automatic tuning: No manual format decisions
  • Space efficiency: Best compression for each data age
  • Query flexibility: Fast regardless of access pattern