BigQuery: Serverless OLAP at Cloud Scale

Concept. BigQuery is a serverless OLAP query engine: you submit SQL, Google's compute layer auto-provisions thousands of ephemeral "slots" against shared columnar storage, and you get an answer in seconds. Storage and compute scale and bill independently.

Intuition. When a Spotify analyst runs SELECT artist, COUNT(*) AS plays FROM listens WHERE year = 2026 GROUP BY artist ORDER BY plays DESC LIMIT 100 over a 10 TB listens table, BigQuery spins up roughly 2,000 compute slots, each of which scans a slice of the columnar files in Colossus, and the answer comes back in about 30 seconds for around a dollar. No cluster was provisioned, no DBA was involved, and when the query finishes, the slots vanish.


The Architectural Shift: Decoupled Storage and Compute

Traditional data warehouses (Teradata, the original Hadoop stack, even early Redshift) bundled storage and compute into one cluster. To store more data, you bought bigger nodes, and you paid for the compute on those nodes whether you queried them or not. Want to scan a petabyte? Buy a petabyte-sized cluster, then keep it running 24/7.

BigQuery split the two. Storage lives in Colossus, Google's distributed file system, where it sits as compressed columnar files indefinitely at storage prices (around $20/TB/month). Compute is a separate fleet of ephemeral "slots" that spin up per query, do their work, and vanish. The result is two independent scaling axes:

  • Storage scales to petabytes without paying for compute.

  • Compute scales to thousands of slots per query without provisioning servers.

The decoupling is what makes "serverless" possible. You never manage a cluster because there is no cluster waiting around between your queries.

BigQuery's decoupled architecture: ephemeral compute slots spin up per query and read from persistent columnar storage in Colossus; the two layers scale and bill independently
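Because the bills are separate, you can also read them separately. The sketch below assumes the standard INFORMATION_SCHEMA views in a US-region project and an illustrative spotify dataset; it pulls the compute-side drivers (bytes billed, slot time) for the last day of queries and the storage-side footprint, and neither number moves the other.

-- Compute meter: bytes billed and slot time per query over the last day.
-- This only accrues while queries run; an idle project adds nothing here.
SELECT job_id, total_bytes_billed, total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY';

-- Storage meter: bytes kept in Colossus, billed whether or not anyone queries.
SELECT table_name, total_logical_bytes
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'spotify';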


A Worked Query: Spotify Top Artists, 100B Plays

SELECT artist, COUNT(*) AS plays
FROM spotify.listens_2026
WHERE year = 2026
GROUP BY artist
ORDER BY plays DESC
LIMIT 100;

Walk through what happens behind the API call.

  1. Plan. BigQuery's optimizer reads the query and the table's schema. It sees the query only touches artist and year, so the columnar layout (covered in the Module 3 BigQuery case study) lets it skip the other 18 columns. Bytes scanned drops from 10 TB to roughly 200 GB.

  2. Schedule. The slot scheduler allocates roughly 2,000 ephemeral compute slots, based on how much data the query must scan and how contended the shared slot pool currently is.

  3. Scan in parallel. Each slot reads a partition of the columnar files in Colossus, runs a local hash aggregation by artist, and emits partial counts.

  4. Shuffle and merge. A second wave of slots receives the partial counts (hash-partitioned by artist), sums them, sorts, and trims to the top 100. This is the same hash-partitioning idea from Distributed Sort & Hash, running automatically inside BigQuery; a conceptual SQL sketch of this two-stage aggregation appears just after this walk-through.

  5. Return. The result lands in the client in about 30 seconds. Slots release. You pay for the bytes scanned (roughly 200 GB at on-demand rates of a few dollars per terabyte, which comes to around $1) and nothing else.

No cluster was sized. No nodes were provisioned. The 2,000 slots existed for the duration of the query and vanished.
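To make steps 3 and 4 concrete, here is the two-stage aggregation written out as ordinary SQL. This is a conceptual sketch, not something you would type: BigQuery derives the plan automatically, and in the real plan each slot computes its stage-1 partial counts over only its own slice of files.

-- Stage 1 (scan): partial counts per artist. In the real plan this runs
-- independently on each slot's slice of the columnar files.
WITH partial_counts AS (
  SELECT artist, COUNT(*) AS partial_plays
  FROM spotify.listens_2026
  WHERE year = 2026
  GROUP BY artist
)
-- Stage 2 (shuffle + merge): partials are hash-partitioned by artist,
-- summed, sorted, and trimmed to the top 100.
SELECT artist, SUM(partial_plays) AS plays
FROM partial_counts
GROUP BY artist
ORDER BY plays DESC
LIMIT 100;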


Why ELT Beat ETL

In the pre-BigQuery world, the standard pattern was ETL: Extract from operational systems, Transform on a separate compute layer (often Spark), Load into the warehouse. This made sense when warehouse storage was expensive and warehouse compute was inflexible. You cleaned the data before loading, because once it was in the warehouse you did not want to touch it.

Decoupled storage and compute flip the economics:

  • Load raw. Storage is cheap. Dump everything into BigQuery without pre-processing.

  • Transform inside. Use SQL to reshape data on demand. The same engine that runs your analyst queries runs your transformations.

  • Iterate freely. When the transformation logic has a bug, re-run from the raw source instead of from a stale ETL output.

This is ELT: Extract, Load, Transform. The transformation step moves from a brittle external pipeline (Spark + Airflow) into the warehouse itself. Tools like dbt are built entirely around this pattern.
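A minimal sketch of the transform step, with illustrative table and column names (listens_raw, listens_clean, and event_ts_ms are not from the text above): the raw events were loaded untouched, and the cleanup is one SQL statement that can be re-run from the raw source whenever the logic changes.

-- The T of ELT, running inside the warehouse. Re-running this statement
-- rebuilds the clean table from the raw one loaded earlier.
CREATE OR REPLACE TABLE spotify.listens_clean AS
SELECT
  user_id,
  artist,
  TIMESTAMP_MILLIS(event_ts_ms) AS listened_at,  -- normalize epoch millis to TIMESTAMP
  LOWER(country_code) AS country_code
FROM spotify.listens_raw
WHERE event_ts_ms IS NOT NULL;                   -- drop malformed events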


Where BigQuery Sits Among Modern OLAP Engines

                  | BigQuery                      | Snowflake                             | Redshift                            | Spark SQL
Storage / compute | Decoupled                     | Decoupled                             | Coupled (legacy) or decoupled (RA3) | Decoupled
Cluster model     | Serverless                    | Virtual warehouses (you size them)    | Sized clusters                      | Sized clusters
Pricing           | Per byte scanned or slot-hour | Compute credits per warehouse-hour    | Per hour per node                   | Per machine-hour
Idle cost         | Storage only                  | Compute auto-suspends after N minutes | Reserved compute                    | None (ephemeral)
Best for          | Ad-hoc analytics, ELT         | Cross-cloud analytics, data sharing   | AWS-native warehousing              | ML pipelines, custom transforms

The modern OLAP space converged on decoupled storage and compute, but each system makes different tradeoffs on cluster sizing (serverless vs. virtual warehouse), pricing (per-byte vs. per-credit), and ecosystem fit. For ad-hoc analytics where the query mix is unpredictable, serverless wins on operational simplicity. For workloads where you want predictable cost ceilings and sticky compute, sized warehouses (Snowflake) or sized clusters (Redshift) can be cheaper.


What This Says About the Modern Data Stack

The modern data stack is a chain of specialized systems, each playing the role you have now seen:

  • Kafka (next) ingests real-time events.

  • DFS (S3, Colossus) stores raw and transformed data cheaply.

  • Spark / dbt transform data in place using SQL.

  • BigQuery / Snowflake serve interactive OLAP queries against the result.

Spark and BigQuery often appear in the same architecture: Spark does the heavy ELT job overnight, writes results back into BigQuery's storage, and analysts run sub-minute queries against the result the next morning.
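One hedged sketch of that handoff, assuming the overnight Spark job writes Parquet files to a Cloud Storage bucket (the bucket path and table name below are made up for illustration; the Spark-BigQuery connector can also write to the table directly): a single LOAD DATA statement pulls the files into BigQuery storage for the morning's interactive queries.

-- Ingest the overnight Spark output (Parquet on GCS) into BigQuery storage.
LOAD DATA OVERWRITE spotify.listens_daily_agg
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://spotify-elt-output/daily_agg/*.parquet']
);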

For the columnar storage and Dremel execution model that make BigQuery work under the hood, see the Module 3 BigQuery case study.


Next

Kafka → With storage you can keep forever and compute you can spin up on demand, batch and interactive analytics are covered. The last piece is what brings the data in: a real-time event substrate that makes raw events available to both streaming consumers and batch loaders.