Distributed File Systems
Concept. A distributed file system splits each large file into fixed-size chunks, replicates each chunk across many cheap commodity servers, and gives clients a single-namespace view. The cluster therefore delivers sequential read throughput proportional to its size while tolerating constant disk and machine failures.
Intuition. When Spotify stores a 64 GB audio-feature export, GFS splits it into 1,024 chunks of 64 MB each, places 3 copies of every chunk on different machines, and lets a job read all 1,024 chunks in parallel from 1,024 disks. Even if a disk dies mid-read, every chunk it held still has two live replicas elsewhere.
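The arithmetic in that example is worth making explicit. A minimal sketch, using the numbers above (64 GB file, 64 MB chunks, 3 replicas — GFS's defaults), of how much raw disk the cluster actually spends:

```python
import math

# Numbers from the Spotify example above (GFS defaults).
FILE_SIZE = 64 * 1024**3    # 64 GB in bytes
CHUNK_SIZE = 64 * 1024**2   # 64 MB in bytes
REPLICAS = 3

# Number of chunks the file is split into.
chunks = math.ceil(FILE_SIZE / CHUNK_SIZE)

# Raw disk consumed once every chunk is replicated 3x.
raw_storage = chunks * CHUNK_SIZE * REPLICAS

print(chunks)                    # 1024 chunks
print(raw_storage // 1024**3)    # 192 GB of raw disk for 64 GB of data
```

The 3× storage overhead is the price of being able to lose any single copy at any moment — a price Colossus later negotiates down with erasure coding.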
The Pattern, and What Changes Across Generations
The GFS case study introduced chunks-and-replicas as the answer to "store the web on cheap, dying drives." That same pattern shows up in every distributed file system since: split big files into chunks, replicate each chunk across cheap servers, and give clients one logical namespace. What evolved across generations is how the metadata is coordinated, and that single design choice cascades into scale, latency, and consistency.
GFS (2003): The Pioneer
Google's solution to storing the web on cheap hardware. The case study walks through the architecture. The punchline for this comparison is the single Master that holds all file-to-chunk metadata in RAM. Every open, list, or chunk-lookup goes through it, which is why the cluster scales only until the Master's RAM or CPU runs out, somewhere in the petabyte range.
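To see why the Master is a bottleneck, it helps to picture its job. A toy sketch (not Google's code; names and addresses are invented for illustration) of the single in-RAM metadata map that every client lookup funnels through:

```python
from dataclasses import dataclass, field

@dataclass
class Master:
    """Toy single Master: all metadata lives in one process's RAM."""
    # file path -> ordered list of chunk handles
    chunk_map: dict = field(default_factory=dict)
    # chunk handle -> chunkserver addresses holding a replica
    locations: dict = field(default_factory=dict)

    def lookup(self, path, chunk_index):
        """Client asks: which servers hold chunk N of this file?"""
        handle = self.chunk_map[path][chunk_index]
        return handle, self.locations[handle]

m = Master()
m.chunk_map["/exports/features.bin"] = ["chunk-0001"]
m.locations["chunk-0001"] = ["cs-17:7077", "cs-42:7077", "cs-88:7077"]
print(m.lookup("/exports/features.bin", 0))
```

Every open, list, and lookup in the whole cluster hits these two dicts, so the cluster can grow only until this one process runs out of RAM or CPU.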
HDFS (2006): GFS for Everyone Else
Yahoo and the Hadoop community open-sourced the GFS pattern as HDFS, the storage layer that powered the entire Hadoop ecosystem.
- One NameNode plays the Master role.
- DataNodes store 128 MB blocks (twice GFS's chunk size, sized for the larger files of the Hadoop era).
- Replication is rack-aware: copies sit on different racks so a rack power loss does not lose all replicas of a block.
- CP system: strict consistency, sometimes pays in availability when the NameNode is recovering.
Same fundamental architecture as GFS, same bottleneck. HDFS Federation later sharded the NameNode to push past it.
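The rack-aware placement above can be sketched in a few lines. This is a simplified model, not HDFS's actual implementation (the real default policy puts replica 1 on the writer's node, replica 2 on a different rack, and replica 3 on a second node of that remote rack):

```python
import random

def place_replicas(nodes_by_rack, writer_rack):
    """Pick 3 nodes spanning 2 racks: 1 local, 2 on one remote rack.

    nodes_by_rack: {rack_id: [node_name, ...]}
    """
    # Replica 1: a node on the writer's own rack (cheap, local write).
    local = random.choice(nodes_by_rack[writer_rack])
    # Replicas 2 and 3: two distinct nodes on a single remote rack.
    remote_rack = random.choice([r for r in nodes_by_rack if r != writer_rack])
    remote_nodes = random.sample(nodes_by_rack[remote_rack], 2)
    return [local] + remote_nodes

cluster = {"rack-a": ["a1", "a2", "a3"], "rack-b": ["b1", "b2", "b3"]}
chosen = place_replicas(cluster, "rack-a")
# The block now survives the loss of either whole rack.
```

Spanning exactly two racks (rather than three) is a deliberate compromise: it survives a rack power loss while keeping two of the three replica writes on one rack's cheap intra-rack bandwidth.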
S3 (2006): No Leader at All
Amazon S3 took a different turn. Instead of one Master, S3 has no central metadata coordinator. Every metadata operation hits a distributed key-value store, and the storage backend is a partitioned cluster spanning availability zones.
- No leader bottleneck. Throughput scales horizontally with cluster size.
- 11 nines of durability (99.999999999%) by storing 6+ copies across multiple AZs.
- Eventually consistent writes (AP system). A PUT may take a few seconds to be visible to every reader.
- Effectively infinite scale. Customers store exabytes; the largest single buckets hold petabytes.
The trade is consistency for availability and scale. For most workloads (object storage, backups, static assets, data lakes) the eventual-consistency window is acceptable.
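A back-of-envelope way to see where "11 nines" comes from. This toy model assumes replica failures are independent and picks an illustrative per-copy loss probability — neither assumption reflects how Amazon actually models durability, but it shows why stacking copies multiplies nines:

```python
# Assumed (illustrative) probability of losing one copy before repair.
p_replica_loss = 0.01

# Data is lost only if all 6 independent copies fail.
p_data_loss = p_replica_loss ** 6    # 1e-12 under these assumptions

durability = 1 - p_data_loss
print(f"{durability:.12f}")
```

Each additional independent copy multiplies the loss probability by another small factor, which is why durability nines scale roughly linearly with replica count while storage cost does too.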
Colossus (2010+): GFS Without the Bottleneck
Google replaced GFS with Colossus to break the Master bottleneck without giving up consistency.
- Distributed metadata. Metadata is sharded across many servers; no single node holds the full chunk map.
- Reed-Solomon erasure coding instead of 3× replication, dropping storage overhead from 200% to ~50% for the same fault tolerance.
- Sub-millisecond latencies for metadata operations.
- Exabyte-plus scale at Google.
Colossus is what GFS would have been if it had been redesigned with everything Google learned about distributed metadata coordination over the prior decade.
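The overhead numbers in the erasure-coding bullet follow from simple arithmetic. A sketch, where the Reed-Solomon (data, parity) split is an illustrative choice rather than Colossus's actual parameters:

```python
def overhead(data_shards, parity_shards):
    """Extra storage per byte of user data, as a percentage."""
    return parity_shards / data_shards * 100

# 3x replication: 1 data copy + 2 extra copies, survives 2 losses.
print(overhead(1, 2))    # 200% overhead

# Reed-Solomon with 6 data + 3 parity shards: any 6 of the 9 shards
# can reconstruct the data, so it survives any 3 losses.
print(overhead(6, 3))    # 50% overhead
```

Same (indeed slightly better) fault tolerance, a quarter of the extra storage — the catch is that reconstructing a lost shard now requires reading several surviving shards and computing, instead of just copying one replica.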
Quick Comparison
| System | Year | Metadata | Chunk Size | Replication | CAP | Scale |
|---|---|---|---|---|---|---|
| GFS | 2003 | Single Master | 64 MB | 3× | CP | PB |
| HDFS | 2006 | Single NameNode | 128 MB | 3× rack-aware | CP | PB |
| S3 | 2006 | Distributed | Variable | 6× across AZs | AP | ∞ |
| Colossus | 2010+ | Distributed | Variable | Reed-Solomon | CP | EB+ |
Real-World Usage
Hadoop (HDFS)
The Data Warehouse
Powers analytics for tech giants. Facebook's 600 PB+ data warehouse and Yahoo's 40,000+ node clusters rely on it.
Amazon S3
The Global Hard Drive
From Netflix to Spotify, if a piece of data is online, it probably lives on S3 (or a peer like Google Cloud Storage / Azure Blob).
Google Colossus
The Final Boss of Scale
Stores everything from YouTube videos to Gmail attachments at a scale that defies conventional file systems.
Key Takeaway: The Leader Bottleneck
Every system above shares the same chunk-and-replicate idea. What changes across generations is how metadata is coordinated. GFS and HDFS lean on a single Master / NameNode, which caps cluster size at the leader's throughput. S3 and Colossus distribute the metadata, removing that ceiling at the cost of more complex coordination. The CAP and scale columns in the table above are downstream of that one design choice.
Next
MapReduce → Now that the storage layer can hold petabytes across cheap hardware, the next question is how to compute on that data without moving it across the network. MapReduce gives you a programming model that works directly on top of HDFS-style chunks.