CAP Theorem
Concept. CAP says a distributed database can guarantee at most two of Consistency, Availability, and Partition tolerance, and since network partitions are a fact of life (P is mandatory), every real system trades off C against A.
Intuition. When a transatlantic cable cuts and Spotify's US and EU data centers can no longer talk, the system has to pick: either both regions stop accepting Premium upgrades until they reconnect (Consistency wins) or both keep taking writes that may conflict on reconnect (Availability wins). You don't get both during the partition; you only get to choose which one.
Pick Two (But It's Complicated)
The Harsh Reality
In college, you assume the network works. In distributed systems, you assume the network is actively trying to destroy your career. That's why Partition Tolerance is mandatory.
Real Systems: It's a Spectrum
| System | Typically | Tunable? | Note |
|---|---|---|---|
| HDFS/Hadoop | CP | No | Strict consistency for big data analytics |
| Amazon S3 | AP | No | High availability object storage (Eventual Consistency) |
| MongoDB | CP | Yes | writeConcern, readConcern |
| Cassandra | AP | Yes | Quorum reads/writes → CP |
| PostgreSQL | CA* | No | *Single node only |
| DynamoDB | AP | Yes | Eventually/Strong consistency |
| etcd/Consul | CP | No | Raft consensus |
| CouchDB | AP | No | Multi-master, eventual |
| Redis | CP/CA | Yes | Depends on configuration |
Key Takeaways
1. It's a Useful Model (Not a Law)
CAP isn't a strict rulebook. It's a framework for understanding trade-offs. Most systems offer tunable consistency levels, so use CAP to predict system behavior when things go south.
2. P is Not Optional
Network failures are a given. The real decision is between Consistency and Availability. Start by designing for partition tolerance.
3. Choose Based on Requirements
Align your system design with business needs. Financial systems demand Consistency. Social media thrives on Availability. Analytics can tolerate stale data, so Availability takes precedence.