Beyond CAP: Understanding the PACELC Theorem
Most engineers know the CAP Theorem: in the presence of a network partition (P), you must choose between Consistency (C) and Availability (A). However, CAP only tells us what happens when things go wrong. It says nothing about how a system behaves during normal operations.
This is where PACELC comes in.
1. What is PACELC?
PACELC is an extension of CAP. It is read as:
- If there is a Partition (P), choose between Availability (A) and Consistency (C).
- Else (E), choose between Latency (L) and Consistency (C).
2. The Missing Piece: Normal Operations
In a healthy system (no partition), data is replicated across nodes. To ensure Consistency, the system must wait for all replicas to acknowledge a write before replying to the user. This increases Latency.
If you want low Latency, you reply to the user immediately after one node writes, and replicate to others in the background. This results in Eventual Consistency.
3. Categorizing Popular Databases
PC/EC (Consistency Priority)
These systems prioritize consistency at all costs.
- Example: BigTable, HBase.
- Behavior: They are consistent during partitions and choose consistency over latency during normal operations.
PA/EL (Availability & Latency Priority)
These systems prioritize speed and uptime.
- Example: DynamoDB, Cassandra, Riak.
- Behavior: During a partition, they remain available (PA). During normal operation, they prioritize low latency (EL) via asynchronous replication.
PA/EC (The Hybrid)
- Example: MongoDB.
- Behavior: MongoDB is PA because it can lose data during a partition if the primary fails before replication. However, it is EC because, by default, it waits for primary acknowledgment, prioritizing consistency over the lowest possible latency.
4. Why PACELC Matters for Architects
When choosing a database, you need to ask two questions:
- "What happens when the network breaks?" (CAP)
- "How fast do I need my reads and writes to be when the network is fine?" (PACELC)
If you are building a high-frequency trading system, you likely need PC/EC. If you are building a social media feed where a few seconds of stale data is fine but speed is paramount, you want PA/EL.
Summary
The PACELC theorem provides a more realistic framework for evaluating distributed databases. By acknowledging that latency is a trade-off for consistency even in a healthy network, you can make more informed decisions about your system's performance and data integrity.
