Cassandra Internals: Built for Scale
Apache Cassandra is a peer-to-peer distributed database designed to handle massive amounts of data across many commodity servers. Its "Masterless" architecture and high write throughput are enabled by several key technologies.
1. Log-Structured Merge Trees (LSM)
Cassandra is optimized for writes. It uses an LSM-tree approach:
- Memtable: All writes first go to an in-memory buffer called a Memtable.
- Commit Log: For durability, the write is also appended to a disk-based Commit Log.
- SSTable (Sorted String Table): Once the Memtable is full, it is flushed to disk as an immutable SSTable.
- Compaction: In the background, Cassandra merges smaller SSTables into larger ones, removing deleted data (tombstones) and resolving conflicts.
2. Gossip Protocol
Since Cassandra has no "Master" node, it needs a way for nodes to know about each other. It uses Gossip:
- Every second, each node contacts 1-3 random peers to exchange state information.
- This allows the cluster to handle membership, failure detection, and health status without a central bottleneck.
3. Consistent Hashing and Vnodes
Data is distributed using a hash of the partition key.
- The Ring: All possible hash values form a ring. Nodes are assigned ranges on this ring.
- Virtual Nodes (vnodes): Instead of one big range, a node is assigned many small ranges. This ensures that when a node joins or leaves, the data is redistributed evenly across the cluster.
4. Tunable Consistency
Cassandra allows you to trade off latency for consistency on a per-query basis:
- ANY: A write is successful if any node (even a hinted handoff) accepts it.
- ONE: At least one replica must respond.
- QUORUM: $ replicas must respond.
- ALL: All replicas must respond (highest consistency, lowest availability).
5. Hinted Handoff and Read Repair
- Hinted Handoff: If a node is down during a write, a peer stores a "hint" and delivers the data when the node returns.
- Read Repair: During a read, Cassandra compares versions from different replicas. If it finds stale data, it automatically updates the outdated nodes.
Summary
Cassandra's architecture is a masterclass in distributed systems design. By prioritizing availability and write throughput through LSM-trees and Gossip, it has become the go-to database for global-scale applications.
