Databases

Cassandra Internals: LSM-Trees, Gossip, and Eventual Consistency

Explore the distributed architecture of Apache Cassandra. Learn about Log-Structured Merge Trees, the Gossip protocol, and how it handles massive write volumes.

Sachin Sarawgi·April 20, 2026·2 min read
#cassandra#databases#lsm-trees#distributed-systems#nosql

Cassandra Internals: Built for Scale

Apache Cassandra is a peer-to-peer distributed database designed to handle massive amounts of data across many commodity servers. Its "Masterless" architecture and high write throughput are enabled by several key technologies.

1. Log-Structured Merge Trees (LSM)

Cassandra is optimized for writes. It uses an LSM-tree approach:

  1. Memtable: All writes first go to an in-memory buffer called a Memtable.
  2. Commit Log: For durability, the write is also appended to a disk-based Commit Log.
  3. SSTable (Sorted String Table): Once the Memtable is full, it is flushed to disk as an immutable SSTable.
  4. Compaction: In the background, Cassandra merges smaller SSTables into larger ones, removing deleted data (tombstones) and resolving conflicts.

2. Gossip Protocol

Since Cassandra has no "Master" node, it needs a way for nodes to know about each other. It uses Gossip:

  • Every second, each node contacts 1-3 random peers to exchange state information.
  • This allows the cluster to handle membership, failure detection, and health status without a central bottleneck.

3. Consistent Hashing and Vnodes

Data is distributed using a hash of the partition key.

  • The Ring: All possible hash values form a ring. Nodes are assigned ranges on this ring.
  • Virtual Nodes (vnodes): Instead of one big range, a node is assigned many small ranges. This ensures that when a node joins or leaves, the data is redistributed evenly across the cluster.

4. Tunable Consistency

Cassandra allows you to trade off latency for consistency on a per-query basis:

  • ANY: A write is successful if any node (even a hinted handoff) accepts it.
  • ONE: At least one replica must respond.
  • QUORUM: $ replicas must respond.
  • ALL: All replicas must respond (highest consistency, lowest availability).

5. Hinted Handoff and Read Repair

  • Hinted Handoff: If a node is down during a write, a peer stores a "hint" and delivers the data when the node returns.
  • Read Repair: During a read, Cassandra compares versions from different replicas. If it finds stale data, it automatically updates the outdated nodes.

Summary

Cassandra's architecture is a masterclass in distributed systems design. By prioritizing availability and write throughput through LSM-trees and Gossip, it has become the go-to database for global-scale applications.

📚

Recommended Resources

Designing Data-Intensive ApplicationsBest Seller

The definitive guide to building scalable, reliable distributed systems by Martin Kleppmann.

View on Amazon
Kafka: The Definitive GuideEditor's Pick

Real-time data and stream processing by Confluent engineers.

View on Amazon
Apache Kafka Series on Udemy

Hands-on Kafka course covering producers, consumers, Kafka Streams, and Connect.

View Course

Practical engineering notes

Get the next backend guide in your inbox

One useful note when a new deep dive is published: system design tradeoffs, Java production lessons, Kafka debugging, database patterns, and AI infrastructure.

No spam. Just practical notes you can use at work.

Sachin Sarawgi

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Found this useful? Share it: