Database Sharding Part 7: Case Study - Scaling Discord to Billions
We have spent the last 6 parts of this series exploring the theoretical mathematics, routing algorithms, and migration playbooks for distributed databases. Now, it is time to look at one of the most famous real-world implementations of these concepts: Discord's migration from MongoDB to Cassandra, and eventually to ScyllaDB.
Discord-like messaging workloads are brutal for storage systems. They generate billions of append-heavy writes, demand incredibly low read-latency for message history, and suffer from extreme data skew.
This case study highlights the most practical lesson in modern database architecture: Sharding by channel_id or user_id is not enough when your data distribution follows a power law.
1. The Core Architectural Challenge
In the early days, Discord stored messages in MongoDB. As they grew to billions of messages, MongoDB's index sizes exceeded available RAM, causing catastrophic disk paging and latency spikes. They migrated to Apache Cassandra, a distributed, masterless database designed for massive write throughput.
The initial Cassandra data model was logical:
- Partition Key: channel_id
- Clustering Key: message_id (a TimeUUID)
Under this model, all messages for a specific channel live on the same physical server (shard) and are sorted sequentially by time. When a user opens a channel, Cassandra executes a lightning-fast sequential disk read to fetch the last 50 messages.
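To make the consequence of that choice concrete, here is a minimal Python sketch of what single-partition routing implies. The modulo placement and the node count are hypothetical simplifications of Cassandra's token ring, not its real implementation:

```python
import hashlib

NUM_NODES = 12  # hypothetical cluster size, purely for illustration

def node_for(partition_key: str) -> int:
    """Simplified stand-in for Cassandra's token ring: hash the
    partition key and map it onto one of the cluster's nodes."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_NODES

# Under the naive model the partition key is just the channel_id,
# so every message in a channel routes to the exact same node.
channel_id = "123456789012345678"  # hypothetical hot channel
print([node_for(channel_id) for _ in range(5)])  # same node index, five times
```

However many messages the channel receives, the hash input never changes, so neither does the node that has to absorb them.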
The "Fortnite" Skew Problem
This model works perfectly for a private Discord server with 10 friends.
But Discord hosts massive public servers. The official Fortnite or Midjourney channels have millions of active users generating thousands of messages per second.
Under the naive channel_id partitioning scheme, the entire Fortnite channel—potentially terabytes of data—is forced onto a single physical shard.
This causes two catastrophic failures:
- Write Hotspots: One physical server hits 100% CPU attempting to ingest the massive influx of messages, while the rest of the cluster sits idle.
- Compaction Death: Cassandra uses Log-Structured Merge (LSM) trees. It must periodically merge (compact) old data files on disk. Compacting a multi-terabyte partition belonging to a single channel takes so long that the server exhausts its disk I/O, leading to dropped writes and cluster instability.
2. The Solution: Bucketized Sharding
Discord engineers realized they had hit the Celebrity Problem (discussed in Part 3). They could not store an infinitely growing channel in a single partition.
They adopted a pattern often called Bucketized Sharding (or Time-Window Sharding).
Instead of partitioning solely by channel_id, they modified the Partition Key to be a composite key:
partition_key = (channel_id, bucket_id)
How Bucketing Works
A bucket_id represents a bounded chunk of time or a bounded number of messages.
For a small, private channel with low traffic, the bucket_id might simply represent the year (e.g., bucket_id = 2026). All messages for that year fit comfortably on one shard.
For the massive Fortnite channel, Discord calculates the bucket_id dynamically based on time windows (e.g., a new bucket every 10 days).
When the time window rolls over, the application starts writing new messages to a brand new bucket_id.
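A rough sketch of the idea in Python is shown below. The 10-day window, the service epoch, and the function names are assumptions for illustration, not Discord's production code:

```python
from datetime import datetime, timezone

# Hypothetical constants: a service epoch and a 10-day window length.
SERVICE_EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)
BUCKET_WINDOW_SECONDS = 10 * 24 * 60 * 60  # 10 days

def bucket_for(timestamp: datetime) -> int:
    """Map a message timestamp to a bounded time-window bucket."""
    elapsed = (timestamp - SERVICE_EPOCH).total_seconds()
    return int(elapsed // BUCKET_WINDOW_SECONDS)

def partition_key(channel_id: int, timestamp: datetime) -> tuple:
    """Composite partition key: the channel plus its current bucket."""
    return (channel_id, bucket_for(timestamp))

# Two messages in the same channel, eleven days apart, get different
# partition keys and can therefore land on different physical nodes.
print(partition_key(42, datetime(2026, 1, 1, tzinfo=timezone.utc)))
print(partition_key(42, datetime(2026, 1, 12, tzinfo=timezone.utc)))
```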
The architectural impact is massive:
- The multi-terabyte channel is instantly broken into hundreds of small, manageable partitions.
- Because these partitions use a composite key, Cassandra's Consistent Hashing algorithm scatters them randomly across the entire physical cluster.
- The write hotspot is completely eliminated. The load of a massive channel is evenly distributed across every server in the datacenter.
3. The Read Path Penalty
As we learned in Part 5, fixing a write hotspot often ruins the read latency. If the data is scattered across hundreds of buckets, how do you read a channel's history without performing a brutal scatter-gather query?
Discord optimized the read path by explicitly avoiding scatter-gather.
When a user opens the Fortnite channel, the application doesn't ask the database for "the last 50 messages." Instead, the application knows the current time. It calculates the current bucket_id and queries the database for exactly that bucket.
If the current bucket only has 10 messages, the application sequentially queries the previous bucket_id to fetch the remaining 40 messages.
Because the application knows exactly which partitions to ask for, it performs a small, bounded number of targeted single-partition reads, preserving P99 read latency while completely solving the write bottleneck.
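Here is a minimal sketch of that backward bucket walk, building on the bucket_for() helper from the earlier sketch. The fetch_bucket() stub is purely hypothetical and stands in for the real single-partition query:

```python
def fetch_bucket(channel_id: int, bucket_id: int, limit: int) -> list:
    """Hypothetical helper: one single-partition query returning up to
    `limit` of the newest messages stored in (channel_id, bucket_id)."""
    return []  # stand-in for the real per-partition query

def recent_messages(channel_id: int, limit: int = 50,
                    oldest_bucket: int = 0) -> list:
    """Walk buckets backwards from the current one until `limit`
    messages are collected or the channel's history is exhausted.
    Every query targets a single known partition: no scatter-gather."""
    messages = []
    bucket = bucket_for(datetime.now(timezone.utc))  # current time window
    while len(messages) < limit and bucket >= oldest_bucket:
        messages += fetch_bucket(channel_id, bucket, limit - len(messages))
        bucket -= 1  # step back one time window
    return messages
```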
4. The Final Migration: Cassandra to ScyllaDB
Even with the perfect data model, Discord eventually hit the practical limitations of the Java Virtual Machine (JVM).
Apache Cassandra is written in Java. As Discord scaled to trillions of messages, the Java Garbage Collector (GC) became a major source of latency spikes. When the GC runs a "Stop-The-World" pause to clean up memory, the database node completely freezes for a few hundred milliseconds, causing timeouts across the microservice architecture.
Discord executed a final zero-downtime migration to ScyllaDB.
ScyllaDB is an API-compatible, drop-in replacement for Cassandra written entirely in C++. It is built around a shard-per-core ("Thread-per-Core") architecture, in which each CPU core independently owns its slice of the data, and it manages its own memory, completely eliminating Garbage Collection pauses. By combining the Bucketized Sharding data model with a C++ storage engine, Discord achieved consistently low P99 latencies on a dataset spanning trillions of rows.
Summary: Lessons for your Architecture
You do not need to be the size of Discord to learn from their architecture:
- Never create unbounded partitions. If your shard key allows a single entity to grow to terabytes over time, you will eventually crash your database.
- Use Bucketized Sharding to artificially split massive partitions into time-bounded chunks.
- Control the Read Path by calculating exactly which buckets hold the data, avoiding expensive scatter-gather queries.
This concludes our 7-part deep dive into Database Sharding. You now have the theoretical foundations, the mathematical algorithms, and the practical case studies needed to scale a database beyond its vertical ceiling.
