System Design: Designing a Local-First Key-Value Store
While distributed systems get all the glory, the foundation of every database is an efficient local storage engine. Whether it's LevelDB (Google), RocksDB (Meta), or the foundation of Cassandra, these systems use an LSM-tree (Log-Structured Merge Tree) architecture to maximize write performance on disk.
1. Core Requirements
- High Write Throughput: Minimize disk seeks.
- Fast Lookups: Efficiently find a key in a massive file.
- Durability: Ensure data is safe after a crash.
- Compression: Keep the storage footprint small.
2. The LSM-Tree Architecture
Unlike a B-Tree that updates data "in-place" (random I/O), an LSM-tree is optimized for Sequential I/O.
- Memtable (Memory): New writes go into an in-memory balanced tree (e.g., Skip List).
- Commit Log (Disk): To survive crashes, writes are first appended to a sequential file on disk.
- SSTable (Disk): When the Memtable reaches a certain size, it is flushed to disk as a sorted, immutable file called an SSTable.
- Compaction: Background threads merge smaller SSTables into larger ones, removing obsolete/deleted records and improving future read performance.
3. Optimizing Reads: Bloom Filters
If you need to check if a key exists, scanning every SSTable on disk is too slow.
- The Secret: Each SSTable has an associated Bloom Filter in memory. If the filter says "No," we skip that file entirely, saving massive disk I/O.
4. Why "Log-Structured"?
The name comes from the fact that all disk writes are log-like appends. This architecture is the single biggest reason why systems like Cassandra and RocksDB can handle millions of writes per second on standard hardware.
5. Summary
A high-performance key-value store is a balancing act between memory usage (Memtable/Bloom Filters) and disk management (Compaction). By embracing the LSM-tree model, you create a storage engine that is exponentially more efficient at writes than traditional page-based databases.
