Distributed Caching at Scale
In a distributed system, caching is often the difference between a sub-100ms response and a total system collapse. However, most developers treat Redis as a simple "key-value bucket." At scale, the challenge isn't storing data; it's managing the lifecycle and convergence of that data when multiple nodes compete for the same resource.
1. The Thundering Herd Problem
When a high-traffic key (e.g., "current_stock_iphone_15") expires, thousands of concurrent threads see a cache miss and hit the database simultaneously. This is known as a Cache Stampede or the Thundering Herd.
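The stampede is easy to reproduce. The sketch below uses in-memory stand-ins (a `ConcurrentHashMap` for the cache, an `AtomicInteger` counting "database" hits) rather than a real Redis client; a `CountDownLatch` forces every thread to observe the cache miss before any thread can repopulate the key, which is exactly the race that happens in production when a hot key expires.

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class StampedeDemo {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger dbHits = new AtomicInteger();

    // Stand-in for the real database call.
    static String fetchFromDb(String key) {
        dbHits.incrementAndGet();
        return "42 units";
    }

    public static void main(String[] args) throws Exception {
        int threads = 10;
        CountDownLatch allMissed = new CountDownLatch(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                // Every thread checks the cache first...
                String v = cache.get("current_stock_iphone_15"); // miss
                allMissed.countDown();
                try { allMissed.await(); } catch (InterruptedException e) { return; }
                // ...and, having all seen a miss, every thread hits the DB.
                if (v == null) {
                    cache.put("current_stock_iphone_15",
                              fetchFromDb("current_stock_iphone_15"));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("DB hits for one key: " + dbHits.get()); // 10, not 1
    }
}
```

Ten concurrent requests produce ten identical database queries for one key; the locking approach below reduces that to one.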
2. Solution: Distributed Locking
To prevent a stampede, we ensure that only one thread is allowed to recompute the cache for a specific key.
The Algorithm:
- Thread A sees a miss.
- Thread A attempts to acquire a short-lived lock in Redis using SETNX.
- Thread A fetches from the DB and updates the cache.
- Threads B, C, and D fail to get the lock and either wait or return the "stale" previous value.
```java
public String getWithLock(String key) throws InterruptedException {
    String value = redis.get(key);
    if (value != null) {
        return value;
    }
    // Attempt to acquire a lock that auto-expires after 5 seconds,
    // so a crashed holder cannot block recomputation forever.
    if (redis.setNx("lock:" + key, "locked", Duration.ofSeconds(5))) {
        try {
            value = db.fetch(key);
            redis.set(key, value, Duration.ofMinutes(10));
            return value;
        } finally {
            redis.delete("lock:" + key);
        }
    }
    // Another thread holds the lock: back off briefly, then retry.
    Thread.sleep(100);
    return getWithLock(key);
}
```
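One subtle failure mode remains: if the DB fetch takes longer than the 5-second lock TTL, the lock expires, another thread acquires it, and the first thread's `finally` block deletes a lock it no longer owns. The standard fix is to store a unique token as the lock value and delete only if the token still matches. The sketch below models this with an in-memory map (`putIfAbsent` standing in for SETNX); `tryLock` and `unlock` are illustrative names, and in real Redis the compare-and-delete must run as a single Lua script to stay atomic.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class TokenLock {
    // In-memory stand-in for Redis; real code would use SET key token NX PX <ttl>.
    static final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Acquire: returns the token on success, null if someone else holds the lock.
    static String tryLock(String lockKey) {
        String token = UUID.randomUUID().toString();
        return store.putIfAbsent(lockKey, token) == null ? token : null;
    }

    // Release: delete only if we still own the lock. In real Redis this
    // GET-compare-DEL sequence must be a single Lua script to stay atomic.
    static boolean unlock(String lockKey, String token) {
        return store.remove(lockKey, token);
    }
}
```

With this in place, acquisition in `getWithLock` becomes `String token = tryLock("lock:" + key)` and the `finally` block calls `unlock("lock:" + key, token)`, so an expired lock held by someone else is never deleted by mistake.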
![Sequence diagram showing 10 requests hitting the DB vs the approach where only 1 hits the DB]
3. Architecture: Sidecar vs. Global Cache
- Sidecar Cache (Local): Fast, no network hop, but each node holds its own copy, so nodes can serve different values until their local entries expire.
- Global Cluster (Redis): Every node reads the same shared copy, but each lookup pays a network hop and the cluster becomes a critical dependency and potential central point of failure.
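A common middle ground layers the two: a small local cache with a short TTL in front of the global cluster, which bounds cross-node inconsistency to the local TTL while absorbing most of the read traffic. A minimal sketch, using plain in-memory maps as stand-ins for both tiers (`TwoTierCache` and its methods are illustrative names, not a real library API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TwoTierCache {
    // Local sidecar tier: small, short-lived entries (TTL eviction omitted for brevity).
    private final Map<String, String> local = new ConcurrentHashMap<>();
    // Stand-in for the shared Redis cluster.
    private final Map<String, String> global = new ConcurrentHashMap<>();

    public TwoTierCache(Map<String, String> seedGlobal) {
        global.putAll(seedGlobal);
    }

    public String get(String key) {
        // 1. Zero-latency local check.
        String v = local.get(key);
        if (v != null) return v;
        // 2. One network hop to the global tier; pull the value local on a hit.
        v = global.get(key);
        if (v != null) local.put(key, v);
        return v;
    }

    int localSize() { return local.size(); }
}
```

The first read of a key pays the "network hop"; subsequent reads on the same node are served locally until the (omitted) local TTL expires.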
![Architecture diagram comparing a Sidecar Cache vs a Global Cluster]
4. Trade-offs & Production Insights
- Consistency vs. Availability: Do you serve stale data during a recompute, or do you block the user? In e-commerce, stale data is often better; in banking, blocking is mandatory.
- The "Big Key" Gotcha: Never store 5MB JSON blobs in a single Redis key. It blocks the single-threaded event loop. Shard large objects into multiple keys.
