Distributed Locking: Coordinating at Scale
In a distributed system, multiple instances of a service often try to access the same shared resource (like an inventory item or a single-use coupon) at the same time. Language-level locks (like Java's synchronized) only coordinate threads inside a single process, so they cannot protect a resource shared across servers. We need a distributed lock.
1. Redis: The Performance Choice
Redis is the most common choice for distributed locking due to its low latency.
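- Implementation: Use SET resource_name my_random_value NX PX 30000. This sets the key only if it doesn't already exist (NX) and expires it after 30,000 milliseconds (PX). The random value identifies the owner, so a client should delete the key only if it still holds that value.
- The Redlock Algorithm: For high availability, Redis author Salvatore Sanfilippo (antirez) proposed Redlock, which involves acquiring the lock on a majority of independent Redis masters.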
- The Catch: Redlock is controversial. Critics (like Martin Kleppmann) argue that its safety depends on timing assumptions, such as bounded clock drift, network delay, and process pauses, that real distributed environments can violate.
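The single-instance pattern above can be sketched roughly as follows. This assumes a redis-py style client whose set() accepts nx and px and whose eval() runs a Lua script. The names acquire_lock, release_lock, and InMemoryRedis, and the key coupon:42, are illustrative and not part of any library; the in-memory stand-in exists only so the sketch runs without a server.

```python
import secrets
import time

# Lua script: delete the key only if it still holds our token, so a client
# never releases a lock that expired and was re-acquired by someone else.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


def acquire_lock(client, resource, ttl_ms=30000):
    """Try once to take the lock; return the owner token, or None if it is held."""
    token = secrets.token_hex(16)
    # Equivalent to: SET resource token NX PX ttl_ms
    if client.set(resource, token, nx=True, px=ttl_ms):
        return token
    return None


def release_lock(client, resource, token):
    """Release the lock only if we still own it; return True on success."""
    return client.eval(RELEASE_SCRIPT, 1, resource, token) == 1


class InMemoryRedis:
    """Hypothetical stand-in for a Redis client; supports only what this sketch uses."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expiry time in seconds)
        self._clock = clock

    def _live_value(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # drop expired keys lazily
            return None
        return entry[0]

    def set(self, name, value, nx=False, px=None):
        if nx and self._live_value(name) is not None:
            return None
        expires = self._clock() + px / 1000 if px is not None else float("inf")
        self._data[name] = (value, expires)
        return True

    def eval(self, script, numkeys, *keys_and_args):
        # Mimics RELEASE_SCRIPT only: compare the stored value, then delete.
        key, token = keys_and_args
        if self._live_value(key) == token:
            del self._data[key]
            return 1
        return 0


client = InMemoryRedis()  # with a real server, a redis.Redis(...) client would go here
token = acquire_lock(client, "coupon:42")
if token is not None:
    try:
        pass  # critical section: redeem the coupon
    finally:
        release_lock(client, "coupon:42", token)
```

The compare-and-delete step matters because the TTL can expire while the holder is still working. A plain DEL at that point could remove a lock that now belongs to another client.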
2. ZooKeeper: The Consistency Choice
ZooKeeper is designed for coordination. Locks are built on top of its ephemeral nodes.
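- Implementation: A client creates an ephemeral node under the lock path. The standard recipe uses ephemeral sequential nodes, and the client holding the lowest sequence number owns the lock. If the client crashes or its session expires, ZooKeeper deletes the node automatically, releasing the lock.
- Pros: Strongly consistent, because every write goes through a quorum; a partitioned minority cannot grant the lock; provides watches, so clients are notified when the lock is released instead of polling for it.
- Cons: Higher latency than Redis, since every lock operation is a quorum write; managing a ZooKeeper cluster adds operational complexity.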
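To make the ephemeral-node idea concrete, here is a small in-memory model of a single lock path. It is not a ZooKeeper client and talks to no server; the class LockDirectory and its method names are hypothetical, chosen only to mirror the behavior described above (sequential nodes, lowest number holds the lock, nodes vanish when the owning session ends).

```python
import itertools


class LockDirectory:
    """In-memory model of one lock path (an illustration, not a ZooKeeper client)."""

    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}  # sequence number -> session id that created the node

    def create_ephemeral_sequential(self, session_id):
        """Add a node for this session and return its sequence number."""
        seq = next(self._seq)
        self._nodes[seq] = session_id
        return seq

    def delete(self, seq):
        """Explicit release: the holder removes its own node."""
        self._nodes.pop(seq, None)

    def expire_session(self, session_id):
        """Model a crash or disconnect: all nodes owned by the session disappear."""
        for seq in [s for s, owner in self._nodes.items() if owner == session_id]:
            del self._nodes[seq]

    def holder(self):
        """The session with the lowest sequence number owns the lock."""
        return self._nodes[min(self._nodes)] if self._nodes else None

    def predecessor(self, seq):
        """The node a waiter would watch: the next-lowest sequence number."""
        lower = [s for s in self._nodes if s < seq]
        return max(lower) if lower else None


directory = LockDirectory()
a = directory.create_ephemeral_sequential("client-a")
b = directory.create_ephemeral_sequential("client-b")
print(directory.holder())        # client-a
directory.expire_session("client-a")  # client-a crashes
print(directory.holder())        # client-b
```

Watching only the predecessor node, rather than the lock holder, means a release wakes exactly one waiter instead of all of them.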
3. Database-Level Locking (SQL)
If you already use a relational database, you might not need a new tool.
- Implementation: SELECT * FROM resources WHERE id = 1 FOR UPDATE. Run inside a transaction, this locks the row until the transaction commits or rolls back.
- Pros: Simplest to implement; consistent with your business data.
- Cons: The lock lasts as long as the transaction, so long-held locks tie up connections and can exhaust the connection pool; transactions that lock rows in different orders can deadlock.
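A rough sketch of the row-lock pattern, written against a generic Python DB-API connection. It assumes a driver that uses %s placeholders and opens a transaction implicitly on the first statement (psycopg2 behaves this way by default). The function name reserve_one and the quantity column are hypothetical; only the resources table and id column come from the query above.

```python
def reserve_one(conn, resource_id):
    """Decrement one unit of a resource under a row lock; return True if reserved."""
    cur = conn.cursor()
    try:
        # Blocks here until no other transaction holds a lock on this row.
        cur.execute(
            "SELECT quantity FROM resources WHERE id = %s FOR UPDATE",
            (resource_id,),
        )
        row = cur.fetchone()
        if row is None or row[0] <= 0:
            conn.rollback()  # nothing to reserve; release the row lock
            return False
        cur.execute(
            "UPDATE resources SET quantity = quantity - 1 WHERE id = %s",
            (resource_id,),
        )
        conn.commit()  # commit makes the change visible and releases the lock
        return True
    except Exception:
        conn.rollback()  # never leave the row locked after a failure
        raise
    finally:
        cur.close()
```

Keeping the transaction this short is the point: the lock is held only for the read and the update, not for any slow work such as calls to other services.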
4. The Fencing Token Solution
Regardless of the tool, a process can lose its lock (for example, when the lock expires during a GC pause or network flap) and still believe it owns it.
- The Solution: Use a Fencing Token. Every time a lock is acquired, the lock manager returns a monotonically increasing ID. When the process writes to the shared resource, it includes the token. The resource rejects any write with a token older than the last successful one.
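The token check can be illustrated with a toy example. LockManager, FencedResource, and StaleTokenError are hypothetical names for this sketch; a real lock manager would also enforce mutual exclusion and expiry, and a real resource would need to make the check and the write atomic.

```python
import itertools


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already accepted."""


class LockManager:
    """Toy lock manager: every grant returns a strictly increasing token."""

    def __init__(self):
        self._counter = itertools.count(1)

    def acquire(self):
        return next(self._counter)


class FencedResource:
    """Shared resource that remembers the highest token it has accepted."""

    def __init__(self):
        self.value = None
        self._highest_token = 0

    def write(self, token, value):
        # Reject any write whose token is older than the last successful one.
        if token < self._highest_token:
            raise StaleTokenError(f"token {token} < {self._highest_token}")
        self._highest_token = token
        self.value = value


manager = LockManager()
store = FencedResource()

old_token = manager.acquire()  # client A gets the lock, then stalls in a GC pause
new_token = manager.acquire()  # A's lock expires; client B gets the lock
store.write(new_token, "update from B")

try:
    store.write(old_token, "late update from A")  # A wakes up, unaware it lost the lock
except StaleTokenError:
    print("stale write rejected")
```

The safety check lives in the resource, not in the client, so it still holds when the client's view of the lock is wrong.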
Summary
- Use Redis if you need high-performance, short-lived locks where occasional failure is acceptable.
- Use ZooKeeper if correctness is mission-critical (e.g., financial ledger coordination).
- Use PostgreSQL/MySQL if your throughput is low and you want to avoid infrastructure bloat.
Which one to choose?
Choose the tool that matches your system's existing consistency model. If your stack favors availability under partitions (AP in CAP terms), use Redis. If it favors consistency (CP), use ZooKeeper.
