
System Design: Designing a Distributed Lock Manager

How do you coordinate access to shared resources in a distributed system? A technical deep dive into Redis Redlock, Zookeeper ephemeral nodes, and fencing tokens.

Sachin Sarawgi·April 20, 2026·3 min read
#system-design#distributed-locking#redis#zookeeper#concurrency#consistency


In a microservices architecture, multiple instances of a service often need to access a shared resource (like an inventory item or a single-use coupon) simultaneously. Standard language-level locks (like Java’s synchronized) do not work across multiple servers. We need a Distributed Lock.

1. Core Requirements

  • Safety: Mutual exclusion — only one client can hold the lock at any time.
  • Liveness (Deadlock-free): A lock must eventually be released, even if the client holding it crashes.
  • Performance: Acquiring and releasing locks must have low latency.
  • Fault Tolerance: The locking service itself must remain available even if some nodes fail.
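Before looking at specific tools, it helps to see the first two requirements in code. The sketch below is a toy, single-node lock manager (the class and method names are illustrative, not a real library API): safety comes from checking for an existing live holder, and liveness comes from treating an expired TTL as a release even if the holder crashed.

```python
import time
import threading

class InMemoryLockManager:
    """Toy single-node lock manager illustrating the two core properties:
    safety (one holder at a time) and liveness (locks expire via TTL)."""

    def __init__(self):
        self._locks = {}          # resource -> (owner, expiry timestamp)
        self._mutex = threading.Lock()

    def acquire(self, resource, owner, ttl_seconds):
        with self._mutex:
            holder = self._locks.get(resource)
            now = time.monotonic()
            # Liveness: a lock whose TTL has passed is treated as released,
            # even if the previous owner crashed without calling release().
            if holder is None or holder[1] <= now:
                self._locks[resource] = (owner, now + ttl_seconds)
                return True
            return False  # Safety: someone else holds a live lock

    def release(self, resource, owner):
        with self._mutex:
            holder = self._locks.get(resource)
            if holder and holder[0] == owner:
                del self._locks[resource]
                return True
            return False
```

Everything that follows is, in essence, making these two guarantees hold when the lock state lives on remote, failure-prone servers instead of in one process.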

2. Redis-based Locking (The Performance Choice)

Redis is the most common choice due to its extreme performance.

  • Implementation: Using SET resource_name my_random_value NX PX 30000. This atomically sets the key only if it doesn't exist (NX), with an expiry (PX) of 30 seconds to ensure deadlock freedom. The random value identifies the lock holder, so a client can only release a lock it still owns.
  • Redlock Algorithm: To make it fault-tolerant, Redis author Antirez proposed Redlock, where a client acquires locks from a majority of independent Redis masters.
  • The Catch: Redlock is controversial. Critics (most prominently Martin Kleppmann) argue it relies on timing assumptions — bounded clock drift, network delay, and process pauses — that real distributed systems cannot guarantee.
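The single-node pattern above can be sketched without a Redis dependency. The `FakeRedis` class below is a stand-in for one Redis node (in production, the compare-and-delete release is done atomically with a short Lua script); the two helper functions show why the random value matters: release must verify ownership, or a client whose lock already expired could delete someone else's lock.

```python
import time
import uuid

class FakeRedis:
    """Minimal stand-in for a Redis node: supports the semantics of
    SET key value NX PX ttl, plus a compare-and-delete that mimics
    the atomic Lua release script."""
    def __init__(self):
        self._data = {}  # key -> (value, expiry in ms)

    def set_nx_px(self, key, value, ttl_ms):
        entry = self._data.get(key)
        now = time.monotonic() * 1000
        if entry is None or entry[1] <= now:  # absent or expired
            self._data[key] = (value, now + ttl_ms)
            return True
        return False

    def compare_and_delete(self, key, value):
        entry = self._data.get(key)
        if entry and entry[0] == value:
            del self._data[key]
            return True
        return False

def acquire_lock(redis, resource, ttl_ms=30_000):
    token = str(uuid.uuid4())  # my_random_value: proves ownership on release
    return token if redis.set_nx_px(resource, token, ttl_ms) else None

def release_lock(redis, resource, token):
    # Compare before deleting: a client whose lock has already expired
    # must not delete a lock now held by a different client.
    return redis.compare_and_delete(resource, token)
```

Redlock runs this same acquire against N independent masters and considers the lock held only if a majority succeed within a time budget.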

3. Zookeeper: The Consistency Choice

Zookeeper is designed for coordination and provides strong consistency.

  • Implementation: A client creates an "ephemeral" node in the Zookeeper hierarchy. If the client disconnects or crashes, Zookeeper automatically deletes the node, releasing the lock.
  • Pros: Robust against network partitions, provides "watchers" (event notifications) so clients don't have to poll for lock availability.
  • Cons: Higher latency than Redis; managing a Zookeeper cluster adds operational complexity.
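The standard Zookeeper lock recipe uses ephemeral *sequential* znodes: each client creates one, the lowest sequence number holds the lock, and each waiter watches only its immediate predecessor (avoiding a thundering herd when the lock is released). The dependency-free sketch below simulates just that ordering logic; real clients would use a library such as Apache Curator rather than hand-rolling it.

```python
import itertools

class FakeZkLockQueue:
    """Sketch of the Zookeeper lock recipe: each client creates an
    ephemeral sequential znode; the lowest sequence number holds the
    lock. When a session dies, Zookeeper deletes that client's
    ephemeral nodes, which implicitly releases the lock."""
    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}  # znode name -> client id

    def create_ephemeral_sequential(self, client):
        # Zero-padded so lexicographic order matches numeric order,
        # as Zookeeper's %010d sequence suffixes do.
        name = f"lock-{next(self._seq):010d}"
        self._nodes[name] = client
        return name

    def holds_lock(self, name):
        return name == min(self._nodes)  # lowest sequence number wins

    def session_expired(self, client):
        # Simulates Zookeeper deleting ephemeral nodes on session loss.
        self._nodes = {k: v for k, v in self._nodes.items() if v != client}
```

Note that "release on crash" here is driven by session heartbeats, not a TTL the client must guess in advance — one reason Zookeeper is the safer default for correctness-critical locks.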

4. The Fencing Token (The Safety Essential)

Regardless of the tool, a process might lose its lock (e.g., its lease expires during a long GC pause) while still believing it owns it. Two writers acting at once leads to split-brain writes and corrupted data.

  • The Solution: Every time a lock is acquired, the lock manager returns a Fencing Token (a monotonically increasing version number). When the client writes to the shared resource, it must include this token. The resource rejects any write with an old token, effectively "fencing out" the process that lost its lock.
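The fencing check lives on the resource side, not in the lock manager. The sketch below is a toy store (names are illustrative; in practice the token might be Zookeeper's znode version number) that tracks the highest token it has seen and rejects anything older:

```python
class FencedStore:
    """Storage that rejects writes carrying a stale fencing token.
    In a real system the tokens come from the lock manager; here we
    hand them out monotonically for illustration."""
    def __init__(self):
        self._next_token = 0
        self._highest_seen = -1
        self.value = None

    def grant_lock(self):
        self._next_token += 1
        return self._next_token  # monotonically increasing fencing token

    def write(self, token, value):
        # Fence out any client presenting a token at or below one already
        # seen: it lost its lock (GC pause, expiry) before this write landed.
        if token <= self._highest_seen:
            return False
        self._highest_seen = token
        self.value = value
        return True
```

The key property: even if the paused process wakes up and issues its write *after* the new lock holder, its token is smaller, so the store refuses it.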

Summary

  • Redis: Use for high-performance, short-lived locks where minor risks are acceptable.
  • Zookeeper: Use for mission-critical coordination where consistency is paramount.
  • Postgres: Use advisory locks for simple, low-throughput systems where extra infrastructure is unnecessary.


Written by Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
