System Design · Advanced · Playbook · Part 2 of 2 in Distributed Systems Fundamentals

Distributed Locking: Redis Redlock vs. Zookeeper vs. Database Constraints

Learn how to coordinate access to shared resources in a distributed system. A technical comparison of Redis Redlock, Zookeeper ephemeral nodes, and SQL-based locking.

Sachin Sarawgi · April 20, 2026 · 3 min read

Distributed Locking: Coordinating at Scale

In a distributed system, multiple instances of a service often need to access a shared resource (such as an inventory item or a single-use coupon) at the same time. Language-level locks (like Java's synchronized) only coordinate threads inside a single process, so they cannot protect a resource that is shared across servers. We need a Distributed Lock.

1. Redis: The Performance Choice

Redis is a popular choice for distributed locking because of its low latency.

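  • Implementation: SET resource_name my_random_value NX PX 30000. This sets the key only if it does not already exist (NX) and gives it a 30,000 ms expiry (PX), so a crashed holder cannot block everyone else forever. The random value identifies the owner: on release, delete the key only if it still holds your value, so you never remove a lock that expired and was re-acquired by another client.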
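  • The Redlock Algorithm: For high availability, Redis creator Salvatore Sanfilippo (antirez) proposed Redlock, in which a client tries to acquire the lock on a majority of independent Redis masters (for example, 3 of 5) and treats the lock as held only if that majority is reached before the expiry runs out.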
  • The Catch: Redlock is controversial. Critics (like Martin Kleppmann) argue that its safety depends on timing assumptions, such as bounded clock drift, bounded process pauses, and bounded network delay, which real distributed environments do not guarantee.
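
The single-instance pattern above can be sketched as follows. This is a minimal in-memory model, not a Redis client: SingleNodeLockStore, set_nx_px, and delete_if_equal are hypothetical names that only imitate the semantics of SET ... NX PX and of an atomic compare-and-delete (which on a real Redis server is typically done with a small Lua script). The injectable clock is also an assumption, added so that expiry can be exercised without sleeping.

```python
import secrets
import time


class SingleNodeLockStore:
    """In-memory stand-in for one Redis node (hypothetical, for illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at_seconds)

    def set_nx_px(self, key, value, ttl_ms):
        """Model of `SET key value NX PX ttl_ms`: succeed only if the key is absent or expired."""
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # a live lock already exists
        self._data[key] = (value, now + ttl_ms / 1000)
        return True

    def delete_if_equal(self, key, value):
        """Model of an atomic 'delete only if the stored value matches'."""
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now and current[0] == value:
            del self._data[key]
            return True
        return False


def acquire(store, resource, ttl_ms=30000):
    """Return a random owner token on success, or None if the lock is already held."""
    token = secrets.token_hex(16)
    return token if store.set_nx_px(resource, token, ttl_ms) else None


def release(store, resource, token):
    """Release only if we still own the lock; a stale owner must not delete a newer lock."""
    return store.delete_if_equal(resource, token)
```

The point of the random token is visible in release: if client A's lock expires and client B acquires it, A's late release call finds B's token in the key and does nothing, instead of silently unlocking B.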

2. Zookeeper: The Consistency Choice

Zookeeper is designed for coordination. It uses Ephemeral Nodes to implement locks.

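  • Implementation: A client creates an ephemeral node under the lock's path (the standard lock recipe uses ephemeral sequential nodes, and the client with the lowest sequence number holds the lock). If the client crashes or its session expires, Zookeeper deletes the node automatically, releasing the lock.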
  • Pros: Strong consistency guarantees; during a network partition, only the side that still has a quorum can grant locks; provides "watches" so clients are notified when a lock is released instead of polling for availability.
  • Cons: Higher latency than Redis; managing a Zookeeper cluster adds operational complexity.
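
To make the ephemeral-node mechanics concrete, here is a small in-memory model of the usual lock recipe, assuming ephemeral sequential nodes under a single lock path. It is not a Zookeeper client: LockQueueModel and its method names are hypothetical, and "session expiry" is simulated by a direct method call.

```python
class LockQueueModel:
    """In-memory model of ephemeral sequential nodes under one lock path (hypothetical)."""

    def __init__(self):
        self._next_seq = 0
        self._nodes = {}  # sequence number -> id of the session that created the node

    def join(self, session_id):
        """Model of creating an ephemeral sequential node; returns its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._nodes[seq] = session_id
        return seq

    def holder(self):
        """The node with the lowest sequence number holds the lock."""
        return min(self._nodes) if self._nodes else None

    def node_to_watch(self, seq):
        """A waiter watches only its immediate predecessor, so one release wakes one client."""
        lower = [s for s in self._nodes if s < seq]
        return max(lower) if lower else None

    def leave(self, seq):
        """Explicit unlock: the client deletes its own node."""
        self._nodes.pop(seq, None)

    def expire_session(self, session_id):
        """Model of session expiry: every node created by that session disappears."""
        self._nodes = {s: sid for s, sid in self._nodes.items() if sid != session_id}
```

In this model, if sessions A, B, and C join in that order, A holds the lock, B watches A's node, and C watches B's node. Expiring A's session removes its node and B becomes the holder without any explicit unlock call, which is the crash-safety property described above.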

3. Database-Level Locking (SQL)

If you already use a relational database, you might not need a new tool.

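  • Implementation: Inside a transaction, run SELECT * FROM resources WHERE id = 1 FOR UPDATE. The row stays locked until the transaction commits or rolls back, and any other transaction that requests the same row lock waits until then.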
  • Pros: Simplest to implement; consistent with your business data.
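  • Cons: The lock lives only as long as the transaction, so every holder ties up a DB connection; long-held locks can exhaust the connection pool, and transactions that lock rows in different orders can deadlock.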
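
SQLite, which ships with Python, has no FOR UPDATE, so the sketch below swaps in the other database technique named in the title: a uniqueness constraint on a lock table. The table name resource_locks and the helper names are assumptions made for illustration. Whoever inserts the row for a resource holds the lock; the primary key rejects everyone else.

```python
import sqlite3


def create_lock_table(conn):
    """Create the (hypothetical) lock table: one row per currently locked resource."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resource_locks ("
        " resource TEXT PRIMARY KEY,"
        " owner TEXT NOT NULL)"
    )
    conn.commit()


def try_acquire(conn, resource, owner):
    """Insert a row for the resource; the primary key rejects a second holder."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO resource_locks (resource, owner) VALUES (?, ?)",
                (resource, owner),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def release(conn, resource, owner):
    """Delete the row only if this owner holds it; returns True if a row was removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM resource_locks WHERE resource = ? AND owner = ?",
            (resource, owner),
        )
    return cur.rowcount == 1
```

One design difference from FOR UPDATE is worth noting: a row lock taken with FOR UPDATE disappears when the transaction or connection ends, while a row in a lock table stays there if its owner crashes. A production version of this sketch would therefore need an expiry column or a cleanup job, which is left out here.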

4. The Fencing Token Solution

Regardless of the tool, a process might lose its lock (for example, the lock expires during a GC pause or a network flap) and still think it owns it.

  • The Solution: Use a Fencing Token. Every time a lock is acquired, the lock manager returns a monotonically increasing ID. When the process writes to the shared resource, it includes the token. The resource remembers the highest token it has accepted and rejects any write that carries a lower one, so a stale holder cannot overwrite the work of a newer one.
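
The check on the resource side can be sketched in a few lines. Both classes below are hypothetical models: LockManagerModel stands in for whatever service issues the tokens, and FencedResource stands in for the storage system, which has to perform the comparison itself for fencing to work.

```python
import itertools


class LockManagerModel:
    """Hands out a strictly increasing fencing token with every grant (hypothetical model)."""

    def __init__(self):
        self._counter = itertools.count(1)

    def grant(self):
        return next(self._counter)


class FencedResource:
    """Storage that remembers the highest token it has accepted and rejects lower ones."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False  # stale holder: a newer lock owner has already written
        self.highest_token = token
        self.value = value
        return True
```

Suppose a first client is granted a token, pauses, and a second client is then granted the next token and writes. When the first client wakes up and tries to write with its older token, the resource returns False and keeps the second client's value. Repeated writes with the current token are still accepted.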

Summary

  • Use Redis if you need high-performance, short-lived locks where occasional failure is acceptable.
  • Use Zookeeper if correctness is mission-critical (e.g., financial ledger coordination).
  • Use PostgreSQL/MySQL if your throughput is low and you want to avoid infrastructure bloat.

Which one to choose?

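As a rule of thumb, choose the tool that matches your system's existing consistency model. If your stack favors availability (AP, Availability/Partition-tolerance) and can tolerate an occasional lock being granted twice, use Redis. If it favors consistency (CP, Consistency/Partition-tolerance), use Zookeeper. In either case, add fencing tokens wherever a stale lock holder could corrupt data.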




