
System Design: Multi-Leader Database Replication

How to handle writes across multiple data centers. A deep dive into multi-leader replication, conflict resolution strategies (LWW, CRDTs), and data consistency.

By Sachin Sarawgi | April 20, 2026 | 2 min read

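In a single-leader setup, all writes go to one node. For a global application this is a bottleneck: every write from a distant region pays a cross-region round trip, and the leader's failure blocks all writes until failover completes. Multi-leader replication lets each datacenter accept writes locally and replicate them to the others asynchronously, which lowers write latency and improves availability.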

1. Why Multi-Leader?

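  • Geo-Latency: Users in London write to the London datacenter; users in NYC write to the NYC datacenter.
  • Resilience: If one datacenter goes down, the others can still accept writes.
  • Scalability: Incoming writes are spread across several leaders. Note that every leader must still apply the writes replicated from its peers, so total write capacity does not grow linearly with the number of leaders.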

2. The Conflict Challenge

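Because each leader accepts writes independently and replicates them asynchronously, two users can update the same row in different datacenters before either change has reached the other side. Each write succeeds locally, and the conflict only surfaces later, when the changes are replicated.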

  • Conflict Resolution Strategies:
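    • Last Write Wins (LWW): Compare timestamps and keep the latest write. Simple, but the losing write is silently discarded, and clock skew between datacenters can make the wrong write win.
    • Conflict-free Replicated Data Types (CRDTs): Data structures whose merge operation is commutative, associative, and idempotent, so replicas converge to the same state regardless of the order in which updates arrive.
    • Version Vectors: Each replica keeps one counter per node. Comparing two vectors shows whether one write happened before the other or whether they are concurrent. Version vectors detect conflicts; they still need a resolution policy to decide the outcome.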
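
The three strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the `Write` class, the function names, and the choice of a grow-only counter (G-Counter) as the example CRDT are all assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """A hypothetical write record used only for this sketch."""
    value: str
    timestamp: float  # wall-clock time at the originating datacenter
    node: str         # originating datacenter, used as a tie-breaker


def lww_merge(a: Write, b: Write) -> Write:
    """Last Write Wins: keep the write with the larger (timestamp, node)."""
    return a if (a.timestamp, a.node) >= (b.timestamp, b.node) else b


def gcounter_merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """G-Counter CRDT merge: take the per-node maximum of both replicas."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def gcounter_value(counter: dict[str, int]) -> int:
    """The counter's value is the sum of all per-node increments."""
    return sum(counter.values())


def compare_vectors(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two version vectors.

    Returns 'equal', 'before' (a happened before b), 'after', or 'concurrent'.
    """
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# LWW: the NYC write is discarded purely because its timestamp is lower,
# even if NYC's clock was simply running behind.
london = Write("email=a@example.com", timestamp=100.0, node="london")
nyc = Write("email=b@example.com", timestamp=99.5, node="nyc")
winner = lww_merge(london, nyc)

# CRDT: both replicas converge to the same counter, whatever the merge order.
merged = gcounter_merge({"london": 3, "nyc": 1}, {"london": 2, "nyc": 4})

# Version vectors: neither vector dominates, so the writes are concurrent.
relation = compare_vectors({"london": 2, "nyc": 1}, {"london": 1, "nyc": 2})
```

Note the difference in what each function returns: `lww_merge` and `gcounter_merge` produce a resolved value, while `compare_vectors` only classifies the relationship and leaves the resolution to the caller.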

3. Replication Topologies

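  • All-to-All: Every node replicates to every other node. There is no single point of failure, but writes can arrive at a node in a different order than they were made.
  • Circular: Each node forwards writes to the next node in a fixed ring. A single failed node can interrupt replication until the ring is reconfigured.
  • Star: Writes go to a central node that redistributes them to the others. The central node is a single point of failure for replication, which makes this topology less resilient.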
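
To make the resilience differences concrete, the sketch below models each topology as a map from a node to the peers it forwards writes to, then checks which nodes still receive a write when one node is down. The datacenter names and all function names are hypothetical, and the model ignores retries and reconfiguration.

```python
from collections import deque


def all_to_all(nodes):
    """Every node forwards writes to every other node."""
    return {n: [m for m in nodes if m != n] for n in nodes}


def circular(nodes):
    """Each node forwards writes only to the next node in the ring."""
    return {n: [nodes[(i + 1) % len(nodes)]] for i, n in enumerate(nodes)}


def star(nodes, hub):
    """Leaf nodes forward to the hub; the hub forwards to every leaf."""
    return {
        n: [m for m in nodes if m != hub] if n == hub else [hub]
        for n in nodes
    }


def reached(topology, origin, down=frozenset()):
    """Return the nodes that receive a write accepted at `origin`,
    given a set of failed nodes that neither accept nor forward writes."""
    if origin in down:
        return set()
    seen = {origin}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for peer in topology[current]:
            if peer not in seen and peer not in down:
                seen.add(peer)
                queue.append(peer)
    return seen


nodes = ["london", "nyc", "tokyo", "sydney"]

# With NYC down, all-to-all still reaches every healthy node.
mesh_result = reached(all_to_all(nodes), "london", down={"nyc"})

# In the ring london -> nyc -> tokyo -> sydney, the write stalls at London.
ring_result = reached(circular(nodes), "london", down={"nyc"})

# With NYC as the hub, losing it isolates every leaf.
star_result = reached(star(nodes, hub="nyc"), "london", down={"nyc"})
```

In this model a write accepted in London reaches Tokyo and Sydney only in the all-to-all topology; in the ring and the star it stays in London until the failed node recovers or the topology is reconfigured.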

4. Conflict Avoidance

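The simplest way to handle multi-leader conflicts is to avoid them. If all writes for a given user are routed to the same datacenter (for example, by partitioning on UserID or by geohash), that user's data effectively has a single leader, and concurrent conflicting writes cannot occur. The guarantee holds only while the routing is stable: if a datacenter fails or a user's traffic is rerouted to another region, writes for the same record can land on two leaders again and conflicts become possible.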
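
A minimal sketch of UserID-based routing follows, assuming a fixed list of datacenters. The `DATACENTERS` list and the `home_datacenter` function are hypothetical names for this illustration. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings is randomized per process and would route the same user differently after a restart.

```python
import hashlib

# Hypothetical datacenter list; the order must be identical on every router.
DATACENTERS = ["london", "nyc"]


def home_datacenter(user_id: str, datacenters=DATACENTERS) -> str:
    """Map a user ID to one datacenter, deterministically.

    Every router that runs this function with the same datacenter list
    sends a given user's writes to the same leader.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(datacenters)
    return datacenters[index]


# The same user always resolves to the same leader.
first = home_datacenter("user-42")
second = home_datacenter("user-42")
```

Simple modulo hashing remaps many users whenever the datacenter list changes, so a real deployment would more likely use an explicit user-to-region assignment or consistent hashing. That choice is outside the scope of this sketch.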

Summary

Multi-leader replication lowers write latency and keeps writes available across datacenters, but it requires a deliberate conflict resolution strategy. Routing each user's writes to a single home datacenter wherever possible keeps conflicts rare and the system simpler, while retaining the availability benefits of a multi-leader design.


Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
