Database Sharding Part 6: Zero-Downtime Re-sharding

The Expand-Contract playbook. Learn how to migrate production data to a new shard cluster without taking your application offline.

Eventually, your sharding strategy will need to change. Perhaps you outgrew your initial 4-shard cluster and need to expand to 16 shards. Or perhaps you realized your Shard Key was flawed (as discussed in Part 3) and you need to completely reshape your data distribution.

Migrating terabytes of data from an old database schema to a new one is hard. Doing it while the application is live, processing thousands of transactions per second, without dropping a single write or returning a single 500 error to a user, is arguably the most difficult operation in backend engineering.

This is the playbook for Zero-Downtime Migration, powered by the Expand-Contract pattern.

The Expand-Contract Pattern

The golden rule of zero-downtime migrations is that you cannot flip a switch. Migrations must happen in tiny, reversible phases. The Expand-Contract pattern guarantees that at any moment during the migration, you can hit the "Abort" button and instantly revert to the legacy system without data loss.

Here is the 6-step blueprint that companies like Stripe, GitHub, and Shopify have used to migrate petabytes of live data.

Step 1: Provisioning

You spin up the new 16-shard cluster. It is completely empty. You run all your DDL scripts to create the tables, indexes, and constraints. At this point, no application traffic is hitting it.
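
As a rough sketch, here is what provisioning might look like in Python with psycopg2, assuming a single schema.sql file containing your DDL; the shard DSNs are hypothetical placeholders:

  # Apply the same DDL (tables, indexes, constraints) to every new shard.
  import psycopg2

  # Hypothetical DSNs for the 16 new, empty shards.
  NEW_SHARD_DSNS = [f"postgresql://app@new-shard-{i}:5432/app" for i in range(16)]

  with open("schema.sql") as f:
      ddl = f.read()

  for dsn in NEW_SHARD_DSNS:
      conn = psycopg2.connect(dsn)
      with conn, conn.cursor() as cur:  # commits on success, rolls back on error
          cur.execute(ddl)
      conn.close()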

Step 2: Dual-Writing (Expand)

You deploy an update to your application code. Whenever the application needs to perform an INSERT, UPDATE, or DELETE, it now performs that write twice.

  1. It writes synchronously to the Old Cluster.
  2. It fires an asynchronous task to write to the New Cluster.

Handling Dual-Write Failures

The Old Cluster remains the Source of Truth. If the write to the Old Cluster fails, the transaction is aborted and an error is returned to the user. If the write to the New Cluster fails, you simply log the error and proceed. The New Cluster is allowed to be inconsistent at this stage.
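
Here is a minimal sketch of the dual-write path in Python. The old_db, new_db, task_queue, and log handles are hypothetical stand-ins for your database connections, background job queue, and logger:

  def save_user(user):
      # Synchronous write: the Old Cluster is the Source of Truth.
      # If this raises, the transaction aborts and the user sees an error.
      old_db.execute(
          "UPDATE users SET email = %s, updated_at = now() WHERE id = %s",
          (user.email, user.id),
      )
      # Asynchronous, best-effort write to the New Cluster.
      task_queue.enqueue(mirror_user, user.id, user.email)

  def mirror_user(user_id, email):
      try:
          new_db.execute(
              "INSERT INTO users (id, email, updated_at) VALUES (%s, %s, now()) "
              "ON CONFLICT (id) DO UPDATE SET "
              "email = EXCLUDED.email, updated_at = EXCLUDED.updated_at",
              (user_id, email),
          )
      except Exception:
          # The New Cluster is allowed to be inconsistent at this stage.
          log.exception("dual-write to New Cluster failed")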

Step 3: Historical Backfill

Now that the New Cluster is receiving all new data, you must move the historical data.

You write a background worker (or use a framework) that slowly scans the Old Cluster in batches of 1,000 rows and UPSERTs them into the New Cluster.

This script must run slowly to avoid exhausting the disk IOPS on the Old Cluster. Because this process can take weeks for massive datasets, Idempotency is critical. A row might be dual-written by live traffic and picked up by the backfill script simultaneously. The database must resolve this via an UPSERT (e.g., ON CONFLICT DO UPDATE), ensuring the latest timestamp always wins.
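
A minimal backfill sketch, reusing the hypothetical old_db/new_db handles and assuming a users table with an updated_at column; the WHERE clause on the UPSERT is what makes the latest timestamp win:

  import time

  BATCH_SIZE = 1000

  def backfill_users():
      last_id = 0
      while True:
          # Keyset pagination is gentler on the Old Cluster than OFFSET scans.
          rows = old_db.fetchall(
              "SELECT id, email, updated_at FROM users "
              "WHERE id > %s ORDER BY id LIMIT %s",
              (last_id, BATCH_SIZE),
          )
          if not rows:
              return  # reached the end of the table
          new_db.executemany(
              "INSERT INTO users (id, email, updated_at) VALUES (%s, %s, %s) "
              "ON CONFLICT (id) DO UPDATE SET "
              "email = EXCLUDED.email, updated_at = EXCLUDED.updated_at "
              "WHERE users.updated_at < EXCLUDED.updated_at",  # latest wins
              rows,
          )
          last_id = rows[-1][0]
          time.sleep(0.5)  # throttle to protect Old Cluster disk IOPS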

Step 4: Verification via Dark Reads

Once the backfill is 100% complete, you theoretically have two identical databases. But you cannot trust theory in production.

You deploy another code update: Dark Reads. When a user requests their profile, the application reads from the Old Cluster and returns the data to the user immediately. However, asynchronously, the application also reads from the New Cluster.

A background process compares the two JSON payloads. If there is a mismatch (e.g., the new cluster is missing a field due to a dual-write bug), the system logs an alert. You let this run for several days. Once your dashboards show 0 mismatches for 72 consecutive hours, you have strong evidence that the migration is sound.
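
A minimal dark-read sketch, with hypothetical read helpers and a metrics client; the essential part is that the comparison happens off the request path:

  def get_profile(user_id):
      profile = read_profile(old_db, user_id)  # what the user actually receives
      task_queue.enqueue(compare_dark_read, user_id, profile)
      return profile

  def compare_dark_read(user_id, old_payload):
      new_payload = read_profile(new_db, user_id)
      if new_payload != old_payload:
          # Usually a dual-write bug or a row the backfill missed.
          log.warning("dark-read mismatch for user %s", user_id)
          metrics.increment("migration.dark_read_mismatch")
      else:
          metrics.increment("migration.dark_read_match")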

Step 5: The Cutover

You update a dynamic feature flag in your configuration system (e.g., LaunchDarkly or Consul).

The application code instantly flips its behavior:

  • It now reads exclusively from the New Cluster.
  • It writes synchronously to the New Cluster.
  • (Optional) It writes asynchronously back to the Old Cluster as a temporary fallback mechanism.
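
Concretely, both paths can branch on the flag at request time, so the cutover (and any rollback) is a configuration change rather than a deploy. A rough sketch, with flags standing in for your LaunchDarkly or Consul client and write_user as a hypothetical helper:

  def db_for_reads():
      # Evaluated on every request, so flipping the flag takes effect instantly.
      return new_db if flags.is_enabled("use-new-cluster") else old_db

  def save_user(user):
      if flags.is_enabled("use-new-cluster"):
          write_user(new_db, user)                      # synchronous
          task_queue.enqueue(write_user, old_db, user)  # optional fallback
      else:
          write_user(old_db, user)                      # pre-cutover behavior
          task_queue.enqueue(write_user, new_db, user)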

Congratulations. Your application is now running entirely on the new 16-shard architecture.

Step 6: Teardown (Contract)

You leave the Old Cluster running for one or two weeks as an ultimate safety net. Once you are absolutely confident the new cluster is stable, performing within latency limits, and not dropping data, you execute the final step.

You delete the dual-write code from your application, and you terminate the Old Cluster instances. The migration is complete.

The Alternative: Change Data Capture (CDC)

The Expand-Contract pattern relies heavily on application-level dual writing. If your application is a massive monolith with thousands of ad-hoc SQL queries scattered across millions of lines of code, finding every single INSERT and UPDATE statement to add dual-writes is impractical.

In this scenario, you skip Step 2 (Dual Writes) and instead use Change Data Capture (CDC).

Tools like Debezium attach directly to the PostgreSQL Write-Ahead Log (WAL). Every time the database commits a transaction, Debezium decodes the change and streams it as a structured event into a Kafka topic. A consumer reads that topic and applies the same change to the New Cluster. CDC pushes the complexity of the migration down to the infrastructure layer, keeping your application code pristine until the final cutover.
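
As a rough sketch, such a consumer might look like this, assuming Debezium's default JSON envelope and the kafka-python client; the topic name and the upsert/delete helpers are hypothetical:

  import json
  from kafka import KafkaConsumer

  consumer = KafkaConsumer(
      "pg.public.users",                  # hypothetical Debezium topic
      bootstrap_servers="kafka:9092",
      group_id="resharding-migrator",
  )

  for msg in consumer:
      if msg.value is None:
          continue                        # tombstone record, nothing to apply
      event = json.loads(msg.value)["payload"]
      if event["op"] in ("c", "u", "r"):  # create, update, snapshot read
          upsert_into_new_cluster(event["after"])  # same UPSERT as the backfill
      elif event["op"] == "d":
          delete_from_new_cluster(event["before"]["id"])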

Summary

Never attempt a "big bang" database migration. By using the Expand-Contract pattern, you transform a high-stress, high-risk, multi-hour downtime window into a boring, predictable, and reversible series of code deployments.
