System DesignAdvancedarticle

System Design: Data Partitioning and Sharding Strategies

How to scale databases horizontally? A deep dive into Partitioning, Sharding (Horizontal vs Vertical), and choosing the right Shard Key to avoid Hotspots.

Sachin SarawgiApril 20, 20262 min read2 minute lesson

System Design: Data Partitioning and Sharding

When your database outgrows a single machine, you have two choices: Scale Up (bigger machine) or Scale Out (multiple machines). Sharding (Horizontal Partitioning) is the art of scaling out by splitting your data across multiple servers.

1. Sharding vs. Partitioning

  • Vertical Partitioning (Normalization): Splitting tables by columns (e.g., storing user metadata in one table and user preferences in another).
  • Horizontal Partitioning (Sharding): Splitting the table by rows. Server 1 holds user IDs 1-1M, Server 2 holds 1M-2M.

2. Choosing the Right Shard Key

The shard key is the most important architectural decision. If you get it wrong, you end up with Hotspots.

  • Hash-based Sharding: .
    • Pros: Even distribution of data.
    • Cons: Hard to perform range queries (e.g., "Find all users joined between X and Y") because data is scattered everywhere.
  • Range-based Sharding: for IDs 0-1000, for 1001-2000.
    • Pros: Efficient range queries.
    • Cons: Leads to "hot shards" if one range becomes popular (e.g., all new users being added to the newest shard).

3. Directory-Based Sharding

Maintain a "Shard Map" (a lookup table) that tells the application where each key lives.

  • Pros: Extremely flexible; you can move users between shards without changing the key.
  • Cons: The lookup table itself becomes a single point of failure and a performance bottleneck.

4. The Challenge: Re-Sharding

What happens when your 10 shards are full and you need to add 2 more?

  • If you use , changing it to forces you to move almost 100% of your data.
  • The Solution: Use Consistent Hashing to ensure that only of the data needs to be moved when the cluster size changes.

5. Summary

Sharding is a "last resort" for scaling. It adds significant complexity to queries (cross-shard joins are nearly impossible). Before sharding, always exhaust other options: better indexing, caching (Redis), and read replicas. But when you finally do shard, choose your key based on your most frequent access pattern.

Practical engineering notes

Get the next backend guide in your inbox

One useful note when a new deep dive is published: system design tradeoffs, Java production lessons, Kafka debugging, database patterns, and AI infrastructure.

No spam. Just practical notes you can use at work.

Sachin Sarawgi

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Keep Learning

Move through the archive without losing the thread.

Related Articles

More deep dives chosen from shared tags, category overlap, and reading difficulty.

More in System Design

Category-based suggestions if you want to stay in the same domain.