Databases · Intermediate guide · Part 2 of 8 in Database Sharding Mastery

Database Sharding Part 2: Partitioning vs. Sharding

Learn the difference between logical table partitioning and physical database sharding. Which one do you need?

Sachin Sarawgi · April 20, 2026 · 5 min read
Recommended prerequisite: Database Indexing Deep Dive


When engineering teams realize their database is struggling to keep up with load, the words "partitioning" and "sharding" are often thrown around interchangeably in architectural meetings.

However, they are fundamentally different concepts operating at entirely different layers of the software stack. One solves index bloat and maintenance headaches; the other solves hardware bottlenecks.

If you jump straight to sharding when you only needed partitioning, you will introduce months of unnecessary distributed systems complexity. Let's explore the critical differences between the two.

1. Logical Table Partitioning (Vertical/Horizontal Slicing on a Single Node)

Table partitioning is a database-native feature where a single massive logical table (e.g., events) is physically broken down into smaller, individual tables under the hood.

Importantly, all of these smaller tables exist on the exact same database server.

When your application executes a SELECT * FROM events WHERE created_at = '2026-01-01', the database query planner intercepts the query, realizes it only needs data from the events_2026_01 partition, and skips scanning all the other partitions.
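Conceptually, that pruning step boils down to: look at each partition's date range, and only scan the partitions whose range can contain the predicate value. Here is a simplified illustration of that idea (this is not the actual PostgreSQL planner code; the `PARTITIONS` map and `prune` function are invented for the example):

```python
from datetime import date

# Hypothetical monthly range partitions of an "events" table.
# Each partition covers the half-open interval [start, end) on created_at.
PARTITIONS = {
    "events_2025_12": (date(2025, 12, 1), date(2026, 1, 1)),
    "events_2026_01": (date(2026, 1, 1), date(2026, 2, 1)),
    "events_2026_02": (date(2026, 2, 1), date(2026, 3, 1)),
}

def prune(predicate_value: date) -> list[str]:
    """Return only the partitions whose range can contain the value.

    Mimics what the planner does for
    WHERE created_at = '2026-01-01' -- everything else is skipped.
    """
    return [
        name
        for name, (start, end) in PARTITIONS.items()
        if start <= predicate_value < end
    ]

print(prune(date(2026, 1, 1)))  # only events_2026_01 survives pruning
```

For an equality predicate, exactly one partition survives; the other B-Trees are never touched.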

The B-Tree Index Problem

Why do we partition tables? To save the B-Tree index.

In PostgreSQL or MySQL, an index is structured as a B-Tree. When a table has 5 million rows, the index is small and shallow, and traversing it to find a row takes a fraction of a millisecond. When a table has 5 billion rows, the B-Tree's depth grows only slightly (it scales logarithmically), but its total size balloons until it no longer fits in the server's RAM (buffer cache).

Every time you insert a new row into a 5-billion-row table, the database has to update this massive B-Tree. If the required parts of the index aren't in RAM, the database has to read from the disk (an I/O miss), dramatically slowing down write performance.

How Partitioning Solves This

By using Range Partitioning (e.g., creating a new partition every month), you are creating a fresh, tiny B-Tree index every month.

  • The current month's index is small enough to stay 100% in RAM.
  • INSERT performance remains lightning fast forever.
  • Old partitions can easily be archived or dropped instantly using DROP TABLE events_2024_01, which is an O(1) metadata operation, rather than running a massive DELETE query that takes hours and causes table bloat.
Partitioning does NOT scale hardware limits

Because all partitions live on the same RDS instance, partitioning does absolutely nothing to help you if you are maxing out your CPU cores, or if your overall disk IOPS are saturated. It purely solves logical data management and index traversal speed.

2. Physical Database Sharding (Horizontal Scaling Across Nodes)

If partitioning is slicing a pie into 12 pieces but leaving them on the same plate, Sharding is buying 12 different plates and putting a slice on each one.

Sharding (often called Horizontal Partitioning) is the architectural process of distributing your dataset across multiple, independent physical database servers.

  • Server A handles users with IDs 1 to 1,000,000.
  • Server B handles users with IDs 1,000,001 to 2,000,000.
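A minimal range-based router for the split above can be sketched in a few lines (the shard names and boundaries here are illustrative, not from any real deployment):

```python
# Illustrative range-based shard map: (inclusive_low, inclusive_high, shard).
SHARD_RANGES = [
    (1, 1_000_000, "server_a"),
    (1_000_001, 2_000_000, "server_b"),
]

def route(user_id: int) -> str:
    """Map a user ID to the physical server that owns its row range."""
    for low, high, shard in SHARD_RANGES:
        if low <= user_id <= high:
            return shard
    raise KeyError(f"no shard owns user_id={user_id}")

print(route(42))         # server_a
print(route(1_500_000))  # server_b
```

Every query for a given user now lands on exactly one server, which is what lets each server carry only a fraction of the total load.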

The Problem Sharding Solves

Sharding solves the Vertical Ceiling (discussed in Part 1). When a single 128-core AWS instance cannot physically handle the CPU or I/O demands of your traffic, sharding lets you keep adding capacity almost without limit: if you need more CPU, you spin up another physical database shard and migrate a chunk of your users to it.

The Massive Trade-off

Sharding forces you to embrace distributed systems complexity:

  1. No Cross-Shard Joins: You cannot write a SQL JOIN between a user on Shard A and an order on Shard B. The application layer must fetch the data from both shards and join it in application memory.
  2. No Distributed Transactions: If you need to deduct money from a user on Shard A and credit a user on Shard B, you can no longer rely on BEGIN and COMMIT. You must implement complex distributed transaction patterns like Two-Phase Commit (2PC) or the Saga Pattern.
  3. Operational Overhead: Instead of monitoring, backing up, and maintaining one massive database, your DevOps team must now manage 10, 20, or 100 separate databases.
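To make trade-off #1 concrete, here is what an application-side join looks like when users and orders live on different shards. The shard "connections" are faked with in-memory data for the sketch; in production these would be two separate database clients issuing two separate queries:

```python
# Stand-ins for two separate physical database servers.
SHARD_A_USERS = {101: {"id": 101, "name": "Asha"}}
SHARD_B_ORDERS = [
    {"order_id": 9001, "user_id": 101, "total": 250},
    {"order_id": 9002, "user_id": 101, "total": 75},
]

def user_with_orders(user_id: int) -> dict:
    """What used to be one SQL JOIN is now two fetches plus an in-memory merge."""
    user = SHARD_A_USERS[user_id]                # query #1 -> Shard A
    orders = [o for o in SHARD_B_ORDERS          # query #2 -> Shard B
              if o["user_id"] == user_id]
    return {**user, "orders": orders}            # join in application memory

result = user_with_orders(101)
print(result["name"], len(result["orders"]))
```

Note that the application now pays two network round trips, and any filtering, sorting, or pagination that the database used to do across the join must be reimplemented in application code.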
The Shard Routing Layer

The database engine usually does not know it is sharded. Your application code (or a sharding proxy like Vitess) must intercept every query, analyze the Shard Key, determine which physical database holds the data, and route the connection accordingly. This requires significant code rewrites. Note that a plain connection pooler such as PgBouncer cannot do this: it pools connections to one database, but it does not understand shard keys or routing.
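As a sketch, that routing layer boils down to: extract the shard key, map it to a shard, and hand the query to the right connection. Below is a toy modulo-hash router (the connection strings and function names are invented for the example; production routers typically use more robust schemes such as consistent hashing, covered later in this series):

```python
NUM_SHARDS = 4
# Stand-ins for four separate database connection targets, keyed by shard index.
CONNECTIONS = {i: f"db-shard-{i}.internal" for i in range(NUM_SHARDS)}

def pick_shard(shard_key: int) -> int:
    """Toy modulo hash: simple, but painful to re-balance when shards are added."""
    return shard_key % NUM_SHARDS

def route_query(shard_key: int, sql: str) -> str:
    """Return the connection target that should execute this query."""
    return CONNECTIONS[pick_shard(shard_key)]

print(route_query(1045, "SELECT * FROM users WHERE user_id = 1045"))
# -> db-shard-1.internal  (1045 % 4 == 1)
```

The weakness of plain modulo hashing is that changing `NUM_SHARDS` remaps almost every key, which is why real systems reach for consistent hashing instead.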

Summary: Which one do you need?

Feature           | Logical Partitioning                  | Physical Sharding
------------------|---------------------------------------|----------------------------------
Location          | Single server                         | Multiple servers
Solves            | Huge indexes, slow deletes, archiving | Maxed-out CPU, RAM, disk I/O
SQL joins         | Work perfectly across partitions      | Broken across shards
ACID transactions | Fully supported                       | Requires application-level sagas
App complexity    | Zero (handled by the DB)              | High (requires smart routing)

The Golden Rule: Always logically partition your massive tables first. Only when you max out the physical limits of the largest cloud instance should you resort to physical sharding.

Next up: Database Sharding Part 3: The Shard Key Blueprint (5 min read, Advanced)

Recommended Resources

  • Designing Data-Intensive Applications: the go-to book for understanding databases, consistency, and distributed data.
  • MongoDB — The Complete Developer's Guide (Udemy): comprehensive MongoDB course from basics to advanced aggregations.


Written by Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Continue Series: Database Sharding Mastery (Lesson 2 of 8)

  1. Database Sharding Part 1: The Vertical Ceiling (Intermediate)
  2. Database Sharding Part 2: Partitioning vs. Sharding (Intermediate, this article)
  3. Database Sharding Part 3: The Shard Key Blueprint (Advanced)
  4. Database Sharding Part 4: Consistent Hashing Internals (Advanced)
  5. Database Sharding Part 5: The Scatter-Gather Problem (Advanced)
  6. Database Sharding Part 6: Zero-Downtime Re-sharding (Advanced)
  7. Database Sharding Part 7: Case Study - Scaling Discord to Billions (Advanced)
  8. Query Optimization: The Hidden Cost of Cross-Shard Joins (Expert)
