Database Sharding Part 1: The Vertical Ceiling
In the early days of a startup, database scaling is a solved problem. You go into the AWS RDS or GCP Cloud SQL console, select a larger instance type, click "Modify," and your application is suddenly fast again.
This process—scaling by adding more CPU, RAM, and Disk I/O to a single machine—is known as Vertical Scaling.
Vertical scaling is glorious. It requires zero application code changes, near-zero downtime (a brief failover blip with a modern Multi-AZ setup), and preserves the beautiful, monolithic ACID guarantees of your relational database. But eventually, every hyper-growth application hits a physical limit. We call this the Vertical Ceiling.
In Part 1 of this series, we will dissect the physical, architectural, and operational ceilings of single-node databases, and define exactly when you must abandon vertical scaling and embrace sharding.
The Physical Limits of Hardware
You might think, "I'll just keep buying bigger servers." But the cloud providers have limits.
As of this writing, if you attempt to scale a PostgreSQL database on AWS, the largest instance you can provision (like a db.x2iedn.32xlarge) tops out at 128 vCPUs and 4,096 GiB of RAM.
While 4TB of RAM sounds virtually infinite for a startup, for an application processing millions of events per second, it is remarkably small. More importantly, simply adding cores and RAM yields diminishing returns due to NUMA (Non-Uniform Memory Access) architecture.
On massive servers, CPUs are divided into NUMA nodes, one per socket, each with its own local memory. If a core on socket 1 needs to read memory attached to socket 4, the request must traverse the cross-socket interconnect, which is markedly slower than a local access. As you scale vertically toward 128 cores, a growing fraction of memory accesses cross sockets, and that overhead compounds. Doubling the cores rarely doubles database throughput; it often yields only a 30-40% gain.
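To make the diminishing returns concrete, here is a toy back-of-the-envelope model. The remote-access penalty and cores-per-socket figures are illustrative assumptions, not measured benchmarks:

```python
# Toy model of NUMA scaling: as cores span more sockets, a growing
# fraction of memory accesses are remote, and remote accesses are slower.
# The 2.0x remote penalty and 32 cores/socket are assumptions.

CORES_PER_SOCKET = 32
REMOTE_PENALTY = 2.0  # a remote access costs ~2x a local one (assumed)

def relative_throughput(cores: int) -> float:
    sockets = max(1, -(-cores // CORES_PER_SOCKET))  # ceiling division
    # With n sockets and a uniformly accessed working set, roughly
    # (n - 1) / n of memory accesses land on a remote socket.
    remote_fraction = (sockets - 1) / sockets
    avg_access_cost = (1 - remote_fraction) + remote_fraction * REMOTE_PENALTY
    return cores / avg_access_cost

base = relative_throughput(64)
print(f"64 -> 128 cores: {relative_throughput(128) / base:.2f}x throughput")
# Well under 2x: doubling cores buys far less than double the work.
```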
The I/O Bottleneck
Even if you have infinite CPU, you do not have infinite disk speed.
Relational databases use Write-Ahead Logging (WAL). Every time you INSERT or UPDATE a row, the database must write that transaction to the WAL on disk before acknowledging the commit to the user. This is what guarantees durability.
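To make the ordering concrete, here is a minimal sketch of the write-ahead idea. It is a simplified illustration of the commit path, not PostgreSQL's actual implementation, and the file name is an assumption:

```python
import os

# Simplified write-ahead logging: the log record must be durably on disk
# (fsync) *before* we acknowledge the commit to the client.
class WriteAheadLog:
    def __init__(self, path: str = "wal.log"):  # illustrative file name
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)

    def commit(self, record: bytes) -> None:
        os.write(self.fd, record + b"\n")  # append the log record
        os.fsync(self.fd)                  # force it to physical storage
        # Only now is it safe to reply "committed": even if we crash
        # here, recovery can replay the record from the log.

wal = WriteAheadLog()
wal.commit(b"UPDATE accounts SET balance = balance - 100 WHERE id = 1")
```

That fsync on every commit is why write-heavy workloads live and die by disk latency.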
If you are using network-attached storage like AWS EBS, you are fundamentally constrained by the network. AWS io2 Block Express volumes max out at 256,000 IOPS and 4,000 MB/s throughput.
When your application is ingesting 300,000 telemetry events per second, each one generating WAL, data page, and index writes, your EBS volume will physically throttle the database. At this point, the Vertical Ceiling is hard-capped by the physics of fiber-optic network cables in the AWS datacenter.
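Some rough arithmetic shows how quickly the budget evaporates. Every ratio below is an illustrative assumption; real workloads batch commits and buffer page writes very differently:

```python
# Back-of-the-envelope I/O budget. All ratios are assumptions.
events_per_sec = 300_000
events_per_wal_write = 10      # assumed group-commit batching
page_writes_per_event = 0.5    # assumed data page + index churn, amortized

wal_iops = events_per_sec / events_per_wal_write          # 30,000
page_iops = events_per_sec * page_writes_per_event        # 150,000
checkpoint_and_vacuum_iops = 80_000                       # assumed overhead

total = wal_iops + page_iops + checkpoint_and_vacuum_iops
print(f"Estimated IOPS: {total:,.0f} vs io2 ceiling of 256,000")
# ~260,000 IOPS: at the volume's hard ceiling before serving a single read.
```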
The Operational Nightmare
Let's assume you somehow bypass the CPU and I/O limits. You now have a 30 Terabyte PostgreSQL database sitting on a single monolithic instance. You have just entered the operational nightmare phase.
1. The Vacuuming Problem
PostgreSQL uses MVCC (Multi-Version Concurrency Control). When you update a row, it creates a new version and marks the old one as dead. A background process called autovacuum must clean up these dead rows. On a 30TB database with high write churn, the vacuum process can take days to run, eating up disk I/O and causing severe table bloat.
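You can watch this problem develop through PostgreSQL's pg_stat_user_tables statistics view. Here is a sketch using psycopg2; the connection string is a placeholder:

```python
import psycopg2

# Report the tables with the most dead tuples awaiting vacuum.
# pg_stat_user_tables is a standard PostgreSQL statistics view.
conn = psycopg2.connect("dbname=app user=postgres host=localhost")
with conn.cursor() as cur:
    cur.execute("""
        SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 10
    """)
    for relname, live, dead, last_vac in cur.fetchall():
        print(f"{relname}: {dead:,} dead / {live:,} live, "
              f"last autovacuum: {last_vac}")
conn.close()
```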
2. The Index Rebuild
Imagine needing to create a new index on a users table with 5 billion rows. Even using CREATE INDEX CONCURRENTLY, this operation can take days or weeks. If the build fails halfway through, it leaves an INVALID index behind that you must drop before starting over. Schema migrations become a high-stakes, multi-week anxiety attack.
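The safer pattern is to build concurrently and then verify the index actually became valid. A sketch, assuming psycopg2 and hypothetical table and index names:

```python
import psycopg2

# CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
# so autocommit is required. Names and DSN are hypothetical.
conn = psycopg2.connect("dbname=app user=postgres host=localhost")
conn.autocommit = True
with conn.cursor() as cur:
    try:
        cur.execute(
            "CREATE INDEX CONCURRENTLY idx_users_email ON users (email)"
        )
    except psycopg2.Error as e:
        print(f"Index build failed: {e}")
    # A failed concurrent build leaves an INVALID index behind;
    # find it and drop it before retrying.
    cur.execute("""
        SELECT c.relname
        FROM pg_index i
        JOIN pg_class c ON c.oid = i.indexrelid
        WHERE c.relname = 'idx_users_email' AND NOT i.indisvalid
    """)
    if cur.fetchone():
        cur.execute("DROP INDEX CONCURRENTLY idx_users_email")
        print("Dropped invalid index; safe to retry the build.")
conn.close()
```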
3. The Backup and Restore Window
If your single massive database gets corrupted, you must restore it from a snapshot. Restoring a 30TB volume and rolling forward the transaction logs can easily result in 12+ hours of complete production downtime.
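The arithmetic is unforgiving. A back-of-the-envelope estimate, where the restore throughput figure is an illustrative assumption:

```python
# Rough restore-time estimate. The effective throughput figure is an
# assumption; real restores also pay for WAL replay on top.
db_size_tb = 30
restore_throughput_mb_s = 800          # assumed effective restore rate

db_size_mb = db_size_tb * 1024 * 1024  # 31,457,280 MB
restore_hours = db_size_mb / restore_throughput_mb_s / 3600
print(f"Snapshot restore alone: ~{restore_hours:.1f} hours")  # ~10.9 hours
# Add hours of transaction log replay, and 12+ hours of downtime is realistic.
```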
A monolithic database means a monolithic blast radius. If one bad analytical query accidentally bypasses an index and performs a full table scan on 5 billion rows, it will evict the entire working set from RAM, spike CPU to 100%, and take down the entire application for all users globally.
The Concurrency Bottleneck (Latch Contention)
Finally, there is internal database contention. When 10,000 active connections attempt to update different rows that happen to live on the same physical 8KB page, the database must briefly lock that page's in-memory buffer for each change. This is known as latch contention.
No amount of CPU or RAM can solve this. The database engine itself becomes the bottleneck because thousands of threads are waiting in line to modify the same memory block.
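You can reproduce a miniature version of this effect by pointing many sessions at rows that share one hot page. This sketch assumes a hypothetical counters table pre-populated with rows 1-10 (small enough to share a heap page) and a placeholder connection string:

```python
import threading
import psycopg2

# Many sessions updating rows on one hot page serialize on that page's
# in-memory lock. Table name and DSN are hypothetical.
DSN = "dbname=app user=postgres host=localhost"

def hammer(worker_id: int) -> None:
    conn = psycopg2.connect(DSN)
    conn.autocommit = True
    with conn.cursor() as cur:
        for _ in range(1_000):
            cur.execute(
                "UPDATE counters SET value = value + 1 WHERE id = %s",
                (worker_id % 10 + 1,),
            )
    conn.close()

threads = [threading.Thread(target=hammer, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Throughput plateaus: sessions queue on the hot page, not on CPU.
```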
When is Sharding the Answer?
Sharding is the process of splitting your monolithic database into multiple, independent databases (shards), each responsible for a subset of the data.
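At its simplest, the application routes each query by hashing a shard key. A minimal sketch, where the shard DSNs and the choice of user_id as the key are illustrative assumptions:

```python
import hashlib

# Minimal hash-based shard routing. Shard list and key choice are
# illustrative assumptions.
SHARDS = [
    "postgres://shard0.internal/app",
    "postgres://shard1.internal/app",
    "postgres://shard2.internal/app",
    "postgres://shard3.internal/app",
]

def shard_for(user_id: int) -> str:
    # Stable hash, so the same user always routes to the same shard.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42))     # every query for user 42 goes to this shard
print(shard_for(1337))   # a different user may land elsewhere
```

Note that naive modulo routing has a cost: adding a fifth shard remaps most keys, forcing a large data migration. That pain is part of the complexity discussed below.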
Because sharding breaks relational constraints (foreign keys, global secondary indexes, and global transactions) and vastly increases application complexity, it must be your absolute last resort.
Before you shard, you must exhaust these alternatives:
- Vertical Scaling: Max out the RDS instance size until the ROI diminishes.
- Read Replicas: Offload all SELECT queries to 5-10 read replicas.
- Caching: Put Redis or Memcached in front of the database to absorb 90% of read traffic.
- Table Partitioning: Break massive tables into monthly/weekly partitions within the same database to keep index sizes small and make vacuuming manageable (a minimal sketch follows this list).
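For the partitioning option, here is a minimal sketch of PostgreSQL declarative range partitioning via psycopg2. Table name, columns, date ranges, and DSN are all illustrative:

```python
import psycopg2

# Declarative monthly partitioning: each partition gets its own smaller
# indexes, and autovacuum works partition by partition.
conn = psycopg2.connect("dbname=app user=postgres host=localhost")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE events (
            id         bigserial,
            created_at timestamptz NOT NULL,
            payload    jsonb
        ) PARTITION BY RANGE (created_at)
    """)
    cur.execute("""
        CREATE TABLE events_2024_01 PARTITION OF events
            FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    """)
    cur.execute("""
        CREATE TABLE events_2024_02 PARTITION OF events
            FOR VALUES FROM ('2024-02-01') TO ('2024-03-01')
    """)
conn.close()
```

Crucially, all of this stays inside one database: queries, foreign keys, and transactions keep working unchanged.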
When you have implemented all of the above, and your primary database is still hitting 90% CPU utilization purely from write traffic, you have officially hit the Vertical Ceiling.
It is time to shard.
