System Design: Designing a Database Proxy for Sharding
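Scaling a relational database like MySQL or PostgreSQL is one of the harder problems in backend engineering. When a single database server can no longer handle the load (typically because write volume or total data size has outgrown one machine, which read replicas cannot fix), you have to shard (partition) your data across several servers. But hand-rolling sharding in application code spreads routing logic through every service and makes adding shards later painful. This is why we need a Database Proxy.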
1. What is a Database Proxy?
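A proxy (like Vitess, ProxySQL, or Prisma Data Proxy) sits between your application and your database nodes. The application talks to the proxy as if it were a single, giant database, and the proxy handles sharding, query routing, and replica selection behind the scenes. Not every proxy does all of this: Vitess provides full sharding, while ProxySQL and Prisma Data Proxy focus mainly on routing and connection pooling.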
2. The Sharding Coordinator
The proxy's most important job is Routing.
- The Logic: You define a "Shard Key" (e.g., user_id). When the application runs SELECT * FROM users WHERE user_id = 123, the proxy calculates shard = hash(123) % N, where N is the number of shards, and routes the query to the correct physical server.
- Cross-Shard Queries: If a query doesn't include the shard key, the proxy must "Scatter-Gather": send the query to all shards and merge the results. Merging is more than concatenation, because ORDER BY, LIMIT, and aggregates such as COUNT must be re-applied across the combined results.
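The routing and scatter-gather logic above can be sketched in Python. This is a minimal in-memory model under stated assumptions, not how any particular proxy is implemented: each shard is a plain list of row dicts, and the names ShardRouter, shard_for, scatter_gather, and NUM_SHARDS are hypothetical. SHA-256 stands in for hash() because a router needs a hash that is identical across processes, so every proxy instance agrees on where a row lives.

```python
import hashlib
from typing import Callable, List

NUM_SHARDS = 4  # hypothetical cluster size N


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """shard = hash(user_id) % N, with SHA-256 as a hash that is stable across processes."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardRouter:
    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        # Each inner list stands in for one physical database server.
        self.shards: List[List[dict]] = [[] for _ in range(num_shards)]

    def insert(self, row: dict) -> None:
        # Writes carry the shard key, so they touch exactly one shard.
        self.shards[shard_for(row["user_id"], len(self.shards))].append(row)

    def get_by_user_id(self, user_id: int) -> List[dict]:
        # Targeted query: the shard key is in the WHERE clause, so one shard is read.
        shard = self.shards[shard_for(user_id, len(self.shards))]
        return [r for r in shard if r["user_id"] == user_id]

    def scatter_gather(
        self, predicate: Callable[[dict], bool], order_by: str, limit: int
    ) -> List[dict]:
        # No shard key: every shard runs the filter plus a local ORDER BY ... LIMIT.
        partials = [
            sorted((r for r in shard if predicate(r)), key=lambda r: r[order_by])[:limit]
            for shard in self.shards
        ]
        # The proxy re-sorts the merged rows and applies LIMIT again globally.
        merged = sorted((r for part in partials for r in part), key=lambda r: r[order_by])
        return merged[:limit]


# Usage: 100 users spread over 4 shards.
router = ShardRouter()
for uid in range(1, 101):
    router.insert({"user_id": uid, "country": "DE" if uid % 2 else "US", "age": uid % 50})
print(router.get_by_user_id(42))  # [{'user_id': 42, 'country': 'US', 'age': 42}]
youngest_de = router.scatter_gather(lambda r: r["country"] == "DE", order_by="age", limit=3)
print([r["age"] for r in youngest_de])  # [1, 1, 3]
```

The per-shard LIMIT followed by a global re-sort is what keeps scatter-gather correct: each shard's local top k contains every row of that shard that could appear in the global top k. Note also that with hash % N, changing N moves most keys to a different shard, which is why production systems usually map hashed keys to ranges instead.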
3. Connection Pooling at Scale
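Opening a new database connection is expensive: a TCP (and often TLS) handshake, authentication, and per-connection memory on the server (MySQL by default dedicates a thread to each connection, PostgreSQL a whole process).
- The Problem: 10,000 application containers each opening 10 connections = 100,000 connections, far beyond MySQL's default max_connections of 151. Raising the limit only moves the problem: the server runs out of memory or spends its time switching between threads long before it can serve 100,000 clients.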
- The Solution: The proxy maintains a small, fixed pool of persistent connections to each database node and multiplexes application requests over them. This allows thousands of app instances to share a handful of DB connections.
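A fixed-size pool that many callers share can be sketched with the standard library's queue.Queue. FakeConnection and ConnectionPool are hypothetical names, and FakeConnection stands in for a real driver connection. The sketch lends a connection for one query at a time; a real proxy also has to reset session state (variables, temporary tables, open transactions) before handing a connection to the next client.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Iterator


class FakeConnection:
    """Hypothetical stand-in for a real MySQL/PostgreSQL driver connection."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id

    def execute(self, sql: str) -> str:
        return f"conn{self.conn_id}: {sql}"


class ConnectionPool:
    def __init__(self, size: int) -> None:
        # Open every backend connection once, up front; they are reused, never closed.
        self._idle: "queue.Queue[FakeConnection]" = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))

    @contextmanager
    def borrow(self, timeout: float = 5.0) -> Iterator[FakeConnection]:
        # Block until a connection is idle; queue.Empty is raised after `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back so the next client request can reuse it.
            self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()


# Usage: 200 "application clients" multiplexed over 4 backend connections.
pool = ConnectionPool(size=4)
used_ids = set()
lock = threading.Lock()


def handle_client(n: int) -> None:
    with pool.borrow() as conn:
        conn.execute(f"SELECT {n}")
        with lock:
            used_ids.add(conn.conn_id)


threads = [threading.Thread(target=handle_client, args=(n,)) for n in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(used_ids), pool.idle_count())
```

However many clients arrive, the database only ever sees the four connections the pool opened; excess clients wait in the queue instead of opening new connections.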
4. Query Rewriting and Optimization
A smart proxy can improve performance without changing application code:
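- Query Sanitization: Blocking slow or dangerous queries (e.g., SELECT * without a LIMIT).
- Read-Write Splitting: Automatically routing SELECT queries to Read Replicas and INSERT/UPDATE/DELETE to the Primary node. Because replicas lag behind the primary, reads inside a transaction, locking reads (SELECT ... FOR UPDATE), and reads that must see a just-committed write still have to go to the Primary.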
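Both rules can be sketched as a small query classifier. The Target enum, the route function, and the regular expressions are hypothetical and deliberately naive; ProxySQL expresses similar logic as regex-based query rules, while Vitess parses the SQL, and both are more robust than this sketch (it would misread, for example, a LIMIT inside a string literal).

```python
import re
from enum import Enum


class Target(Enum):
    PRIMARY = "primary"
    REPLICA = "replica"
    REJECT = "reject"


# Hypothetical rule: "SELECT *" with no LIMIT anywhere after it is rejected.
UNBOUNDED_SELECT_STAR = re.compile(r"\s*SELECT\s+\*(?!.*\bLIMIT\b)", re.IGNORECASE | re.DOTALL)
WRITE = re.compile(r"\s*(INSERT|UPDATE|DELETE|REPLACE)\b", re.IGNORECASE)
READ = re.compile(r"\s*SELECT\b", re.IGNORECASE)
LOCKING_READ = re.compile(r"\bFOR\s+(UPDATE|SHARE)\b", re.IGNORECASE)


def route(sql: str, in_transaction: bool = False) -> Target:
    if UNBOUNDED_SELECT_STAR.match(sql):
        return Target.REJECT
    if WRITE.match(sql):
        return Target.PRIMARY
    if READ.match(sql):
        # Reads in a transaction, or that take locks, must see the primary's latest data.
        if in_transaction or LOCKING_READ.search(sql):
            return Target.PRIMARY
        return Target.REPLICA
    # Anything else (DDL, SET, CALL, ...) goes to the primary to be safe.
    return Target.PRIMARY


print(route("SELECT * FROM users"))                          # Target.REJECT
print(route("SELECT name FROM users WHERE user_id = 123"))   # Target.REPLICA
print(route("UPDATE users SET name = 'x' WHERE user_id = 1"))  # Target.PRIMARY
```

Defaulting unrecognised statements to the primary is the conservative choice: a misrouted write on a replica fails or diverges, while a misrouted read on the primary only costs some load.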
5. Handling Database Failovers
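When a primary database node dies, the proxy detects it through failed health checks, typically within seconds, depending on the probe interval and timeout settings.
- Automatic Routing: Once a replica has been promoted (by the proxy itself or by an external tool such as Orchestrator), the proxy redirects write traffic to it. A proxy that buffers queries during the switchover (Vitess's VTGate can do this) lets most requests see only a latency spike instead of a connection error, but transactions that were in flight on the failed primary are still lost and must be retried by the application.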
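Failure detection and promotion can be sketched as a loop of health probes. FailoverRouter, probe, and failure_threshold are hypothetical names, and the probe is injected as a function so the sketch runs without a database. Requiring several consecutive failed probes avoids failing over on a single dropped packet; it also means detection takes roughly failure_threshold multiplied by the probe interval. The sketch simply promotes the first healthy replica, a simplification labelled in the code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverRouter:
    primary: str
    replicas: List[str]
    # Hypothetical health check, e.g. "SELECT 1" with a short timeout; True means healthy.
    probe: Callable[[str], bool]
    # Consecutive failed probes before the primary is declared dead.
    failure_threshold: int = 3
    failures: int = field(default=0, init=False)

    def check_once(self) -> None:
        """Run one health-check round; call this every probe interval."""
        if self.probe(self.primary):
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._promote()

    def _promote(self) -> None:
        healthy = [r for r in self.replicas if self.probe(r)]
        if not healthy:
            raise RuntimeError("primary is down and no healthy replica is available")
        # Simplification: take the first healthy replica. Real tools typically pick the
        # one with the most replicated data and fence off the old primary first.
        new_primary = healthy[0]
        self.replicas.remove(new_primary)
        self.primary = new_primary
        self.failures = 0

    def write_target(self) -> str:
        # Every INSERT/UPDATE is routed here, so a promotion redirects writes.
        return self.primary


# Usage: db-1 goes down; after three failed probes, writes move to db-2.
down = {"db-1"}
router = FailoverRouter("db-1", ["db-2", "db-3"], probe=lambda host: host not in down)
for _ in range(3):
    router.check_once()
print(router.write_target())  # db-2
```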
6. Real-world Architectures: Vitess
Vitess (used by YouTube and Slack) takes this further by adding:
- VTGate: The proxy layer.
- VTTablet: A sidecar that runs alongside every MySQL instance to monitor health and enforce query limits.
- Topology Store: A ZooKeeper, etcd, or Consul cluster that stores the cluster topology, including the global sharding map.
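Vitess names each shard after the range of keyspace IDs it owns, so a two-shard keyspace has shards -80 and 80-. The sketch below models such a sharding map in Python. TopologyMap, keyspace_id, and parse_shard are hypothetical names, and SHA-256 is only a stand-in for Vitess's own hash vindex, which uses a different function. The usage shows why ranges beat hash % N: splitting 80- into 80-c0 and c0- leaves every key in -80 exactly where it was.

```python
import bisect
import hashlib
from typing import List, Tuple


def keyspace_id(user_id: int) -> bytes:
    # Stand-in for a Vitess hash vindex: any stable map from shard key to 8 bytes.
    return hashlib.sha256(str(user_id).encode()).digest()[:8]


def parse_shard(name: str) -> Tuple[bytes, bytes]:
    """'-80' covers [start, 0x80); '80-' covers [0x80, end). An empty bound is open."""
    start, end = name.split("-")
    return bytes.fromhex(start), bytes.fromhex(end)


class TopologyMap:
    def __init__(self, shard_names: List[str]) -> None:
        # Sort shards by the start of their key range so lookups can binary-search.
        self._ranges = sorted(((*parse_shard(n), n) for n in shard_names), key=lambda r: r[0])
        self._starts = [start for start, _, _ in self._ranges]

    def shard_for(self, ksid: bytes) -> str:
        # Find the last shard whose range starts at or before ksid.
        idx = bisect.bisect_right(self._starts, ksid) - 1
        if idx < 0:
            raise LookupError(f"no shard covers keyspace id {ksid.hex()}")
        _, end, name = self._ranges[idx]
        if end and ksid >= end:
            raise LookupError(f"no shard covers keyspace id {ksid.hex()}")
        return name


# Usage: split the upper shard and check that keys in -80 did not move.
old = TopologyMap(["-80", "80-"])
new = TopologyMap(["-80", "80-c0", "c0-"])
ids = [keyspace_id(u) for u in range(1, 1001)]
print(all(new.shard_for(k) == "-80" for k in ids if old.shard_for(k) == "-80"))  # True
```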
Summary
Building a database proxy is about Abstracting Complexity. By moving sharding and connection management into a dedicated infrastructure layer, you can scale your relational data to millions of users while keeping your application code clean and simple.
