System Design Masterclass: Designing a Payment Gateway (Stripe)
Designing a system to serve photos or short URLs is fundamentally about optimizing read latency and disk space. If a user's photo fails to load, they simply refresh the page.
Designing a Payment Gateway (like Stripe, Adyen, or PayPal) is a completely different engineering paradigm. It is fundamentally about Correctness, Atomicity, and Trust. If your system charges a user twice, or drops a transaction, the business faces massive legal and financial liability.
In this premium blueprint, we will design a highly secure, PCI-compliant payment orchestration platform.
1. Capacity Estimation & Constraints
Unlike social media platforms, payment gateways do not handle millions of queries per second (QPS). Even a global network like Visa advertises a capacity of roughly 65,000 transaction messages per second, and Stripe operates well below that.
Assumptions:
- Transactions: 10 million transactions per day.
- QPS: 10,000,000 / 86,400 ≈ 116 QPS on average. Assume peaks of around 1,000 QPS.
- Latency: Because we must communicate synchronously with external banks, latency will be high (1 to 5 seconds per request).
- Availability: 99.999% (five nines). Downtime means merchants lose money instantly.
- Consistency: Strong consistency (ACID). Eventual consistency is unacceptable for financial ledgers.
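To sanity-check these numbers, here is a small back-of-the-envelope sketch. The helper names are hypothetical; the inputs are the assumptions above (10 million transactions per day, a 1,000 QPS peak, up to 5 s of bank latency, five nines). The in-flight estimate uses Little's law (concurrent requests = arrival rate × time in system), a standard queueing identity that this article does not itself invoke.

```python
SECONDS_PER_DAY = 86_400
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def average_qps(tx_per_day: int) -> float:
    """Average request rate if traffic were spread evenly over the day."""
    return tx_per_day / SECONDS_PER_DAY


def in_flight(qps: float, latency_s: float) -> float:
    """Little's law: requests held open at once = arrival rate * latency."""
    return qps * latency_s


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability)


if __name__ == "__main__":
    print(f"{average_qps(10_000_000):.1f}")              # 115.7
    print(in_flight(1_000, 5))                           # 5000 open bank calls at peak
    print(f"{downtime_minutes_per_year(0.99999):.2f}")   # 5.26
```

The last two figures carry the real lesson: at peak the gateway may hold thousands of slow bank calls open at once, and five nines leaves only about five minutes of downtime per year.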
Conclusion: The engineering challenge is not network throughput; it is Resilience and Atomicity.
2. API Design
We need an endpoint for the merchant's backend to initiate a charge.
POST /v1/charges
Headers:
Authorization: Bearer sk_live_12345
Idempotency-Key: 7421-4f11-89ab-cd7123ef (Crucial!)
Request Body:
{
"amount": 5000,
"currency": "usd",
"source": "tok_12345",
"description": "Premium Subscription"
}
(Note: Always represent currency in its smallest unit, e.g., cents, to avoid floating-point math errors).
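A quick illustration of why this matters, as a minimal sketch: binary floats cannot represent most decimal fractions exactly, while integers in minor units add up exactly. The helper `to_minor_units` is hypothetical, and its half-up rounding policy is an assumption; the default exponent of 2 fits two-decimal currencies such as USD, while zero-decimal currencies (for example JPY) would pass 0.

```python
from decimal import ROUND_HALF_UP, Decimal


def to_minor_units(amount: str, exponent: int = 2) -> int:
    """Convert a decimal string like "50.00" into integer minor units.

    Parsing from a string (not a float) keeps the value exact. The
    half-up rounding of extra digits is an assumed policy.
    """
    quantum = Decimal(1).scaleb(-exponent)          # Decimal("0.01") when exponent=2
    value = Decimal(amount).quantize(quantum, rounding=ROUND_HALF_UP)
    return int(value.scaleb(exponent))              # shift the decimal point right


if __name__ == "__main__":
    print(0.1 + 0.2)                # 0.30000000000000004 (float rounding error)
    print(to_minor_units("50.00"))  # 5000, the "amount" in the request body above
```

Once amounts are integers, every later step (sums, refunds, ledger entries) stays exact.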
3. High-Level Architecture
The system acts as a router between the merchant and the highly regulated global banking network.
graph TD
Client[Client Browser/App] -->|1. Submit Card| Vault[PCI-Compliant Token Vault]
Vault -->|2. Return Token| Client
Client -->|3. Checkout| Merchant[Merchant Backend]
Merchant -->|4. POST /charges| API[API Gateway]
API -->|5. Verify| Idempotency[(Idempotency DB)]
API -->|6. Check Risk| Risk[Fraud / ML Engine]
API -->|7. Execute| Processor[Payment Processor Core]
Processor -->|8. API Call| Bank[External Bank/Visa]
Processor -->|9. Record| Ledger[(Double-Entry Ledger)]
style Vault fill:#047857,stroke:#fff,stroke-width:2px,color:#fff
style Processor fill:#1e40af,stroke:#fff,stroke-width:2px,color:#fff
style Ledger fill:#b91c1c,stroke:#fff,stroke-width:2px,color:#fff
4. The Deep Dive: Core Engineering Pillars
Pillar A: Tokenization & PCI Compliance
Storing raw credit card numbers brings every system that touches them into scope for rigorous PCI DSS audits. If that database is breached, the company may not survive.
To minimize risk, we implement Tokenization:
- When a user types their credit card into a form, the form submits directly to our highly secure, isolated Vault service.
- The Vault encrypts the card and returns a one-time token (tok_12345).
- The merchant's backend only ever sees the token.
- When the merchant calls /v1/charges, our Payment Core asks the Vault to "detokenize" the card so we can send it to the bank.
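The Vault's contract can be sketched in memory as follows. The class and method names are hypothetical, and the encryption step is deliberately omitted: a production vault would encrypt card numbers with keys held in an HSM or KMS and would never keep them in a plain dictionary. Tokens are single-use, as described above. The sketch also runs the Luhn checksum (the standard check digit on card numbers) to reject mistyped cards before storing them.

```python
import secrets


def luhn_valid(pan: str) -> bool:
    """Luhn check: double every second digit from the right and sum."""
    if not (pan.isascii() and pan.isdigit() and 12 <= len(pan) <= 19):
        return False
    total = 0
    for i, ch in enumerate(reversed(pan)):
        digit = int(ch)
        if i % 2 == 1:          # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9      # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


class TokenVault:
    """Hypothetical in-memory vault. Encryption at rest is omitted here."""

    def __init__(self) -> None:
        self._cards: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        if not luhn_valid(pan):
            raise ValueError("invalid card number")
        token = "tok_" + secrets.token_hex(12)   # unguessable random token
        self._cards[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        # pop() makes the token one-time: a second call cannot replay it.
        try:
            return self._cards.pop(token)
        except KeyError:
            raise LookupError("unknown or already-used token") from None
```

Only the Payment Core would be allowed to call `detokenize`; the merchant's backend holds nothing but the opaque `tok_...` string.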
Pillar B: Idempotency (The Most Critical Concept)
What happens if the merchant's server sends a charge request, our Payment Core charges the credit card, but right before we send the 200 OK response, the merchant's internet connection drops?
The merchant's code will automatically retry the request. If we aren't careful, we will charge the user a second time.
To prevent double-charging, the merchant must include a unique Idempotency-Key in the HTTP header (e.g., a UUID representing the shopping cart).
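The merchant's side of this contract is easy to get wrong: the key must be generated once per logical charge and reused on every retry. A minimal sketch, assuming a hypothetical `send` callable that performs POST /v1/charges and raises `ConnectionError` when the connection drops; the retry count and delays are illustrative values.

```python
import time
import uuid
from typing import Callable

# Hypothetical transport: takes (headers, body), returns the parsed response,
# and raises ConnectionError if the connection drops mid-request.
Sender = Callable[[dict, dict], dict]


def charge_with_retries(send: Sender, body: dict,
                        max_attempts: int = 3, base_delay_s: float = 0.5) -> dict:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key per logical charge, created BEFORE the loop so every retry
    # carries the same key and the gateway can recognize it as a repeat.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        try:
            return send(headers, body)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                                  # out of attempts
            time.sleep(base_delay_s * 2 ** attempt)    # exponential backoff
    raise AssertionError("unreachable")
```

Generating the key inside the loop would silently defeat the whole mechanism, because each retry would look like a brand-new charge.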
Before processing a charge, our API queries an Idempotency Database (often Redis or Postgres).
1. If the key exists and the status is PENDING, we return a 409 Conflict (the original request is still being processed).
2. If the key exists and the status is SUCCESS, we return the cached JSON response from the previous successful call without talking to the bank again.
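3. If it doesn't exist, we insert it as PENDING and proceed. The existence check and the insert must be one atomic operation (e.g., an insert guarded by a unique constraint); otherwise two concurrent retries could both see the key as missing and both charge the card.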
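The three rules above can be sketched against a relational store. Here Python's built-in `sqlite3` stands in for the Postgres option mentioned above, and the table and function names are hypothetical. The atomic claim comes from the `PRIMARY KEY` plus `INSERT OR IGNORE`: exactly one caller inserts the row, so exactly one caller talks to the bank. Failure handling (a crash or bank timeout while the key is PENDING) is deliberately left out.

```python
import json
import sqlite3
from typing import Callable


def init_store(conn: sqlite3.Connection) -> None:
    # status is PENDING or SUCCESS; response caches the JSON body on SUCCESS.
    conn.execute("""CREATE TABLE IF NOT EXISTS idempotency_keys (
        key TEXT PRIMARY KEY,
        status TEXT NOT NULL,
        response TEXT)""")


def handle_charge(conn: sqlite3.Connection, key: str, request: dict,
                  charge_fn: Callable[[dict], dict]) -> tuple[int, dict]:
    # Rule 3: atomically claim the key. A duplicate insert is ignored (0 rows).
    with conn:
        claimed = conn.execute(
            "INSERT OR IGNORE INTO idempotency_keys (key, status) VALUES (?, 'PENDING')",
            (key,),
        ).rowcount == 1
    if claimed:
        response = charge_fn(request)          # the only path that calls the bank
        with conn:
            conn.execute(
                "UPDATE idempotency_keys SET status = 'SUCCESS', response = ? WHERE key = ?",
                (json.dumps(response), key),
            )
        return 200, response
    status, cached = conn.execute(
        "SELECT status, response FROM idempotency_keys WHERE key = ?", (key,)
    ).fetchone()
    if status == "PENDING":                    # Rule 1: original still in flight
        return 409, {"error": "a request with this key is still in progress"}
    return 200, json.loads(cached)             # Rule 2: replay the cached response
```

In Postgres the same claim is usually written as `INSERT ... ON CONFLICT DO NOTHING`; in Redis, a `SET` with the `NX` option plays that role.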
Pillar C: The Double-Entry Ledger
When money moves, it must be recorded perfectly. We do not use a standard users table with a balance column. We use a Double-Entry Ledger.
Every transaction creates two immutable, append-only entries whose signed amounts sum to zero:
- CREDIT: Merchant Account (+ $50.00)
- DEBIT: Settlement Account (- $50.00)
Because the ledger is append-only, no row is ever UPDATED or DELETED. If a refund occurs, we append two new rows reversing the flow. This ensures a perfect audit trail that accountants and regulators can verify. The database backing this must be strongly consistent and support ACID transactions (e.g., PostgreSQL, CockroachDB, or Spanner).
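A minimal sketch of these rules, again using `sqlite3` as a stand-in for PostgreSQL and with hypothetical table and function names. Amounts are integer cents signed as in the example above (credit positive, debit negative). Each transaction's entries are inserted inside one ACID transaction, SQLite triggers reject any UPDATE or DELETE, balances are derived by summing entries, and a refund appends mirror-image rows.

```python
import sqlite3
import uuid


def init_ledger(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS ledger_entries (
            entry_id INTEGER PRIMARY KEY AUTOINCREMENT,
            txn_id   TEXT    NOT NULL,
            account  TEXT    NOT NULL,
            amount   INTEGER NOT NULL  -- cents: credit positive, debit negative
        );
        CREATE TRIGGER IF NOT EXISTS ledger_no_update BEFORE UPDATE ON ledger_entries
        BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
        CREATE TRIGGER IF NOT EXISTS ledger_no_delete BEFORE DELETE ON ledger_entries
        BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
    """)


def post_transaction(conn: sqlite3.Connection, entries: list[tuple[str, int]]) -> str:
    # The double-entry invariant: at least two legs, summing to exactly zero.
    if len(entries) < 2 or sum(amount for _, amount in entries) != 0:
        raise ValueError("entries must balance to zero")
    txn_id = str(uuid.uuid4())
    with conn:  # all legs commit together, or none do
        conn.executemany(
            "INSERT INTO ledger_entries (txn_id, account, amount) VALUES (?, ?, ?)",
            [(txn_id, account, amount) for account, amount in entries],
        )
    return txn_id


def balance(conn: sqlite3.Connection, account: str) -> int:
    # No balance column: the balance is always derived from the entries.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger_entries WHERE account = ?",
        (account,),
    ).fetchone()
    return row[0]


def refund(conn: sqlite3.Connection, txn_id: str) -> str:
    # Append reversing entries instead of editing or deleting the originals.
    rows = conn.execute(
        "SELECT account, amount FROM ledger_entries WHERE txn_id = ?", (txn_id,)
    ).fetchall()
    return post_transaction(conn, [(account, -amount) for account, amount in rows])
```

After a charge and its refund, the merchant's balance is back to zero, yet all four rows remain for auditors to inspect.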
5. Asynchronous Flows and Webhooks
Because bank APIs are notoriously slow and prone to timeouts, payment state machines are complex (PENDING -> AUTHORIZED -> CAPTURED -> SETTLED).
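One way to keep such a state machine honest is an explicit transition table that rejects anything else, sketched below. The four states come from the text; the `FAILED` terminal state and all names are hypothetical additions for illustration.

```python
from enum import Enum


class PaymentState(Enum):
    PENDING = "PENDING"
    AUTHORIZED = "AUTHORIZED"
    CAPTURED = "CAPTURED"
    SETTLED = "SETTLED"
    FAILED = "FAILED"  # hypothetical terminal state, not named in the text


# Forward-only transitions; terminal states have no way out.
ALLOWED: dict[PaymentState, frozenset[PaymentState]] = {
    PaymentState.PENDING: frozenset({PaymentState.AUTHORIZED, PaymentState.FAILED}),
    PaymentState.AUTHORIZED: frozenset({PaymentState.CAPTURED, PaymentState.FAILED}),
    PaymentState.CAPTURED: frozenset({PaymentState.SETTLED}),
    PaymentState.SETTLED: frozenset(),
    PaymentState.FAILED: frozenset(),
}


def transition(current: PaymentState, target: PaymentState) -> PaymentState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Rejecting illegal moves such as SETTLED back to PENDING matters because late or duplicated bank callbacks are common when APIs time out.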
Merchants cannot leave HTTP connections open for hours waiting for a payment to settle. We use Webhooks to notify them asynchronously.
- When a payment state changes in the database, a CDC (Change Data Capture) tool like Debezium publishes an event to Kafka.
- A dedicated Webhook Dispatcher service consumes the event and sends an HTTP POST to the merchant's registered URL.
- If the merchant's server returns a 500 error, the Dispatcher uses exponential backoff to retry delivery over the next 3 days.
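A retry schedule like this can be computed up front, as in the sketch below. The parameters (first retry after 60 s, doubling each time, capped at 6 hours, inside a 3-day window) are assumptions chosen for illustration, not any gateway's published policy.

```python
def retry_schedule(base_delay_s: float = 60.0, factor: float = 2.0,
                   max_delay_s: float = 6 * 3600,
                   window_s: float = 3 * 24 * 3600) -> list[float]:
    """Delays (in seconds) before each redelivery attempt, stopping once
    the next attempt would fall outside the delivery window."""
    delays: list[float] = []
    elapsed = 0.0
    delay = base_delay_s
    while elapsed + delay <= window_s:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, max_delay_s)  # exponential growth, capped
    return delays
```

With these assumed parameters the Dispatcher makes 19 redelivery attempts within the 72 hours. Real dispatchers usually also add random jitter so that many failing endpoints do not retry in lockstep.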
To prevent attackers from sending fake "Payment Successful" webhooks to the merchant, the gateway must sign each webhook payload with a cryptographic secret shared with the merchant. The merchant verifies the signature header (Stripe's is called Stripe-Signature) before fulfilling the order.
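A minimal signing and verification sketch, loosely modeled on the `t=<timestamp>,v1=<hex HMAC-SHA256>` header format Stripe documents; treat the exact format, the function names, and the five-minute tolerance as assumptions. Signing the timestamp together with the raw body lets the merchant reject old events replayed by an attacker, and `hmac.compare_digest` compares in constant time.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, payload: bytes, timestamp: int) -> str:
    """Gateway side: HMAC-SHA256 over '<timestamp>.<raw body>'."""
    signed = f"{timestamp}.".encode() + payload
    digest = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify(secret: bytes, payload: bytes, header: str,
           tolerance_s: int = 300, now: Optional[int] = None) -> bool:
    """Merchant side: recompute the HMAC and reject stale or forged events."""
    fields: dict[str, str] = {}
    for item in header.split(","):
        name, _, value = item.strip().partition("=")
        fields.setdefault(name, value)  # simplification: keeps the first v1 only
    try:
        timestamp = int(fields["t"])
        received = fields["v1"]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > tolerance_s:
        return False  # outside the tolerance window: possible replay
    expected = hmac.new(secret, f"{timestamp}.".encode() + payload,
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The merchant must verify against the raw request bytes exactly as received; re-serializing parsed JSON can change whitespace or key order and break the signature.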
Summary Checklist for the Interview
When an interviewer asks you to design a Payment Gateway, ensure you hit these four checkpoints:
- Emphasize that Correctness is vastly more important than QPS.
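- Defend PostgreSQL (or another ACID database) for the Double-Entry Ledger. Do not reach for Cassandra or MongoDB for the core financial ledger: Cassandra is eventually consistent by default, and although MongoDB has supported multi-document ACID transactions since version 4.0, a relational ACID database remains the conventional and most easily defended choice.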
- Explicitly solve the double-charging problem using Idempotency Keys.
- Explain how Tokenization isolates PCI-compliance risk from the core application logic.
