Designing a highly scalable, multi-channel notification system (supporting iOS Push, Android Push, SMS, and Email) requires solving a core reliability problem: how to guarantee message delivery under third-party gateway throttle limits, network failures, and massive traffic spikes without losing messages or double-delivering alerts.
This case study details the priority-queuing models, idempotency caches, and resilient retry architectures required to operate a global notification engine at scale.
1. Requirements & Core Constraints
To establish our technical architecture, we define the performance limits, ingestion QPS, and reliability metrics of our notification engine.
Functional Requirements
- Multi-Channel Dispatching: Deliver alerts across multiple channels: iOS Push (via APNS), Android Push (via FCM), SMS (via Twilio/Plivo), and Email (via SendGrid/Mailgun).
- Priority-Based Routing: Critical real-time notifications (e.g., 2FA OTP codes, security alerts) must bypass background bulk messages (e.g., marketing newsletters).
- Template Management: Support dynamic notification templates with runtime placeholder variables (e.g.,
"Hi {username}, your OTP is {code}"). - User Preference Center: Provide users with channel-specific opt-out switches and time-of-day quiet hour rules.
- Delivery State Tracking: Log the status of each notification (Ingested, Queued, Processing, Sent, Failed, Clicked).
Non-Functional Requirements
- Strict Delivery SLAs: High-priority transactional notifications (OTPs) must be delivered within $5\text{ seconds}$ (p99 latency).
- No Data Loss: Notifications should never be silently dropped. The system must implement robust retry queues.
- Idempotency: Prevent double-delivery of notifications (e.g., sending the same charge alert email twice due to network retries).
- Extensibility: Third-party gateway providers must be decoupled behind standard adapters so we can failover between vendors dynamically.
Back-of-the-Envelope Estimation
- System Scale & QPS:
- Total User Base: $100,000,000$ registered users.
- Daily Notification Volume: $100,000,000$ total messages sent per day across all channels.
- Average Dispatch QPS: $$\text{Average Ingest QPS} = \frac{100,000,000}{86,400} \approx 1,157 \text{ QPS}$$
- Peak Transactional Burst QPS: During high-intensity events (e.g., flash sales or security incidents), transactional QPS can surge to $10\times$: $$\text{Peak Transactional QPS} \approx 10,000 \text{ QPS}$$
- Storage Footprint (90-Day Log Retention):
- Each notification delivery log is stored for audit and analytics.
- Log Record Size:
notification_id(16 bytes),user_id(16 bytes),payload(200 bytes),status(12 bytes),metadata(100 bytes) = 344 bytes. - Daily Storage Footprint: $$100,000,000 \text{ alerts/day} \times 344 \text{ bytes} \approx 34.4 \text{ GB/day}$$
- 90-Day Total Storage: $$34.4 \text{ GB/day} \times 90 \text{ days} \approx 3.1 \text{ TB}$$ This data is sharded and stored in a database optimized for write-heavy logging.
2. API Design & Core Contracts
The system exposes a unified ingestion API that accepts multi-channel payloads and handles templating parameters.
Dispatch Notification
POST /api/v1/notifications/send
Invoked by internal microservices to request dispatch.
Request Payload:
{
"user_id": "usr_99a8c7b0",
"priority": "HIGH",
"channels": ["PUSH", "EMAIL"],
"idempotency_key": "idem-uuid-7f8a-45c1-92b1",
"template_id": "temp_otp_verification",
"variables": {
"username": "Sachin",
"otp_code": "489201"
},
"channel_overrides": {
"email": {
"subject": "Critical: Verify Your Account"
}
}
}
Response Payload:
{
"notification_id": "notif_3389a01f",
"status": "INGESTED",
"channels_queued": ["PUSH", "EMAIL"],
"created_at": "2026-05-22T16:50:00Z"
}
3. High-Level Design (HLD)
The architecture separates ingestion from dispatch. A fast API gateway validates and commits messages to a distributed message bus, while a dedicated worker fleet handles vendor integration, rate limiting, and failure recoveries.
graph TD
%% Ingest Path
Publisher[Internal Microservices] -->|1. POST Send Request| Gateway[API Gateway & Rate Limiter]
Gateway -->|2. Check Idempotency| IdemCache[(Redis Idempotency Cache)]
Gateway -->|3. Read Preferences| UserDB[(PostgreSQL Primary)]
%% Queue Segregation
Gateway -->|4. Push to Segregated Topics| Kafka[Kafka Event Bus]
subgraph Ingestion Queues
Kafka -->|High Priority| TopicHigh[topic.notif.high-2fa]
Kafka -->|Medium Priority| TopicMed[topic.notif.medium-updates]
Kafka -->|Low Priority| TopicLow[topic.notif.low-marketing]
end
%% Worker Dispatching
TopicHigh -->|Consume| WorkerFleet[Worker Dispatch Fleet]
TopicMed -->|Consume| WorkerFleet
TopicLow -->|Consume| WorkerFleet
%% Third-party Integrations
WorkerFleet -->|Push Notification| APNS[iOS APNS Gateway]
WorkerFleet -->|Push Notification| FCM[Android FCM Gateway]
WorkerFleet -->|SMS Alert| Twilio[Twilio/Plivo SMS API]
WorkerFleet -->|Email Alert| SendGrid[SendGrid/Mailgun API]
%% Failure & Retry Path
WorkerFleet -- 5. Push Failed -> RetryQueue[Kafka Retry Topic]
RetryQueue -->|Delay Consumer| WorkerFleet
WorkerFleet -- 6. Multi-Failures -> DLQ[Dead Letter Queue DB]
Flow Breakdown:
- Request Ingestion: Internal microservices publish notifications through the API Gateway. The gateway validates payload variables and runs a rapid membership test against the Redis Idempotency Cache to filter duplicate requests.
- Preference Resolution: The gateway fetches the recipient's preference state (e.g., quiet hours, opt-outs) from the PostgreSQL Database and filters out blocked channels.
- Queue Segregation: The gateway writes messages into appropriate Kafka Topics based on priority:
topic.notif.high-2fa(OTPs, immediate processing, dedicated thread counts).topic.notif.medium-updates(Transactional order updates, Direct messages).topic.notif.low-marketing(Bulk campaigns, heavily rate-limited).
- Worker Dispatch: The Worker Dispatch Fleet consumes messages from Kafka, parses templates, hydrates placeholders, and sends requests to third-party vendor APIs (APNS, FCM, Twilio, SendGrid).
- Retry Orchestrator: If a vendor gateway returns an error, the worker routes the message to the Kafka Retry Topic for deferred retries, keeping primary topics clear of retry blocks.
4. Low-Level Design (LLD) & Data Models
Database Selection Rationale
- User Configurations & Templates (PostgreSQL): Storing structured templates, quiet hours, user preferences, and tracking keys is relational.
- Deduplication State (Redis): We use Redis to store active
idempotency_keyswith a 24-hour TTL. Point-lookups (EXISTS) take $O(1)$ time, preventing database locks during ingestion bursts. - Log Archiving (MongoDB / Cassandra): Storing high-volume, append-only notification logs is a document/wide-column problem. MongoDB allows us to handle high-frequency writes and easily scale reviews independently.
SQL DDL Database Schemas
User Channel Preferences
CREATE TABLE user_preferences (
user_id VARCHAR(64) NOT NULL,
channel VARCHAR(32) NOT NULL, -- 'PUSH', 'SMS', 'EMAIL'
is_enabled BOOLEAN NOT NULL DEFAULT TRUE,
quiet_hour_start TIME WITH TIME ZONE NULL,
quiet_hour_end TIME WITH TIME ZONE NULL,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (user_id, channel)
);
CREATE INDEX idx_user_preferences_lookup ON user_preferences (user_id);
Notification Templates
CREATE TABLE notification_templates (
id VARCHAR(64) PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email_subject VARCHAR(255),
body_content TEXT NOT NULL, -- "Hi {username}, your verification code is {code}"
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
Notification Log Table
CREATE TABLE notification_logs (
id VARCHAR(64) PRIMARY KEY,
user_id VARCHAR(64) NOT NULL,
channel VARCHAR(32) NOT NULL,
status VARCHAR(32) NOT NULL, -- 'INGESTED', 'QUEUED', 'SENT', 'FAILED'
retry_count INT DEFAULT 0,
last_error TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_notification_logs_lookup ON notification_logs (user_id, created_at);
5. Scaling Challenges & High Write Concurrency
Scaling ingestion to 10K QPS and dispatching through strict third-party rate-limited channels introduces three primary architectural bottlenecks:
Mitigating Vendor Gateways Bottlenecks
Third-party providers (APNS, FCM, Twilio) enforce strict rate-limits on our API keys. If we burst 10K messages through Twilio at once, Twilio will return HTTP 429 and drop our calls.
graph LR
WorkerFleet[Worker Fleet] -->|Rate Limit Throttle| TokenBucket[Token Bucket Rate Limiter]
TokenBucket -->|Allow Rate Limit QPS| Twilio[Twilio SMS Gateway]
- The Mitigation: Token Bucket Rate Limiting. We maintain a cluster-wide Token Bucket Rate Limiter (using Redis Lua scripts) for each vendor API key. Before a worker fires a request, it must acquire a token. If the bucket is empty, the worker pulls back, deferring the message to the Kafka Retry Topic. This prevents key suspension and guarantees compliance with vendor SLAs.
6. Technical Trade-offs & Consistency Models
Designing global messaging fabrics requires balancing data integrity, latency, and delivery guarantees:
Tradeoff A: At-Least-Once vs. Exactly-Once Delivery
- Exactly-Once Delivery:
- Pros: Ideal user experience—users never receive duplicate push alerts or emails.
- Cons: Mathematically impossible to guarantee across network boundaries (due to the Two Generals' Problem). Enforcing this requires massive distributed consensus protocols (e.g., 2-Phase Commit), which degrades throughput by 95%.
- At-Least-Once Delivery:
- Pros: Guarantees that no notification is ever lost, maintaining highly available pipelines.
- Cons: Users may occasionally receive duplicate messages during network partitions.
- Our Resolution: We choose At-Least-Once with Client Idempotency Deduplication. We guarantee delivery via Kafka retries, and use Redis to store and verify transaction tokens, achieving simulated exactly-once delivery at high scale.
Tradeoff B: DB Transaction Logs vs. Memory Caches for Deduplication
During high-volume bursts (e.g., millions of 2FA alerts), checking PostgreSQL for duplicate keys before sending becomes a major database bottleneck. We use Redis as a distributed cache. Checking an in-memory key take less than 1ms and completely shields PostgreSQL from read/write locking contention.
7. Resilience & Failure Scenarios
Operating global distributed networks requires robust failure isolation protocols:
Scenario A: APNS/FCM Outages
If APNS goes down globally for 2 hours, iOS push notifications will fail. If we do not isolate this failure, failing push notifications will saturate our worker pools, blocking SMS and email dispatches.
- The Mitigation: Bulkheads & Dedicated Thread Pools. We run dedicated, isolated worker fleets for each channel (Push Workers, SMS Workers, Email Workers). If APNS goes down, the Push Worker queues will fill up, but our SMS and Email Worker fleets will continue to process alerts without interruption.
Scenario B: Resilient Retry Loops with Jitter
When a downstream provider fails temporarily, retrying all messages immediately will worsen the outage (the "Thundering Herd" problem).
- The Mitigation: We retry failed dispatches up to 3 times using Exponential Backoff with Jitter. The wait time before retry increases exponentially: $$t = 2^{\text{attempt}} + \text{random_jitter}$$ This spreads the retry load smoothly over time. If a message fails after 3 attempts, it is routed to a Dead-Letter Queue (DLQ) database table, and an alert is fired to our on-call engineers.
8. Staff Engineer Perspective (Deep-Dive Callouts)
9. Candidate Verbal Script (Interview Guide)
Interviewer: "How would you design a notification system that guarantees 2FA OTP codes are delivered within 5 seconds, even during a massive marketing campaign blast?"
Candidate: "First, we must avoid using a single, unified message queue. If a marketing campaign pushes 10 million marketing emails, our high-priority 2FA codes will get stuck in the queue queue.
To solve this, I would implement Strict Queue Segregation using Apache Kafka. I would set up three segregated topics: topic.notif.high-2fa (for transactional alerts and OTPs), topic.notif.medium-updates (for order status updates), and topic.notif.low-marketing (for marketing campaigns).
We run dedicated, isolated Worker Fleets for each queue. The High-Priority fleet is allocated 80% of our worker resources and maintains high thread counts to ensure instant processing.
To protect the system from third-party vendor rate limits (e.g., Twilio API limits), we maintain a cluster-wide Token Bucket Rate Limiter inside Redis. Before a worker dispatches a message, it must acquire a token. If the bucket is empty, the worker backs off and routes the message to a Kafka Retry Topic for deferred retry with exponential backoff and jitter. This preserves our gateway keys and guarantees delivery SLAs."
Interviewer: "What happens if a network partition occurs and the API is retried? How do you ensure the user doesn't receive the same transaction notification twice?"
Candidate: "We enforce strict Idempotency Gating using a distributed cache layer powered by Redis.
When an internal service requests a notification dispatch, it must generate and pass a unique idempotency_key (e.g., derived from user_id, template_id, and hash_of_parameters).
At the ingestion layer, the API Gateway runs an atomic SET key val NX PX 86400 command inside Redis. If the key already exists, Redis returns false, the gateway recognizes the request as a duplicate, and returns the previous tracking ID without re-queueing the message.
If a network partition occurs during payment and the booking service retries the call, the gateway blocks the duplicate request, achieving simulated 'exactly-once' delivery across the network."