System Design

Designing a High-Throughput Notification System for 100K Events per Second

End-to-end architecture for a notification system handling 100,000 events per second: capacity planning, Kafka partition sizing, fan-out strategy, rate limiting, idempotency, and incident simulation.

Sachin Sarawgi·May 10, 2025·10 min read
#system design#notifications#kafka#throughput#capacity planning#distributed systems

100,000 events per second is not a notification system problem — it's a data pipeline problem that happens to produce notifications. Teams that approach it as a features problem ("we just need push, email, and SMS") build systems that collapse under load. The engineering challenge is the fan-out at scale: one event triggers notifications to potentially thousands of users, each notification routed through different channels, rate-limited per user, deduplicated, and delivered with retry guarantees.

Functional and Non-Functional Requirements

Functional:

  • Ingest raw events from producer services (user actions, system events)
  • Enrich events with user preferences and notification templates
  • Route to appropriate channels: push (FCM/APNs), email, SMS, in-app
  • Respect per-user preferences and quiet hours
  • Track delivery status per notification

Non-Functional:

  • Ingest throughput: 100,000 events/second (peak)
  • Delivery latency: < 5 seconds end-to-end for push notifications
  • Delivery latency: < 60 seconds for email/SMS
  • Availability: 99.9% (≈ 8.76 hours downtime/year allowed)
  • At-least-once delivery with idempotent consumers
  • Rate limiting: max 10 push notifications/user/hour
  • Multi-region: active-active for ingestion, active-passive for delivery

Throughput Calculation and Capacity Planning

Raw event rate: 100,000 events/second

Fan-out ratio: on average, each event triggers notifications for 5 users (social platform: a viral post triggers notifications to followers). Burst: 50× fan-out (celebrity post, flash sale).

Peak notification volume:
100,000 events/s × 50 fan-out = 5,000,000 notifications/second (peak burst)
100,000 events/s × 5 fan-out  = 500,000 notifications/second (average)

Channel distribution (applied to the 500,000/s average):
- Push: 70% → 350,000 push/second
- In-app: 20% → 100,000 in-app/second
- Email: 8%  → 40,000 email/second
- SMS: 2%    → 10,000 SMS/second

Storage sizing:

  • Each notification record: ~2 KB
  • Retention: 30 days
  • Volume: 500,000/s × 86,400 s/day × 2 KB ≈ 86.4 TB/day, so ≈ 2.6 PB retained over 30 days

This is a write-heavy storage problem. Cassandra or DynamoDB, not PostgreSQL.

Kafka Partition Planning

Kafka partitions determine parallelism. A common rule of thumb: partition count ≥ peak consumer count × 1.5, leaving headroom to add consumers without repartitioning.

For the event ingestion topic (raw-events):

Target throughput: 100,000 events/second × 1KB avg = 100 MB/s
Kafka broker write throughput: ~200 MB/s per broker (practical limit)
Minimum brokers: 100/200 = 0.5 → use 3 brokers for HA

Target consumer parallelism: 200 consumer threads
Partition count: 200 × 1.5 = 300 partitions

For the fan-out output topic (notifications-to-send):

Peak output: 5,000,000 notifications/second × 500B = 2,500 MB/s
Brokers needed: 2,500/200 = 12.5 → 15 brokers (safety margin)
Partitions: 1,500 (15 brokers × 100 partitions/broker)

Topic configuration:
raw-events:
  partitions: 300
  replication-factor: 3
  min.insync.replicas: 2
  retention.ms: 3600000  # 1 hour (events are processed quickly)

notifications-to-send:
  partitions: 1500
  replication-factor: 3
  retention.ms: 86400000  # 24 hours (for replay on downstream failure)

notification-status:
  partitions: 300
  replication-factor: 3
  retention.ms: 604800000  # 7 days
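
These topics can also be created programmatically. A minimal sketch using the standard kafka-clients AdminClient; the bootstrap address is a placeholder:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            NewTopic rawEvents = new NewTopic("raw-events", 300, (short) 3)
                .configs(Map.of("min.insync.replicas", "2",
                                "retention.ms", "3600000"));   // 1 hour
            NewTopic toSend = new NewTopic("notifications-to-send", 1500, (short) 3)
                .configs(Map.of("retention.ms", "86400000"));  // 24 hours
            NewTopic status = new NewTopic("notification-status", 300, (short) 3)
                .configs(Map.of("retention.ms", "604800000")); // 7 days

            // Blocks until the controller acknowledges topic creation
            admin.createTopics(List.of(rawEvents, toSend, status)).all().get();
        }
    }
}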

System Architecture

High-Throughput Notification Architecture:

Producer Services (100K events/s)
[Order Service] [Social Service] [Marketing Service]
        │              │                │
        └──────────────┴────────────────┘
                       │
                       ▼ (Kafka, 100K msg/s)
              ┌─────────────────┐
              │   raw-events    │  300 partitions
              │   topic         │
              └────────┬────────┘
                       │
                       ▼
          ┌────────────────────────┐
          │   Event Processor      │  200 instances
          │   - Validate           │  (Kafka consumer group)
          │   - Deduplicate        │
          │   - Enrich with        │
          │     user prefs         │
          └────────┬───────────────┘
                   │ Fan-out (1 event → N notifications)
                   ▼ (Kafka, 500K-5M msg/s peak)
        ┌──────────────────────┐
        │ notifications-to-    │  1500 partitions
        │ send topic           │
        └───┬──────────┬───────┘
            │          │
    ┌───────┘          └──────────┐
    ▼                             ▼
┌─────────────┐           ┌─────────────┐
│  Push       │           │  Email/SMS  │
│  Dispatcher │           │  Dispatcher │
│  (FCM/APNs) │           │  (SES/SNS)  │
│  500 inst   │           │  100 inst   │
└──────┬──────┘           └──────┬──────┘
       │                         │
       ▼                         ▼
┌──────────────────────────────────────┐
│   Notification Status Store          │
│   (Cassandra/DynamoDB)               │
│   + notification-status Kafka topic  │
└──────────────────────────────────────┘

Database Schema

-- Cassandra schema for notification storage
-- Partition key = user_id for co-located user notification history
-- Clustering key = created_at DESC for reverse-chronological reads

CREATE TABLE notifications (
    user_id         UUID,
    notification_id UUID,
    type            TEXT,          -- push | email | sms | in_app
    title           TEXT,
    body            TEXT,
    status          TEXT,          -- pending | sent | delivered | failed
    channel_msg_id  TEXT,          -- FCM message_id, SES message_id, etc.
    idempotency_key TEXT,
    metadata        MAP<TEXT, TEXT>,
    created_at      TIMESTAMP,
    delivered_at    TIMESTAMP,
    PRIMARY KEY ((user_id), created_at, notification_id)
) WITH CLUSTERING ORDER BY (created_at DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_size': '1',
                    'compaction_window_unit': 'DAYS'};

-- TTL: auto-expire after 30 days, matching the retention assumption above
ALTER TABLE notifications WITH default_time_to_live = 2592000;
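
This layout makes the hot read path a single-partition query. A sketch using the DataStax Java driver (session wiring omitted):

import java.util.UUID;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class NotificationReader {
    private final CqlSession session;

    public NotificationReader(CqlSession session) {
        this.session = session;
    }

    // 50 most recent notifications for one user: a single-partition,
    // clustering-ordered read. No ALLOW FILTERING, no scatter-gather.
    public ResultSet recentFor(UUID userId) {
        return session.execute(SimpleStatement.newInstance(
            "SELECT notification_id, type, title, status, created_at "
          + "FROM notifications WHERE user_id = ? LIMIT 50",
            userId));
    }
}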

Fan-Out Problem and Solution

Fan-out is the amplification problem. One event → many notifications. Two strategies:

Push fan-out (eager): Expand the fan-out immediately when the event is ingested. For a post liked by 1M followers: generate 1M notification records instantly.

Pros: Delivery latency is predictable (no fan-out latency at read time)
Cons: Hot events cause traffic spikes; wasteful for inactive users

Pull fan-out (lazy): Store the event once; fan-out happens when the user opens the app.

Pros: Low write amplification; inactive users don't receive unnecessary processing
Cons: First read after event is slow (fan-out happens on read)

Hybrid (what large platforms use): push fan-out for regular senders, pull fan-out for celebrity/high-follower accounts. Threshold: if the sender's follower count exceeds 100K, use pull fan-out.

import org.springframework.stereotype.Service;

@Service
public class FanOutRouter {

    private static final int PUSH_FANOUT_THRESHOLD = 100_000;

    private final UserService userService;
    private final FanOutService fanOutService;

    public FanOutRouter(UserService userService, FanOutService fanOutService) {
        this.userService = userService;
        this.fanOutService = fanOutService;
    }

    public void route(NotificationEvent event) {
        long followerCount = userService.getFollowerCount(event.getSenderId());

        if (followerCount <= PUSH_FANOUT_THRESHOLD) {
            // Eager fan-out: write a notification for each follower now
            fanOutService.pushFanOut(event);
        } else {
            // Lazy fan-out: write the event once, expand on read
            fanOutService.lazyFanOut(event);
        }
    }
}
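
On the read side, the lazy path expands celebrity events only for users who actually show up. A sketch of that app-open merge; notificationStore, eventStore, render, and merge are hypothetical helpers:

public List<Notification> notificationsFor(UUID userId) {
    // Precomputed (eager) notifications: one cheap single-partition read
    List<Notification> stored = notificationStore.recentFor(userId);

    // Lazily stored celebrity events: expanded into notifications only now,
    // for this one active user, instead of for millions at write time
    List<Notification> expanded = eventStore.pendingCelebrityEventsFor(userId)
        .stream()
        .map(event -> render(event, userId))
        .toList();

    return merge(stored, expanded); // interleave by created_at DESC
}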

Rate Limiting

Rate limiting protects users from notification spam and protects downstream channels from overload.

Per-user rate limits: Redis sliding window counter:

public boolean isAllowed(String userId, String channel) {
    String key = "ratelimit:" + channel + ":" + userId;
    long windowSeconds = 3600L; // 1 hour window
    int maxAllowed = switch (channel) {
        case "push"  -> 10;
        case "email" -> 3;
        case "sms"   -> 2;
        default      -> 20;
    };

    // Lua script: atomic sliding window check
    Long count = redisTemplate.execute(
        SLIDING_WINDOW_SCRIPT,
        Collections.singletonList(key),
        String.valueOf(System.currentTimeMillis()),
        String.valueOf(windowSeconds * 1000),
        String.valueOf(maxAllowed)
    );

    return count != null && count <= maxAllowed;
}
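
The SLIDING_WINDOW_SCRIPT above is referenced but not shown. A minimal sketch of one way to define it, as a sorted-set sliding window via Spring Data Redis's DefaultRedisScript (same-millisecond requests collide on the member key in this simplified version):

import org.springframework.data.redis.core.script.DefaultRedisScript;

// KEYS[1] = counter key, ARGV[1] = now (ms), ARGV[2] = window (ms), ARGV[3] = max.
// Returns the window count including this request, so the caller's
// `count <= maxAllowed` check passes only while the window has room.
private static final DefaultRedisScript<Long> SLIDING_WINDOW_SCRIPT =
    new DefaultRedisScript<>("""
        redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1] - ARGV[2])
        local count = redis.call('ZCARD', KEYS[1])
        if count < tonumber(ARGV[3]) then
            redis.call('ZADD', KEYS[1], ARGV[1], ARGV[1])
            redis.call('PEXPIRE', KEYS[1], ARGV[2])
        end
        return count + 1
        """, Long.class);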

Channel-level rate limits: FCM allows 600 notifications/second per project, SES defaults to 14 emails/second. Use a token bucket per channel:

import java.util.Map;
import org.springframework.stereotype.Component;
import com.google.common.util.concurrent.RateLimiter;

@Component
public class ChannelRateLimiter {
    // RateLimiter from Guava: token bucket implementation
    private final Map<String, RateLimiter> channelLimiters = Map.of(
        "fcm",  RateLimiter.create(500),  // 500/s, below the FCM limit
        "apns", RateLimiter.create(1000), // 1000/s
        "ses",  RateLimiter.create(100),  // 100/s (assumes a raised SES quota; default is 14/s)
        "sns",  RateLimiter.create(200)
    );

    public void acquire(String channel) {
        channelLimiters.get(channel).acquire(); // Blocks until a permit is available
    }
}

Backpressure Handling

When downstream channels are slow (FCM is degraded, SES is throttling), Kafka consumer lag grows. Handle it explicitly:

// Injected: KafkaListenerEndpointRegistry registry, ScheduledExecutorService
// scheduledExecutor, plus the health monitor, rate limiter, FCM client, and
// status store used below.
@KafkaListener(id = "push-dispatcher", topics = "notifications-to-send",
               groupId = "push-dispatcher")
public void dispatch(ConsumerRecord<String, NotificationMessage> record,
                     Acknowledgment ack) {
    NotificationMessage msg = record.value();

    // Check channel health before consuming more
    if (channelHealthMonitor.isUnhealthy("fcm")) {
        // Pause the listener container and let lag build in Kafka.
        // (Container pause/resume is thread-safe; calling the underlying
        // KafkaConsumer from another thread is not.)
        MessageListenerContainer container = registry.getListenerContainer("push-dispatcher");
        container.pause();
        scheduledExecutor.schedule(container::resume, 5, TimeUnit.SECONDS);
        ack.nack(Duration.ZERO); // don't commit: the record will be redelivered
        return;
    }

    try {
        channelRateLimiter.acquire("fcm");
        FcmResult result = fcmClient.send(msg);
        notificationStore.updateStatus(msg.getNotificationId(), "sent", result.getMessageId());
        ack.acknowledge();
    } catch (FcmThrottledException e) {
        // Don't commit: back off 1s, then redeliver from this offset
        ack.nack(Duration.ofSeconds(1));
    }
}

Idempotency Design

The event processor may process the same event twice (Kafka at-least-once). Fan-out must be idempotent:

public void fanOut(NotificationEvent event) {
    // Atomic insert via Cassandra lightweight transaction (INSERT ... IF NOT
    // EXISTS): a second delivery of the same event is a no-op. @Transactional
    // is omitted because Cassandra has no multi-statement transactions.
    EntityWriteResult<FanOutRecord> result = cassandraOps.insert(
        new FanOutRecord(event.getEventId(), Instant.now()),
        InsertOptions.builder().withIfNotExists().build());

    if (!result.wasApplied()) {
        log.info("Fan-out already processed for event: {}", event.getEventId());
        return;
    }

    // Process fan-out only once; key by user for partition affinity
    List<Notification> notifications = generateNotifications(event);
    notifications.forEach(n ->
        kafkaTemplate.send("notifications-to-send", n.getUserId().toString(), n));
}

Retry Strategy

Retry topology:
notifications-to-send
        │
        ├── FCM success → notification-status (delivered)
        │
        ├── FCM retryable error (5xx, timeout)
        │       └── notifications-retry-30s (wait 30s)
        │               └── notifications-retry-5m
        │                       └── notifications-retry-30m
        │                               └── notifications-dlq
        │
        └── FCM non-retryable (invalid token, app uninstalled)
                └── Update user record: push token invalid
                └── notifications-dlq (for audit)
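
A sketch of the hop between retry tiers; the retry-attempt header name and the RETRY_TOPICS layout are assumptions mirroring the topology above:

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.kafka.core.KafkaTemplate;

public class RetryRouter {

    // Hypothetical tier layout mirroring the retry topology
    private static final List<String> RETRY_TOPICS = List.of(
        "notifications-retry-30s", "notifications-retry-5m", "notifications-retry-30m");

    private final KafkaTemplate<String, NotificationMessage> kafkaTemplate;

    public RetryRouter(KafkaTemplate<String, NotificationMessage> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void routeRetry(String key, NotificationMessage msg, int attempt) {
        String nextTopic = attempt < RETRY_TOPICS.size()
            ? RETRY_TOPICS.get(attempt)      // one tier deeper per attempt
            : "notifications-dlq";           // exhausted: dead-letter for audit

        ProducerRecord<String, NotificationMessage> record =
            new ProducerRecord<>(nextTopic, key, msg);
        record.headers().add("retry-attempt",
            String.valueOf(attempt + 1).getBytes(StandardCharsets.UTF_8));
        kafkaTemplate.send(record);
    }
}

Each retry topic's consumer group simply waits out that tier's delay before re-dispatching to the channel.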

Horizontal Scaling

Each component scales independently:

Component          Scale trigger                       Scale mechanism
Event Processor    Kafka consumer lag > 10K            Add consumer instances
Push Dispatcher    Kafka consumer lag + FCM latency    Add consumer instances
Email Dispatcher   SES send rate limit                 Add SES sending identities
Cassandra          Disk > 70% or read latency > 5ms    Add Cassandra nodes

Kafka consumers scale horizontally up to the partition count. With 1,500 partitions on notifications-to-send, you can run 1,500 consumer threads — spread across ~100 pods × 15 threads each.

Monitoring and Alerting

Critical alerts (page immediately):
- Kafka consumer lag on notifications-to-send > 1,000,000 messages
- Push delivery success rate < 95% over 5 minutes
- DLQ topic lag growing > 1000/minute
- Cassandra write latency P99 > 100ms

Warning alerts (ticket/slack):
- Fan-out throughput > 80% of capacity (approaching limit)
- Per-user rate limit hit rate > 5% (users getting suppressed at high rate)
- Email bounce rate > 2%

Grafana dashboard panels:

  • Events ingested/second (with 24h comparison)
  • Fan-out ratio (events → notifications) — spike indicates viral content
  • Notification delivery success rate by channel
  • Kafka consumer lag by consumer group
  • Channel latency P50/P95/P99

Incident Simulation

Scenario: FCM has a 30-minute partial outage (50% error rate).

Impact without proper design:

  • Push dispatcher retries immediately
  • 2× traffic to FCM
  • FCM rate-limits our project
  • All push notifications backed up
  • Downstream timeout cascade

Impact with proper design:

  • Circuit breaker opens once the FCM error rate exceeds 20%
  • Push Dispatcher pauses Kafka consumption on FCM topic
  • Kafka lag builds (Kafka as buffer — this is the right behavior)
  • Backpressure propagates cleanly — no retry storm
  • Email/SMS/in-app continue unaffected (separate consumer groups)
  • When FCM recovers: circuit breaker half-opens, dispatcher resumes, lag drains over 10 minutes

The system degrades gracefully. Push notifications are delayed, not lost.
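
The circuit-breaker behavior described above maps cleanly onto a library like Resilience4j. A sketch using the 20% threshold from the scenario; the window and half-open settings are assumptions:

import java.time.Duration;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(20)                        // open at a 20% FCM error rate
    .slidingWindowSize(100)                          // measured over the last 100 calls
    .waitDurationInOpenState(Duration.ofSeconds(30)) // probe FCM again after 30s
    .permittedNumberOfCallsInHalfOpenState(10)       // half-open: 10 trial sends
    .build();

CircuitBreaker fcmBreaker = CircuitBreaker.of("fcm", config);

// Wrap each send; when the breaker is open, CallNotPermittedException fires
// immediately and the dispatcher pauses consumption instead of retry-storming.
FcmResult result = fcmBreaker.executeSupplier(() -> fcmClient.send(msg));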

Trade-offs Discussion

Consistency vs availability for rate limits: Using Redis for per-user rate limits means Redis failure bypasses rate limiting. Decision: accept this. Redis failure is transient; brief notification overshoot is acceptable. Alternative (DB-based rate limits) would cause rate limit checks to become a bottleneck under load.

Fan-out at write vs read: Push fan-out guarantees low delivery latency but wastes compute for inactive users. For a 100M user platform where 20% are active, push fan-out wastes 80% of fan-out compute. Hybrid strategy based on sender follower count is the right balance.

Kafka retention: 24-hour retention on the notification topic is long enough for downstream failures to recover and replay. 7-day retention would require much larger storage. 1-hour retention would not cover extended outages.

The architecture scales to 100K events/second because every component is stateless and horizontally scalable, and Kafka absorbs the bursts between components. The hard part is the fan-out math: design your Kafka partition count around your peak fan-out ratio, not your ingestion rate.
