Lesson 97 of 105 15 minFlagship

WebSocket and SSE for Real-Time Systems: Architecture and Production Patterns

Building real-time features at scale: WebSocket vs SSE trade-offs, Spring Boot WebSocket implementation with STOMP, connection management, horizontal scaling with Redis pub/sub, SSE for one-directional streaming, and handling reconnection and backpressure.

Reading Mode

Hide the curriculum rail and keep the lesson centered for focused reading.

Key Takeaways

  • Bidirectional communications require WebSocket stateful connections, whereas unidirectional server-push is better served via Server-Sent Events (SSE).
  • Horizontal scaling requires a message bus (like Redis Pub/Sub or Kafka) to route events between stateless backend instances.
  • Connection limits are governed by kernel file descriptors, memory per socket, and thundering herd storm control.
Recommended Prerequisites
System Design Interview Framework

Premium outcome

From vague architecture answers to staff-level trade-off thinking.

Backend engineers preparing for senior, staff, and architecture rounds.

What you unlock

  • A reusable system design answer framework for ambiguous prompts
  • Clear language for consistency, scaling, and reliability trade-offs
  • Case-study depth across feeds, payments, storage, and messaging systems

Real-time capabilities like push notifications, live telemetry, dynamic collaborative editors, and instant messaging have transitioned from luxury features to baseline user expectations. Historically, web architectures relied on polling mechanisms where the client repeatedly queried the server for updates. While simple, short polling and long polling introduce high latency, consume massive CPU cycles, and saturate network cards with empty HTTP headers.

Modern real-time systems utilize persistent, stateful TCP channels. Selecting between WebSocket and Server-Sent Events (SSE) is one of the most critical structural decisions a backend architect can make. Selecting the wrong protocol introduces massive, unnecessary complexity, load balancing bottlenecks, and operational overhead. This guide outlines the deep technical blueprints, scaling mechanics, and code frameworks required to build production-grade real-time systems.


Requirements & Core Constraints

To design a system that serves millions of concurrent users, we must establish rigorous operational parameters. Our target is a generalized real-time event distribution platform capable of supporting a high-density active user base.

1. Functional Constraints

  • Bidirectional Channel: Supporting interactive features (e.g., instant messaging and co-authoring) where the client sends messages and the server pushes updates concurrently.
  • Unidirectional Channel: Supporting broadcast-style features (e.g., live stock feeds and system notifications) with high-efficiency server-push capability.
  • Message Ordering: Strict sequence guarantees must be preserved per user session.
  • Latency Target: Server-push latency must remain under 50 milliseconds to ensure a live user experience.

2. Non-Functional SLAs

  • Concurrency: The platform must maintain 10 Million concurrent persistent connections simultaneously.
  • Throughput: Support an ingestion rate of 100,000 incoming messages per second, and a broadcast delivery rate of 500,000 events per second.
  • Target Availability: 99.999% connection uptime (less than 5.26 minutes of total unplanned downtime per year).
  • Graceful Degradation: The system must handle connection spikes without cascading crashes, utilizing socket shedding and adaptive backpressure.

3. Back-of-the-Envelope Estimates

Let us compute the memory and bandwidth constraints for maintaining 10 Million concurrent connections:

  • Memory Footprint: Each open socket requires kernel space memory (read/write buffers) and user-space memory (session trackers). Assuming a highly optimized footprint of 30 Kilobytes per connection (16 KB TCP read/write buffer + 14 KB application context): $$\text{Total Memory} = 10,000,000 \times 30,000 \text{ bytes} \approx 300 \text{ Gigabytes}$$ To run this safely with a 50% buffer, we require at least 450 Gigabytes of RAM across our gateway fleet.
  • Ingress Bandwidth: Assuming an average client message payload is 500 bytes: $$\text{Ingress Bandwidth} = 100,000 \text{ messages/sec} \times 500 \text{ bytes} = 50 \text{ Megabytes per second (400 Mbps)}$$
  • Egress Bandwidth: Assuming egress broadcast message size is 1 Kilobyte: $$\text{Egress Bandwidth} = 500,000 \text{ messages/sec} \times 1,000 \text{ bytes} = 500 \text{ Megabytes per second (4 Gbps)}$$
  • IP Port Availability: A single server IP can support a maximum of 65,535 outbound connections per destination IP due to port exhaustion limits. Therefore, we must deploy a minimum of 160 Gateway instances (each managing around 62,500 connections) and utilize multiple virtual IPs (VIPs) on our Load Balancer to bypass this limit.

API Design & Core Contracts

To facilitate standard protocol handshakes and STOMP message routing, we define production-ready API contracts.

1. HTTP to WebSocket Upgrade Handshake

The handshake starts as a standard HTTP GET request with upgrade headers:

GET /api/v1/realtime/connect HTTP/1.1
Host: realtime.codesprintpro.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Authorization: Bearer <JWT_TOKEN>

The server responds with a 101 Switching Protocols status code:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

2. STOMP Protocol Payload Schema

Once the persistent TCP socket is active, the client and server communicate using STOMP frames over the raw socket.

STOMP Connect Frame:

{
  "command": "CONNECT",
  "headers": {
    "accept-version": "1.2",
    "host": "realtime.codesprintpro.com",
    "heart-beat": "10000,10000"
  },
  "body": ""
}

STOMP Subscribe Frame (Client listening to private notifications):

{
  "command": "SUBSCRIBE",
  "headers": {
    "id": "sub-001",
    "destination": "/user/queue/notifications"
  },
  "body": ""
}

STOMP Send Message Frame (Client publishing chat event):

{
  "command": "SEND",
  "headers": {
    "destination": "/app/chat.sendMessage",
    "content-type": "application/json"
  },
  "body": {
    "recipientId": "user-8877",
    "messageType": "TEXT",
    "content": "Hello World! This is an interactive message."
  }
}

3. Server-Sent Events (SSE) Stream Subscription

For unidirectional server-push features, the client makes a standard HTTP GET request:

GET /api/v1/streams/events HTTP/1.1
Host: realtime.codesprintpro.com
Accept: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Last-Event-ID: event-seq-9988
Authorization: Bearer <JWT_TOKEN>

The server keeps the HTTP channel open indefinitely, streaming data chunks formatted as plain text:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Transfer-Encoding: chunked

event: message
id: event-seq-9989
retry: 5000
data: {"type": "STOCK_TICK", "symbol": "AAPL", "price": 182.45, "timestamp": 1716388450}

event: system-notification
id: event-seq-9990
data: {"type": "ALERT", "severity": "INFO", "message": "Scheduled database maintenance starting in 1 hour."}

High-Level Design (HLD)

The global architecture must route traffic from clients to a horizontally scaled set of gateway nodes while utilizing a central message bus to sync state.

1. Core Platform Architecture

The following diagram details the flow of messages through the system. Stateless client requests are routed through a Layer 4 (TCP-based) Load Balancer to a cluster of Stateful Connection Gateways.

graph TD
    subgraph Client Fleet
        C1[Mobile App - WS]
        C2[Web Browser - SSE]
        C3[Desktop Client - WS]
    end

    subgraph Edge Layer
        L4LB[Layer 4 Load Balancer / AWS NLB]
        APIGW[API Gateway / Envoy Proxy]
    end

    subgraph Real-Time Stateful Gateways
        GW1[Gateway Instance A - 62.5k connections]
        GW2[Gateway Instance B - 62.5k connections]
        GW3[Gateway Instance C - 62.5k connections]
    end

    subgraph Event & PubSub Bus
        RedisPS[(Redis Cluster Pub/Sub)]
        KafkaBroker[[Kafka Event Ingestion Bus]]
    end

    subgraph Internal Microservices
        ChatSvc[Chat Application Service]
        NotifSvc[Notification Broadcast Engine]
        DB[(PostgreSQL Primary DB)]
    end

    C1 -->|TCP Traffic| L4LB
    C2 -->|HTTP Keep-Alive| L4LB
    C3 -->|TCP Traffic| L4LB

    L4LB -->|Session Affinity| GW1
    L4LB -->|Session Affinity| GW2
    L4LB -->|Session Affinity| GW3

    GW1 <-->|Publish / Subscribe| RedisPS
    GW2 <-->|Publish / Subscribe| RedisPS
    GW3 <-->|Publish / Subscribe| RedisPS

    GW1 -->|Ingested User Events| KafkaBroker
    GW2 -->|Ingested User Events| KafkaBroker
    GW3 -->|Ingested User Events| KafkaBroker

    KafkaBroker -->|Stream Events| ChatSvc
    KafkaBroker -->|Stream Events| NotifSvc

    ChatSvc -->|Publish Message Delivery| RedisPS
    NotifSvc -->|Publish Broadcasts| RedisPS
    ChatSvc -->|Persist History| DB

2. Client Connection & Message Replay Flow

When a client reconnects (e.g., after losing cell signal), it passes its Last-Event-ID. The Gateway queries a Redis Cache or Message Store to replay missed messages to ensure zero data loss.

sequenceDiagram
    autonumber
    actor Client as Client Browser
    participant LB as L4 Load Balancer
    participant GW as Stateful Gateway Node
    participant Redis as Redis Cache / PubSub
    participant DB as PostgreSQL Store

    Client->>LB: GET /api/v1/streams/events (Last-Event-ID: seq-9988)
    LB->>GW: Route Connection to Gateway Node A
    GW->>Redis: Check if user session active & retrieve missed messages since seq-9988
    alt Missed Messages Found in Cache
        Redis-->>GW: Return events [seq-9989, seq-9990]
        GW-->>Client: Stream missed events immediately
    else Cache Miss / Cold Start
        GW->>DB: Query database for historic messages
        DB-->>GW: Return historical logs
        GW-->>Client: Stream historical events
    end
    Note over Client, GW: Persistent Connection Established
    GW->>Redis: Subscribe to Redis channel: user-notifications:user-123

Low-Level Design (LLD) & Data Models

A robust low-level design requires a concrete database schema to track user registrations, offline message queues, and active connection instances.

1. Database Schema (PostgreSQL DDL)

To store messages that are sent while a user is offline, we utilize a persistent transaction ledger.

-- Active session registry (for analytics & connection auditing)
CREATE TABLE active_connections (
    connection_id UUID PRIMARY KEY,
    user_id VARCHAR(128) NOT NULL,
    gateway_node_ip VARCHAR(45) NOT NULL, -- IPv4 or IPv6 format
    protocol VARCHAR(10) NOT NULL, -- 'WS' or 'SSE'
    connected_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_heartbeat TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Persistent Offline Message Queue
CREATE TABLE offline_messages (
    message_id UUID PRIMARY KEY,
    recipient_id VARCHAR(128) NOT NULL,
    sender_id VARCHAR(128) NOT NULL,
    payload JSONB NOT NULL,
    sequence_number BIGSERIAL NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_active_conns_user ON active_connections(user_id);
CREATE INDEX idx_offline_msg_recipient_seq ON offline_messages(recipient_id, sequence_number ASC);

2. Compilable Connection & Backpressure Manager

This thread-safe Java class utilizes virtual threads (introduced in JDK 21) to manage active SSE/WebSocket connections, enforce outbound backpressure buffers, and prevent slow consumers from consuming excessive heap memory.

package com.codesprintpro.realtime;

import java.io.IOException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Handles active real-time connections, offering bounded memory buffers
 * and backpressure mitigation for slow-consuming client sockets.
 */
public class ConnectionManager {

    private static final Logger LOGGER = Logger.getLogger(ConnectionManager.class.getName());
    private static final int MAX_BUFFER_CAPACITY = 200; // Drop events if queue exceeds this threshold

    // Registry of active client sessions
    private final Map<String, ClientSession> activeSessions = new ConcurrentHashMap<>();

    public void registerSession(String userId, String connectionId, SseOutputChannel channel) {
        ClientSession session = new ClientSession(userId, connectionId, channel);
        activeSessions.put(userId, session);
        LOGGER.log(Level.INFO, "Registered session {0} for user {1}", new Object[]{connectionId, userId});
    }

    public void unregisterSession(String userId) {
        ClientSession session = activeSessions.remove(userId);
        if (session != null) {
            session.close();
            LOGGER.log(Level.INFO, "Unregistered session for user {0}", userId);
        }
    }

    public void pushEvent(String userId, String payload) {
        ClientSession session = activeSessions.get(userId);
        if (session == null) {
            LOGGER.log(Level.WARNING, "User {0} is offline. Buffering to offline database.", userId);
            return;
        }

        boolean accepted = session.enqueueMessage(payload);
        if (!accepted) {
            LOGGER.log(Level.SEVERE, "Backpressure triggered! Buffer overflow for user {0}. Terminating connection to prevent heap memory exhaustion.", userId);
            unregisterSession(userId);
        }
    }

    public static class ClientSession {
        private final String userId;
        private final String connectionId;
        private final SseOutputChannel channel;
        private final BlockingQueue<String> messageQueue = new LinkedBlockingQueue<>(MAX_BUFFER_CAPACITY);
        private final AtomicBoolean isRunning = new AtomicBoolean(true);
        private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();

        public ClientSession(String userId, String connectionId, SseOutputChannel channel) {
            this.userId = userId;
            this.connectionId = connectionId;
            this.channel = channel;
            // Begin asynchronous virtual thread draining loop
            executor.submit(this::drainQueue);
        }

        public boolean enqueueMessage(String message) {
            return messageQueue.offer(message); // Returns false if queue is full (Backpressure)
        }

        private void drainQueue() {
            try {
                while (isRunning.get() && !Thread.currentThread().isInterrupted()) {
                    String message = messageQueue.poll(5, TimeUnit.SECONDS);
                    if (message != null) {
                        channel.write(message);
                    }
                }
            } catch (InterruptedException e) {
                LOGGER.log(Level.WARNING, "Draining loop interrupted for connection {0}", connectionId);
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                LOGGER.log(Level.WARNING, "Socket write failure for connection {0}: {1}", new Object[]{connectionId, e.getMessage()});
            } finally {
                close();
            }
        }

        public void close() {
            if (isRunning.compareAndSet(true, false)) {
                executor.shutdownNow();
                try {
                    channel.close();
                } catch (IOException e) {
                    LOGGER.log(Level.SEVERE, "Failed to close network socket channel", e);
                }
            }
        }
    }

    public interface SseOutputChannel {
        void write(String data) throws IOException;
        void close() throws IOException;
    }
}

Scaling Challenges & System Bottlenecks

1. File Descriptor Limits

In Unix-based operating systems, every network connection occupies a file descriptor (FD). By default, operating system configurations restrict file descriptors to 1,024 per process. For a server meant to manage 65,000 connections, this is an immediate bottleneck.

  • Solution: We must increase the limits in /etc/security/limits.conf:
    * soft nofile 1048576
    * hard nofile 1048576
    

2. TCP Buffer Sizing

By default, the operating system allocates 4 Kilobytes to 128 Kilobytes of memory per TCP buffer. For millions of idle connections, this allocates massive, unutilized heap buffers.

  • Solution: We must scale down kernel allocations for idle stateful servers using sysctl.conf:
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    

3. Load Balancer Port Exhaustion

When a stateful gateway establishes downstream connections, it maps them using Source IP, Source Port, Destination IP, and Destination Port. Since there are only 65,535 usable source ports per destination IP, our Load Balancer can exhaust ports when routing traffic to a single gateway server.

  • Solution: Assign multiple Virtual IPs (VIPs) to each gateway instance. This multiplies port availability exponentially since each unique destination IP unlocks an additional 65,535 local ports.

Technical Trade-offs & Compromises

1. Stateful WebSocket vs. Stateless HTTP/2 SSE

This comparison highlights the fundamental operational trade-offs:

Factor WebSocket (RFC 6455) Server-Sent Events (SSE)
Protocol Flow Bi-directional TCP Framing Uni-directional Server-Push
Transport Protocol Raw TCP Sockets Standard HTTP/1.1 or HTTP/2
Load Balancing Complex (Requires Sticky Sessions / L4 Routing) Highly compatible with standard L7 Gateways
Proxy Compatibility Prone to blockages by corporate firewalls Highly transparent through standard CDNs
Client Reconnection Manual implementation required in SDK Automatic browser-native retry logic

2. Event Routing: Redis Pub/Sub vs. Kafka

  • Redis Pub/Sub (Selected for real-time routing): In-memory, sub-millisecond execution times. Redis is optimal because real-time message routing is ephemeral. Once a user receives a message, it can be safely purged from memory.
  • Kafka (Selected for persistent ingestion): High throughput with durable partition storage. We utilize Kafka for state-ingestion pipelines (e.g., chat histories and analytics) but avoid using it for client socket notifications to prevent massive disk read amplification.

Failure Scenarios & Operational Resiliency

1. Thundering Herd Storm Control

When a gateway server restarts, 62,500 clients disconnect simultaneously and immediately attempt to reconnect. This storm saturates downstreams, causing high CPU spikes and database lock contention.

  • Resiliency Plan: The client SDK must implement exponential backoff with jitter for reconnection: $$T_{\text{wait}} = \min(T_{\text{max}}, T_{\text{base}} \times 2^{\text{retry}} + \text{RandomJitter})$$ Additionally, our Load Balancer must actively rate limit new TCP handshakes (SYN packets) to 1,000 handshakes per second per instance.

2. Detecting Silent Dead Connections

Mobile clients routinely lose network coverage without clean socket closure. If the server keeps these dead connections open, it will leak memory and send events into empty sockets.

  • Resiliency Plan: Implement strict double heartbeats. The gateway sends ping frames every 30 seconds. If the client fails to respond with a pong frame within 15 seconds, the server immediately tears down the TCP socket and frees all allocated memory resources.

Candidate Verbal Script (Interview Guide)

Mock Interview Dialogue

Interviewer: “You need to design a system that pushes real-time stock alerts to 10 Million users. Tell me how you would design this, and how you would choose between WebSockets and Server-Sent Events (SSE).”

Candidate: “First, I will analyze the requirements. Stock alerts represent a classic unidirectional data flow: the server processes stock market feeds and pushes updates to the user. The user does not need to send messages back to the server over that same persistent socket.

Because of this unidirectional flow, I would select Server-Sent Events (SSE) running over HTTP/2 as our primary communication protocol. WebSocket is a powerful tool for bi-directional needs like chat or gaming. However, for unidirectional push notifications, SSE is much simpler to implement, works natively over standard HTTP/2, auto-reconnects automatically out-of-the-box, bypasses corporate firewalls that block raw TCP upgrades, and integrates easily with standard Layer 7 load balancers.”

Interviewer: “That makes sense. But how do you scale this horizontally? If a user is connected to Server A, and the stock processing service publishes an Apple stock alert on Server B, how does that event reach the user?”

Candidate: “We must implement an ephemeral, high-speed routing bus. I would deploy a Redis Cluster to act as our Pub/Sub layer. When a client establishes an SSE connection to a specific gateway server—say Gateway Node A—that node registers the session and immediately subscribes to a dedicated Redis channel corresponding to that user, such as user-notifications:user-123.

When the stock processing engine detects a stock split, it publishes the notification event to the specific user's Redis channel. Redis broadcasts the event to Gateway Node A. The node intercepts the event and streams it down the open HTTP/2 chunked socket to the client. This keeps our gateway servers completely stateless: any gateway can receive any request because Redis handles the routing layer.”

Interviewer: “What happens if a user is on a slow cellular connection? How do you prevent that single slow client from degrading the performance of your entire gateway?”

Candidate: “This is a critical backpressure problem. If a client consumes messages slowly, TCP window sizes shrink, and the write buffer on the server begins to accumulate packets. If we buffer unchecked, we will quickly run out of JVM heap memory, triggering Garbage Collection (GC) pauses and crashing the entire server.

To mitigate this, I would implement a Bounded Queue Backpressure Strategy within our gateway connection manager. Each connection session will have a thread-safe message queue with a strict maximum capacity—such as 200 messages. If the client’s network slows down and the queue fills up to 200, the server will intentionally drop subsequent events. If the queue remains saturated for more than a few seconds, the gateway will actively terminate the connection and close the socket. This sheds the load, protects server memory, and frees up system resources for healthy clients.”

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.