Real-time capabilities like push notifications, live telemetry, dynamic collaborative editors, and instant messaging have transitioned from luxury features to baseline user expectations. Historically, web architectures relied on polling mechanisms where the client repeatedly queried the server for updates. While simple, short polling and long polling introduce high latency, consume massive CPU cycles, and saturate network cards with empty HTTP headers.
Modern real-time systems utilize persistent, stateful TCP channels. Selecting between WebSocket and Server-Sent Events (SSE) is one of the most critical structural decisions a backend architect can make. Selecting the wrong protocol introduces massive, unnecessary complexity, load balancing bottlenecks, and operational overhead. This guide outlines the deep technical blueprints, scaling mechanics, and code frameworks required to build production-grade real-time systems.
Requirements & Core Constraints
To design a system that serves millions of concurrent users, we must establish rigorous operational parameters. Our target is a generalized real-time event distribution platform capable of supporting a high-density active user base.
1. Functional Constraints
- Bidirectional Channel: Supporting interactive features (e.g., instant messaging and co-authoring) where the client sends messages and the server pushes updates concurrently.
- Unidirectional Channel: Supporting broadcast-style features (e.g., live stock feeds and system notifications) with high-efficiency server-push capability.
- Message Ordering: Strict sequence guarantees must be preserved per user session.
- Latency Target: Server-push latency must remain under 50 milliseconds to ensure a live user experience.
2. Non-Functional SLAs
- Concurrency: The platform must maintain 10 Million concurrent persistent connections simultaneously.
- Throughput: Support an ingestion rate of 100,000 incoming messages per second, and a broadcast delivery rate of 500,000 events per second.
- Target Availability: 99.999% connection uptime (less than 5.26 minutes of total unplanned downtime per year).
- Graceful Degradation: The system must handle connection spikes without cascading crashes, utilizing socket shedding and adaptive backpressure.
3. Back-of-the-Envelope Estimates
Let us compute the memory and bandwidth constraints for maintaining 10 Million concurrent connections:
- Memory Footprint: Each open socket requires kernel space memory (read/write buffers) and user-space memory (session trackers). Assuming a highly optimized footprint of 30 Kilobytes per connection (16 KB TCP read/write buffer + 14 KB application context): $$\text{Total Memory} = 10,000,000 \times 30,000 \text{ bytes} \approx 300 \text{ Gigabytes}$$ To run this safely with a 50% buffer, we require at least 450 Gigabytes of RAM across our gateway fleet.
- Ingress Bandwidth: Assuming an average client message payload is 500 bytes: $$\text{Ingress Bandwidth} = 100,000 \text{ messages/sec} \times 500 \text{ bytes} = 50 \text{ Megabytes per second (400 Mbps)}$$
- Egress Bandwidth: Assuming egress broadcast message size is 1 Kilobyte: $$\text{Egress Bandwidth} = 500,000 \text{ messages/sec} \times 1,000 \text{ bytes} = 500 \text{ Megabytes per second (4 Gbps)}$$
- IP Port Availability: A single server IP can support a maximum of 65,535 outbound connections per destination IP due to port exhaustion limits. Therefore, we must deploy a minimum of 160 Gateway instances (each managing around 62,500 connections) and utilize multiple virtual IPs (VIPs) on our Load Balancer to bypass this limit.
API Design & Core Contracts
To facilitate standard protocol handshakes and STOMP message routing, we define production-ready API contracts.
1. HTTP to WebSocket Upgrade Handshake
The handshake starts as a standard HTTP GET request with upgrade headers:
GET /api/v1/realtime/connect HTTP/1.1
Host: realtime.codesprintpro.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Authorization: Bearer <JWT_TOKEN>
The server responds with a 101 Switching Protocols status code:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
2. STOMP Protocol Payload Schema
Once the persistent TCP socket is active, the client and server communicate using STOMP frames over the raw socket.
STOMP Connect Frame:
{
"command": "CONNECT",
"headers": {
"accept-version": "1.2",
"host": "realtime.codesprintpro.com",
"heart-beat": "10000,10000"
},
"body": ""
}
STOMP Subscribe Frame (Client listening to private notifications):
{
"command": "SUBSCRIBE",
"headers": {
"id": "sub-001",
"destination": "/user/queue/notifications"
},
"body": ""
}
STOMP Send Message Frame (Client publishing chat event):
{
"command": "SEND",
"headers": {
"destination": "/app/chat.sendMessage",
"content-type": "application/json"
},
"body": {
"recipientId": "user-8877",
"messageType": "TEXT",
"content": "Hello World! This is an interactive message."
}
}
3. Server-Sent Events (SSE) Stream Subscription
For unidirectional server-push features, the client makes a standard HTTP GET request:
GET /api/v1/streams/events HTTP/1.1
Host: realtime.codesprintpro.com
Accept: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Last-Event-ID: event-seq-9988
Authorization: Bearer <JWT_TOKEN>
The server keeps the HTTP channel open indefinitely, streaming data chunks formatted as plain text:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Transfer-Encoding: chunked
event: message
id: event-seq-9989
retry: 5000
data: {"type": "STOCK_TICK", "symbol": "AAPL", "price": 182.45, "timestamp": 1716388450}
event: system-notification
id: event-seq-9990
data: {"type": "ALERT", "severity": "INFO", "message": "Scheduled database maintenance starting in 1 hour."}
High-Level Design (HLD)
The global architecture must route traffic from clients to a horizontally scaled set of gateway nodes while utilizing a central message bus to sync state.
1. Core Platform Architecture
The following diagram details the flow of messages through the system. Stateless client requests are routed through a Layer 4 (TCP-based) Load Balancer to a cluster of Stateful Connection Gateways.
graph TD
subgraph Client Fleet
C1[Mobile App - WS]
C2[Web Browser - SSE]
C3[Desktop Client - WS]
end
subgraph Edge Layer
L4LB[Layer 4 Load Balancer / AWS NLB]
APIGW[API Gateway / Envoy Proxy]
end
subgraph Real-Time Stateful Gateways
GW1[Gateway Instance A - 62.5k connections]
GW2[Gateway Instance B - 62.5k connections]
GW3[Gateway Instance C - 62.5k connections]
end
subgraph Event & PubSub Bus
RedisPS[(Redis Cluster Pub/Sub)]
KafkaBroker[[Kafka Event Ingestion Bus]]
end
subgraph Internal Microservices
ChatSvc[Chat Application Service]
NotifSvc[Notification Broadcast Engine]
DB[(PostgreSQL Primary DB)]
end
C1 -->|TCP Traffic| L4LB
C2 -->|HTTP Keep-Alive| L4LB
C3 -->|TCP Traffic| L4LB
L4LB -->|Session Affinity| GW1
L4LB -->|Session Affinity| GW2
L4LB -->|Session Affinity| GW3
GW1 <-->|Publish / Subscribe| RedisPS
GW2 <-->|Publish / Subscribe| RedisPS
GW3 <-->|Publish / Subscribe| RedisPS
GW1 -->|Ingested User Events| KafkaBroker
GW2 -->|Ingested User Events| KafkaBroker
GW3 -->|Ingested User Events| KafkaBroker
KafkaBroker -->|Stream Events| ChatSvc
KafkaBroker -->|Stream Events| NotifSvc
ChatSvc -->|Publish Message Delivery| RedisPS
NotifSvc -->|Publish Broadcasts| RedisPS
ChatSvc -->|Persist History| DB
2. Client Connection & Message Replay Flow
When a client reconnects (e.g., after losing cell signal), it passes its Last-Event-ID. The Gateway queries a Redis Cache or Message Store to replay missed messages to ensure zero data loss.
sequenceDiagram
autonumber
actor Client as Client Browser
participant LB as L4 Load Balancer
participant GW as Stateful Gateway Node
participant Redis as Redis Cache / PubSub
participant DB as PostgreSQL Store
Client->>LB: GET /api/v1/streams/events (Last-Event-ID: seq-9988)
LB->>GW: Route Connection to Gateway Node A
GW->>Redis: Check if user session active & retrieve missed messages since seq-9988
alt Missed Messages Found in Cache
Redis-->>GW: Return events [seq-9989, seq-9990]
GW-->>Client: Stream missed events immediately
else Cache Miss / Cold Start
GW->>DB: Query database for historic messages
DB-->>GW: Return historical logs
GW-->>Client: Stream historical events
end
Note over Client, GW: Persistent Connection Established
GW->>Redis: Subscribe to Redis channel: user-notifications:user-123
Low-Level Design (LLD) & Data Models
A robust low-level design requires a concrete database schema to track user registrations, offline message queues, and active connection instances.
1. Database Schema (PostgreSQL DDL)
To store messages that are sent while a user is offline, we utilize a persistent transaction ledger.
-- Active session registry (for analytics & connection auditing)
CREATE TABLE active_connections (
connection_id UUID PRIMARY KEY,
user_id VARCHAR(128) NOT NULL,
gateway_node_ip VARCHAR(45) NOT NULL, -- IPv4 or IPv6 format
protocol VARCHAR(10) NOT NULL, -- 'WS' or 'SSE'
connected_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
last_heartbeat TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Persistent Offline Message Queue
CREATE TABLE offline_messages (
message_id UUID PRIMARY KEY,
recipient_id VARCHAR(128) NOT NULL,
sender_id VARCHAR(128) NOT NULL,
payload JSONB NOT NULL,
sequence_number BIGSERIAL NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_active_conns_user ON active_connections(user_id);
CREATE INDEX idx_offline_msg_recipient_seq ON offline_messages(recipient_id, sequence_number ASC);
2. Compilable Connection & Backpressure Manager
This thread-safe Java class utilizes virtual threads (introduced in JDK 21) to manage active SSE/WebSocket connections, enforce outbound backpressure buffers, and prevent slow consumers from consuming excessive heap memory.
package com.codesprintpro.realtime;
import java.io.IOException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;
/**
* Handles active real-time connections, offering bounded memory buffers
* and backpressure mitigation for slow-consuming client sockets.
*/
public class ConnectionManager {
private static final Logger LOGGER = Logger.getLogger(ConnectionManager.class.getName());
private static final int MAX_BUFFER_CAPACITY = 200; // Drop events if queue exceeds this threshold
// Registry of active client sessions
private final Map<String, ClientSession> activeSessions = new ConcurrentHashMap<>();
public void registerSession(String userId, String connectionId, SseOutputChannel channel) {
ClientSession session = new ClientSession(userId, connectionId, channel);
activeSessions.put(userId, session);
LOGGER.log(Level.INFO, "Registered session {0} for user {1}", new Object[]{connectionId, userId});
}
public void unregisterSession(String userId) {
ClientSession session = activeSessions.remove(userId);
if (session != null) {
session.close();
LOGGER.log(Level.INFO, "Unregistered session for user {0}", userId);
}
}
public void pushEvent(String userId, String payload) {
ClientSession session = activeSessions.get(userId);
if (session == null) {
LOGGER.log(Level.WARNING, "User {0} is offline. Buffering to offline database.", userId);
return;
}
boolean accepted = session.enqueueMessage(payload);
if (!accepted) {
LOGGER.log(Level.SEVERE, "Backpressure triggered! Buffer overflow for user {0}. Terminating connection to prevent heap memory exhaustion.", userId);
unregisterSession(userId);
}
}
public static class ClientSession {
private final String userId;
private final String connectionId;
private final SseOutputChannel channel;
private final BlockingQueue<String> messageQueue = new LinkedBlockingQueue<>(MAX_BUFFER_CAPACITY);
private final AtomicBoolean isRunning = new AtomicBoolean(true);
private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
public ClientSession(String userId, String connectionId, SseOutputChannel channel) {
this.userId = userId;
this.connectionId = connectionId;
this.channel = channel;
// Begin asynchronous virtual thread draining loop
executor.submit(this::drainQueue);
}
public boolean enqueueMessage(String message) {
return messageQueue.offer(message); // Returns false if queue is full (Backpressure)
}
private void drainQueue() {
try {
while (isRunning.get() && !Thread.currentThread().isInterrupted()) {
String message = messageQueue.poll(5, TimeUnit.SECONDS);
if (message != null) {
channel.write(message);
}
}
} catch (InterruptedException e) {
LOGGER.log(Level.WARNING, "Draining loop interrupted for connection {0}", connectionId);
Thread.currentThread().interrupt();
} catch (IOException e) {
LOGGER.log(Level.WARNING, "Socket write failure for connection {0}: {1}", new Object[]{connectionId, e.getMessage()});
} finally {
close();
}
}
public void close() {
if (isRunning.compareAndSet(true, false)) {
executor.shutdownNow();
try {
channel.close();
} catch (IOException e) {
LOGGER.log(Level.SEVERE, "Failed to close network socket channel", e);
}
}
}
}
public interface SseOutputChannel {
void write(String data) throws IOException;
void close() throws IOException;
}
}
Scaling Challenges & System Bottlenecks
1. File Descriptor Limits
In Unix-based operating systems, every network connection occupies a file descriptor (FD). By default, operating system configurations restrict file descriptors to 1,024 per process. For a server meant to manage 65,000 connections, this is an immediate bottleneck.
- Solution: We must increase the limits in
/etc/security/limits.conf:* soft nofile 1048576 * hard nofile 1048576
2. TCP Buffer Sizing
By default, the operating system allocates 4 Kilobytes to 128 Kilobytes of memory per TCP buffer. For millions of idle connections, this allocates massive, unutilized heap buffers.
- Solution: We must scale down kernel allocations for idle stateful servers using
sysctl.conf:net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 65536 16777216
3. Load Balancer Port Exhaustion
When a stateful gateway establishes downstream connections, it maps them using Source IP, Source Port, Destination IP, and Destination Port. Since there are only 65,535 usable source ports per destination IP, our Load Balancer can exhaust ports when routing traffic to a single gateway server.
- Solution: Assign multiple Virtual IPs (VIPs) to each gateway instance. This multiplies port availability exponentially since each unique destination IP unlocks an additional 65,535 local ports.
Technical Trade-offs & Compromises
1. Stateful WebSocket vs. Stateless HTTP/2 SSE
This comparison highlights the fundamental operational trade-offs:
| Factor | WebSocket (RFC 6455) | Server-Sent Events (SSE) |
|---|---|---|
| Protocol Flow | Bi-directional TCP Framing | Uni-directional Server-Push |
| Transport Protocol | Raw TCP Sockets | Standard HTTP/1.1 or HTTP/2 |
| Load Balancing | Complex (Requires Sticky Sessions / L4 Routing) | Highly compatible with standard L7 Gateways |
| Proxy Compatibility | Prone to blockages by corporate firewalls | Highly transparent through standard CDNs |
| Client Reconnection | Manual implementation required in SDK | Automatic browser-native retry logic |
2. Event Routing: Redis Pub/Sub vs. Kafka
- Redis Pub/Sub (Selected for real-time routing): In-memory, sub-millisecond execution times. Redis is optimal because real-time message routing is ephemeral. Once a user receives a message, it can be safely purged from memory.
- Kafka (Selected for persistent ingestion): High throughput with durable partition storage. We utilize Kafka for state-ingestion pipelines (e.g., chat histories and analytics) but avoid using it for client socket notifications to prevent massive disk read amplification.
Failure Scenarios & Operational Resiliency
1. Thundering Herd Storm Control
When a gateway server restarts, 62,500 clients disconnect simultaneously and immediately attempt to reconnect. This storm saturates downstreams, causing high CPU spikes and database lock contention.
- Resiliency Plan: The client SDK must implement exponential backoff with jitter for reconnection:
$$T_{\text{wait}} = \min(T_{\text{max}}, T_{\text{base}} \times 2^{\text{retry}} + \text{RandomJitter})$$
Additionally, our Load Balancer must actively rate limit new TCP handshakes (
SYNpackets) to 1,000 handshakes per second per instance.
2. Detecting Silent Dead Connections
Mobile clients routinely lose network coverage without clean socket closure. If the server keeps these dead connections open, it will leak memory and send events into empty sockets.
- Resiliency Plan: Implement strict double heartbeats. The gateway sends ping frames every 30 seconds. If the client fails to respond with a pong frame within 15 seconds, the server immediately tears down the TCP socket and frees all allocated memory resources.
Candidate Verbal Script (Interview Guide)
Mock Interview Dialogue
Interviewer: “You need to design a system that pushes real-time stock alerts to 10 Million users. Tell me how you would design this, and how you would choose between WebSockets and Server-Sent Events (SSE).”
Candidate: “First, I will analyze the requirements. Stock alerts represent a classic unidirectional data flow: the server processes stock market feeds and pushes updates to the user. The user does not need to send messages back to the server over that same persistent socket.
Because of this unidirectional flow, I would select Server-Sent Events (SSE) running over HTTP/2 as our primary communication protocol. WebSocket is a powerful tool for bi-directional needs like chat or gaming. However, for unidirectional push notifications, SSE is much simpler to implement, works natively over standard HTTP/2, auto-reconnects automatically out-of-the-box, bypasses corporate firewalls that block raw TCP upgrades, and integrates easily with standard Layer 7 load balancers.”
Interviewer: “That makes sense. But how do you scale this horizontally? If a user is connected to Server A, and the stock processing service publishes an Apple stock alert on Server B, how does that event reach the user?”
Candidate: “We must implement an ephemeral, high-speed routing bus. I would deploy a Redis Cluster to act as our Pub/Sub layer. When a client establishes an SSE connection to a specific gateway server—say Gateway Node A—that node registers the session and immediately subscribes to a dedicated Redis channel corresponding to that user, such as user-notifications:user-123.
When the stock processing engine detects a stock split, it publishes the notification event to the specific user's Redis channel. Redis broadcasts the event to Gateway Node A. The node intercepts the event and streams it down the open HTTP/2 chunked socket to the client. This keeps our gateway servers completely stateless: any gateway can receive any request because Redis handles the routing layer.”
Interviewer: “What happens if a user is on a slow cellular connection? How do you prevent that single slow client from degrading the performance of your entire gateway?”
Candidate: “This is a critical backpressure problem. If a client consumes messages slowly, TCP window sizes shrink, and the write buffer on the server begins to accumulate packets. If we buffer unchecked, we will quickly run out of JVM heap memory, triggering Garbage Collection (GC) pauses and crashing the entire server.
To mitigate this, I would implement a Bounded Queue Backpressure Strategy within our gateway connection manager. Each connection session will have a thread-safe message queue with a strict maximum capacity—such as 200 messages. If the client’s network slows down and the queue fills up to 200, the server will intentionally drop subsequent events. If the queue remains saturated for more than a few seconds, the gateway will actively terminate the connection and close the socket. This sheds the load, protects server memory, and frees up system resources for healthy clients.”