Case Study: Designing WhatsApp (Real-time Messaging at Scale)

Real-time instant messaging platforms like WhatsApp, WeChat, or Discord represent one of the most demanding networking and data storage challenges in computer science. These systems must support hundreds of millions of concurrent socket connections, guarantee low-latency delivery, order messages strictly, and persist data at a rate of petabytes per month.

This case study details the system design of a global real-time chat platform capable of scaling to 500M Daily Active Users (DAU) sending 50 billion messages daily, maintaining sub-100ms end-to-end latency.

1. Requirements & Core Constraints

Functional Requirements

1-on-1 Real-time Chat: Low-latency, full-duplex messaging between pairs of clients.
Delivery & Read Receipts: Real-time delivery ticks (sent, delivered, read status).
Presence Management: Real-time "online/last seen" status indicator.
Group Chat Operations: Message delivery to small/medium group sizes (up to 1,000 members) with strict order guarantees.

Non-Functional Requirements (SLAs)

Sub-100ms Latency: Total transit time of a message between two online users must remain under 100ms (P99).
High Concurrency: The system must sustain up to 50 million concurrent persistent WebSocket connections globally.
Strict Ordering: Messages within a specific conversation must be rendered in chronological sequence.
Resilient Durability: Zero message loss. Undelivered messages must reside in persistent storage until successfully flushed.

Back-of-the-Envelope Estimation (500M DAU)

Total Daily Messages: $50,000,000,000$ messages/day
Average Message QPS: $$\text{Average QPS} = \frac{50,000,000,000}{86400} \approx 578,700 \text{ QPS}$$
Peak Messaging QPS (3x Factor): $$\text{Peak QPS} = 578,700 \times 3 \approx 1,736,100 \text{ QPS}$$
WebSocket Server Footprint:
- A standard Linux virtual server optimized for high network I/O can support approximately $1,000,000$ concurrent TCP connections.
- Active Connections: 50,000,000 peak concurrent connections.
- Servers Required: $$\text{Servers} = \frac{50,000,000}{1,000,000} = 50 \text{ optimized socket nodes}$$
- To account for redundancy, connection spikes, and multi-region deployment, we target a 100-node Gateway Cluster.
Storage Sizing (Cassandra):
- Average message payload: 128 bytes of text + metadata.
- Daily Storage volume: $$\text{Storage/Day} = 50 \times 10^9 \text{ messages} \times 128 \text{ bytes} \approx 6.4 \text{ TB/day}$$
- Yearly raw storage (no replication): $$\text{Storage/Year} = 6.4 \text{ TB} \times 365 \approx 2.33 \text{ PB}$$
- With a 3x replication factor, this demands approximately 7 PB/year of physical disk capacity.

2. API Design & Core Contracts

Since HTTP adds considerable handshake overhead, real-time messaging is handled primarily through long-lived full-duplex WebSocket connections. However, traditional HTTPS endpoints are retained for metadata, group management, and user configuration.

HTTP Schema: Create Group Conversation

Endpoint: POST /v1/conversations/groups

Request Payload:

{
  "group_name": "Distributed Systems Mastery",
  "member_user_ids": [
    "usr_9912a",
    "usr_8830b",
    "usr_7721c"
  ]
}

Response Payload (HTTP 201 Created):

{
  "conversation_id": "group_conv_8892f3a",
  "created_at": 1774312860,
  "status": "active"
}

WebSocket Frame Schema: Send Real-Time Message

When connected to the Gateway, clients submit compact JSON frames:

Target Action: SEND_MESSAGE

Payload:

{
  "message_id": "msg_f89b1c20-a192",
  "conversation_id": "conv_usr_12_usr_99",
  "sender_id": "usr_12",
  "receiver_id": "usr_99",
  "content_type": "text",
  "payload": "Hi, are you online?",
  "timestamp": 1774312860
}

3. High-Level Design (HLD)

The architecture splits real-time persistent connections (Stateful Gateway Zone) from microservices and backing storage engines (Stateless Processing Zone).

graph TD
    ClientA[Client A Browser / App] -->|1. WebSocket Connection| WS1[WebSocket Gateway Node 1]
    ClientB[Client B Browser / App] -->|8. Push Message| WS2[WebSocket Gateway Node 2]
    
    subgraph Stateful Gateway Cluster
        WS1
        WS2
    end
    
    subgraph Connection Registry
        WS1 -->|2. Register Session| RedisRegistry[(Redis Session Store)]
        WS2 -->|Register Session| RedisRegistry
    end
    
    subgraph Core Processing Zone
        WS1 -->|3. Route Message| MsgBroker[Kafka Messaging Cluster]
        MsgBroker -->|4. Push Event| MsgProcessor[Message Processing Service]
        
        MsgProcessor -->|5. Persist| Cassandra[(ScyllaDB / Cassandra Message Store)]
        MsgProcessor -->|6. Query Session| RedisRegistry
        
        MsgProcessor -->|7. Route to Node 2| WS2
    end
    
    subgraph Metadata & Presence Zone
        ClientA -->|Heartbeat Pin| PresenceWS[Presence Service WebSockets]
        PresenceWS -->|Update Status| RedisPresence[(Redis Presence Store)]
    end
    
    style Stateful Gateway Cluster fill:#1e40af,stroke:#fff,stroke-width:2px,color:#fff
    style Connection Registry fill:#047857,stroke:#fff,stroke-width:2px,color:#fff
    style Cassandra fill:#b91c1c,stroke:#fff,stroke-width:2px,color:#fff

End-to-End Delivery Flow:

Client A Sends Message: Client A pushes a JSON frame containing msg_f89 over its persistent WebSocket connection to WebSocket Gateway Node 1.
Session Registration: The gateway queries the Redis Session Store to record the connection's node mapping.
Kafka Buffer Routing: Gateway Node 1 writes the raw event directly to a partitioned Kafka Messaging Cluster, separating logs by conversation_id to guarantee ordering.
Message Processor Consumer: A consumer process pulls the message from Kafka, performs spam checks, and writes the persistent record directly into Cassandra.
Session Routing: The processor queries Redis to find Client B's location. Redis shows Client B is connected to Gateway Node 2. The processor forwards the message to Gateway Node 2.
Active Delivery: Gateway Node 2 pushes the message down the socket to Client B's active session.

4. Low-Level Design (LLD) & Data Models

1. Database Selection Rationale: Cassandra vs. Relational

High Write Volume: Instant messaging systems generate massive time-series write operations.
LSM-Tree Rationale: Cassandra and ScyllaDB utilize LSM-Tree (Log-Structured Merge-tree) storage engines. LSM-trees convert random disk writes into high-speed sequential appends in memory (SSTables), providing far higher write performance than B-Trees (standard SQL DBs).
Wide-Row Time Series: Cassandra partitions data naturally by conversation_id, ordering records by timestamp on disk, which matches chat history querying patterns.

2. Cassandra DDL Schema

-- Represents persistent chat logs
CREATE KEYSPACE chat_platform WITH replication = {
    'class': 'NetworkTopologyStrategy', 
    'us-east': 3, 
    'eu-west': 3
};

USE chat_platform;

-- Persists message history per unique conversation
CREATE TABLE conversation_messages (
    conversation_id text,
    message_time timestamp,
    message_id uuid,
    sender_id text,
    content text,
    content_type text,
    delivery_status text, -- 'SENT', 'DELIVERED', 'READ'
    PRIMARY KEY (conversation_id, message_time)
) WITH CLUSTERING ORDER BY (message_time DESC);

-- Offline message delivery queue (stores messages until target comes online)
CREATE TABLE offline_delivery_queue (
    receiver_id text,
    message_id uuid,
    conversation_id text,
    sender_id text,
    content text,
    message_time timestamp,
    PRIMARY KEY (receiver_id, message_id)
);

5. Scaling Challenges & System Bottlenecks

1. Scaling the Stateful Connection Tier

Traditional web servers spin up one OS thread per HTTP connection. However, spinning up 50 million OS threads is physically impossible on standard system kernels.

The Solution: Build stateful socket gateways using non-blocking I/O multiplexing (epoll on Linux) using light-weight user-space green-threads (Erlang/BEAM processes, Go goroutines, or Node event loops).
Optimizing memory allocations per socket connection is critical. Reducing buffer sizes from 64KB to 4KB allows a 16GB memory server to host up to 1 million sockets concurrently.

2. Presence Heartbeat Congestion (The $O(N^2)$ Problem)

If 1 billion users check-in every 10 seconds via pings, the database would face 100M QPS of presence updates, crashing the network.

The Solution: Limit status updates using a subscriber architecture instead of a global broadcast:
- The system stores status values in localized Redis aggregates.
- When User A opens a chat pane with User B, the client establishes an active subscription channel.
- Presence is only pushed to active conversation subscribers rather than a user's entire contact list.

sequenceDiagram
    autonumber
    Client A->>Gateway Node 1: WebSocket Frame: "Hi B"
    Gateway Node 1-->>Client A: HTTP ACK (Status: SENT)
    Gateway Node 1->>Kafka Broker: Push Message Event
    Kafka Broker->>Message Processor: Consume Message
    Message Processor->>Cassandra DB: Write Message Record
    Message Processor->>Redis Registry: Query Client B Location
    alt Client B is Online
        Redis Registry-->>Message Processor: Returns "Gateway Node 2"
        Message Processor->>Gateway Node 2: Forward Message
        Gateway Node 2->>Client B: WebSocket Push
        Client B-->>Gateway Node 2: Delivery ACK
        Gateway Node 2->>Message Processor: Update Status to DELIVERED
        Message Processor->>Cassandra DB: UPDATE delivery_status
        Message Processor->>Gateway Node 1: Forward ACK to Node 1
        Gateway Node 1->>Client A: WebSocket Push (Double Tick)
    else Client B is Offline
        Redis Registry-->>Message Processor: Offline
        Message Processor->>Cassandra DB: Write to offline_delivery_queue
    end

6. Resilience & Failure Scenarios

1. Packet Losses & Reliable Delivery ACKs

In mobile environments, connections frequently drop. If the socket reports successful writes but the phone actually has no service, messages will vanish.

Resilience Plan: Implement client-to-server and server-to-client Application-Level ACKs:
- The sender assigns a local sequence key to the message.
- The Gateway processes the message, registers it in the DB, and sends a SENT_ACK back to the sender. The sender shows a single check-mark.
- Once the receiver confirms receiving the frame, it sends a DELIVERED_ACK back to the gateway.
- The server updates the database status and pushes a double-tick update to the sender.

2. WebSocket Node Crashes & Mass Reconnections

If a WebSocket gateway server hosting 1 million active users crashes, all 1 million clients will immediately attempt to reconnect simultaneously, triggering a massive thundering herd event that can easily crash load balancers and database backends.

Resilience Plan: Configure clients to use Exponential Backoff with Jitter when reconnecting. Use Anycast DNS routers to shift connection profiles dynamically across other surviving nodes.

7. Staff Engineer Perspective & Key Technical Trade-offs

1. WebSockets vs. HTTP/2 Server-Sent Events (SSE)

WebSockets (Full-Duplex):
- Pros: Symmetric communication. Low overhead for bidirectional messaging.
- Cons: Stateful, bypasses traditional HTTP firewalls, requires custom load-balancing strategies.
SSE (HTTP/2 Multiplexed):
- Pros: Uni-directional push from server to client. Uses HTTP/2 channels, resolving proxy blockages.
- Cons: Clients must execute separate REST posts to submit messages, causing high client connection overhead.
Trade-off Decision: We select WebSockets (or custom TCP/XMPP) for mobile messaging apps. WebSockets avoid HTTP header overhead, saving precious battery and data consumption on mobile devices.

8. Candidate Verbal Mock Interview Script

Interviewer: "You chose Cassandra to store messages. How do you handle group messaging where one user sends a message to a group of 1,000 members? Do you write 1,000 separate records to Cassandra?"

Candidate: "This is a classic 'Write Fan-Out' vs 'Read Fan-Out' trade-off decision.

For groups up to 1,000 members, writing 1,000 separate records (write fan-out) is extremely expensive and wastes database disk space. Instead, we implement a Read Fan-Out with Group Inbox pattern.

Instead of duplicating the message for every group member on write, we write a single record into our group_messages table:

CREATE TABLE group_messages (
    group_id text,
    message_time timestamp,
    message_id uuid,
    sender_id text,
    content text,
    PRIMARY KEY (group_id, message_time)
);

When a user opens a group chat, they query the single partition for group_id directly, fetching the conversation thread efficiently in chronological order on disk.

To handle offline indicators (which messages are read by whom), we maintain a tiny metadata registry in a separate wide-column table:

CREATE TABLE group_read_status (
    group_id text,
    user_id text,
    last_read_message_time timestamp,
    PRIMARY KEY (group_id, user_id)
);

By storing a single message per group and tracking user offsets, we achieve near-instantaneous writes while keeping database read queries optimized under our < 100ms SLA."

System Design: Designing WhatsApp (Real-time Messaging at Scale)

Bridge the gap between architecture diagrams and implementation details.