System Design: Designing WhatsApp (Real-time Messaging)

How does WhatsApp handle billions of messages per day? A technical deep dive into WebSockets, XMPP, Message Persistence, and Presence Management.

Sachin Sarawgi·April 20, 2026·3 min read
#system-design #whatsapp #real-time-chat #websockets #distributed-systems #scalability

Building a chat application like WhatsApp or Facebook Messenger requires managing millions of persistent connections and ensuring that messages are delivered reliably with ultra-low latency.

1. Core Requirements

  • One-to-One Chat: Real-time messaging between two users.
  • Group Chat: Messaging in groups of 1,000 or more users.
  • Message Status: Sent, Delivered, and Read receipts.
  • Last Seen: Tracking user online/offline status (Presence).
  • Media Support: Sending images, videos, and documents.

2. High-Level Architecture

The system is split into a Connection Layer, which holds client connections, and a Message Layer, which routes and stores messages. The main services are:

  • Chat Service: Maintains persistent connections with clients.
  • Presence Service: Tracks user status.
  • Push Notification Service: For users who are currently offline.
  • Media Service: Handles file uploads and downloads.

3. Persistent Connections: WebSockets

In traditional HTTP, the client must initiate every exchange, so the server cannot deliver a message until the client asks for it. For chat, we need a bi-directional connection so the server can "push" messages to the client instantly.

  • The Solution: Use WebSockets. They keep a single TCP connection open, allowing for high-frequency, low-overhead data transfer.
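  • Scalability: A single server can hold anywhere from tens of thousands to over a million concurrent WebSocket connections, depending on memory per connection, file-descriptor limits, and kernel tuning. The often-quoted 65,535-port figure does not cap inbound connections, because each TCP connection is identified by the full (source IP, source port, destination IP, destination port) tuple.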
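As a rough illustration of why tuning matters, the sketch below estimates a per-server ceiling as whichever runs out first, memory or file descriptors. The function name and every figure in it (32 GiB of RAM, 20 KiB per connection, the two descriptor limits) are assumptions chosen for the arithmetic, not measurements from WhatsApp or any particular server.

```python
def max_connections(ram_bytes: int, bytes_per_connection: int, fd_limit: int) -> int:
    """Upper bound on concurrent connections for one server.

    Each open socket costs memory and one file descriptor, so the bound
    is whichever of the two resources is exhausted first.
    """
    by_memory = ram_bytes // bytes_per_connection
    return min(by_memory, fd_limit)


GIB = 1024 ** 3
KIB = 1024

# Assumed figures, for illustration only.
untuned = max_connections(ram_bytes=32 * GIB, bytes_per_connection=20 * KIB, fd_limit=65_536)
tuned = max_connections(ram_bytes=32 * GIB, bytes_per_connection=20 * KIB, fd_limit=2_000_000)

print(untuned)  # 65536: capped by the file-descriptor limit
print(tuned)    # 1677721: capped by memory
```

With these assumed numbers, raising the descriptor limit moves the bottleneck from the operating system to RAM, which is why per-connection memory footprint becomes the figure to optimize.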

4. Message Flow (The Life of a Message)

  1. User A sends a message to User B.
  2. Chat Server receives it and acknowledges it to User A (Sent receipt).
  3. The server checks if User B is online.
  4. If Online: The server pushes the message to User B via their active WebSocket.
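  5. If Offline: The server stores the message in a Pending Queue (for example, a Cassandra table keyed by recipient) and triggers a Push Notification.
  6. When User B opens the app, the client reconnects and pulls all pending messages, at which point the server can issue the Delivered receipt to User A.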
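The flow above can be sketched in a few lines. This is a minimal in-memory model, not a real server: `ChatServer`, `Message`, and `push_log` are hypothetical names, a plain list stands in for each user's WebSocket, and a `deque` stands in for the persistent pending queue.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    recipient: str
    body: str


@dataclass
class ChatServer:
    # user_id -> list standing in for that user's open WebSocket
    connections: dict = field(default_factory=dict)
    # user_id -> messages waiting for the user to come back online
    pending: dict = field(default_factory=lambda: defaultdict(deque))
    # user_ids for which a push notification was triggered
    push_log: list = field(default_factory=list)

    def connect(self, user_id: str) -> list:
        """Register a connection and drain any pending messages (step 6)."""
        inbox = []
        self.connections[user_id] = inbox
        inbox.extend(self.pending.pop(user_id, deque()))
        return inbox

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    def send(self, msg: Message) -> str:
        """Route one message and return the receipt status for the sender."""
        inbox = self.connections.get(msg.recipient)
        if inbox is not None:
            inbox.append(msg)              # step 4: push over the live connection
            return "delivered"
        self.pending[msg.recipient].append(msg)  # step 5: queue for later
        self.push_log.append(msg.recipient)      # step 5: trigger a push notification
        return "sent"


server = ChatServer()
server.connect("alice")
print(server.send(Message("alice", "bob", "hi")))   # sent (bob is offline)
bob_inbox = server.connect("bob")
print([m.body for m in bob_inbox])                  # ['hi']
```

A production system would persist the message before acknowledging it and would run the connection registry across many servers, but the branching on online versus offline stays the same.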

5. Handling Group Chats

Group chats are more complex because one message must be delivered to many users.

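  • For Small Groups: Use fan-out on write. The server iterates through all group members and sends a copy of the message to each.
  • For Large Groups: Avoid per-member copies (fan-out on read). Store the message once in the group's log and maintain a "read pointer" for each user in the group, so each client fetches only what it has not yet seen.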
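The two strategies differ mainly in how much is written per message. The sketch below contrasts them with in-memory structures; `SmallGroup` and `LargeGroup` are hypothetical names, and the point at which a group counts as "large" is left open, as in the text.

```python
class SmallGroup:
    """Fan-out on write: one copy of each message per member inbox."""

    def __init__(self, members):
        self.inboxes = {m: [] for m in members}

    def post(self, sender, body):
        # Write cost grows with group size: one append per recipient.
        for member, inbox in self.inboxes.items():
            if member != sender:
                inbox.append((sender, body))


class LargeGroup:
    """Store each message once; each member keeps only a read pointer."""

    def __init__(self, members):
        self.log = []                                # shared, append-only log
        self.read_pointer = {m: 0 for m in members}  # index of next unread message

    def post(self, sender, body):
        self.log.append((sender, body))              # one write, whatever the group size

    def fetch_unread(self, member):
        """Return everything after the member's pointer, then advance it."""
        start = self.read_pointer[member]
        unread = self.log[start:]
        self.read_pointer[member] = len(self.log)
        return unread


large = LargeGroup(["a", "b", "c"])
large.post("a", "hello")
print(large.fetch_unread("b"))  # [('a', 'hello')]
print(large.fetch_unread("b"))  # []
```

The trade-off is that the second design shifts work to read time: each member pays for a log lookup when fetching, in exchange for a constant-cost write.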

6. Presence Management (Last Seen)

Tracking the "online" status of millions of users is a write-heavy workload, because every connected client reports in continuously.

  • The Optimization: Instead of updating the database on every heartbeat, use an in-memory store like Redis, where each heartbeat refreshes a key with a short expiry. If a user hasn't sent a heartbeat for 30 seconds, they are marked offline.
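The heartbeat timeout can be sketched without a real Redis instance. In the example below a dictionary stands in for the in-memory store, and `PresenceTracker` and its injectable `clock` parameter are hypothetical; in Redis the same effect would typically come from setting a key with an expiry on each heartbeat.

```python
import time


class PresenceTracker:
    """Marks a user offline once no heartbeat has arrived within the timeout."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock            # injectable so the logic can be tested
        self.last_heartbeat = {}      # user_id -> time of the latest heartbeat

    def heartbeat(self, user_id):
        # A single in-memory write; nothing touches the primary database.
        self.last_heartbeat[user_id] = self.clock()

    def is_online(self, user_id):
        last = self.last_heartbeat.get(user_id)
        return last is not None and self.clock() - last < self.timeout


# Simulated clock so the example does not need to sleep.
now = [0.0]
presence = PresenceTracker(timeout_seconds=30.0, clock=lambda: now[0])
presence.heartbeat("alice")
now[0] = 29.0
print(presence.is_online("alice"))  # True
now[0] = 31.0
print(presence.is_online("alice"))  # False
```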

7. Database Selection

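  • Message Store: Cassandra is a good fit due to its high write throughput. With the conversation ID as the partition key and a time-ordered message ID as the clustering key, the messages of a specific conversation are stored together in sequence.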
  • Metadata/Users: PostgreSQL or MongoDB.
  • Presence/Cache: Redis.
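To make the message-store layout concrete, here is a small in-memory model of the access pattern: rows grouped by conversation and kept sorted by message ID, so reading recent history is a contiguous range scan. `MessageStore`, its method names, and the use of integer message IDs are assumptions for illustration; this is not Cassandra's API.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """Rows grouped by conversation and kept ordered by message ID."""

    def __init__(self):
        # conversation_id (the partition) -> sorted list of (message_id, body)
        self.partitions = defaultdict(list)

    def append(self, conversation_id, message_id, body):
        # insort keeps each partition ordered by message_id.
        bisect.insort(self.partitions[conversation_id], (message_id, body))

    def read_after(self, conversation_id, after_id, limit=50):
        """Return up to `limit` messages with an ID greater than `after_id`."""
        rows = self.partitions.get(conversation_id, [])
        # (after_id + 1,) sorts before any (after_id + 1, body) row, so
        # bisect_left finds the first row whose ID exceeds after_id.
        start = bisect.bisect_left(rows, (after_id + 1,))
        return rows[start:start + limit]


store = MessageStore()
store.append("alice:bob", 2, "second")
store.append("alice:bob", 1, "first")
print(store.read_after("alice:bob", 0))  # [(1, 'first'), (2, 'second')]
print(store.read_after("alice:bob", 1))  # [(2, 'second')]
```

Because a client usually asks for "everything after the last message I have", keeping one conversation's rows adjacent and ordered turns that query into a single sequential read.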

Summary

The key to a WhatsApp-scale messaging system is efficiency at every layer. By using WebSockets for real-time delivery, an in-memory store for presence, and a write-optimized store such as Cassandra for massive message volumes, you can build a messaging platform that scales to a global user base.

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
