System Design: Designing WhatsApp (Real-time Messaging)
Building a chat application like WhatsApp or Facebook Messenger requires managing millions of persistent connections and ensuring that messages are delivered reliably with ultra-low latency.
1. Core Requirements
- One-to-One Chat: Real-time messaging between two users.
- Group Chat: Messaging in groups of up to roughly 1,000 members.
- Message Status: Sent, Delivered, and Read receipts.
- Last Seen: Tracking each user's online/offline status and last-active time (Presence).
- Media Support: Sending images, videos, and documents.
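Because the three receipt states are strictly ordered, they can be modeled as a value that only moves forward. A minimal Python sketch, assuming receipts can arrive out of order over the network; `Status` and `apply_receipt` are hypothetical names, not part of any real client:

```python
from enum import IntEnum


class Status(IntEnum):
    """Receipt states (hypothetical model), ordered so later states compare greater."""
    SENT = 1
    DELIVERED = 2
    READ = 3


def apply_receipt(current: Status, incoming: Status) -> Status:
    # A Read receipt may overtake a Delivered one in transit, so never
    # downgrade: keep whichever state is further along.
    return max(current, incoming)


status = Status.SENT
for receipt in (Status.READ, Status.DELIVERED):  # Read arrives before Delivered
    status = apply_receipt(status, receipt)
print(status.name)  # READ
```

Taking the maximum also makes receipt handling idempotent, so duplicated or reordered receipts are harmless.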
2. High-Level Architecture
The system is built from a few specialized services:
- Chat Service: Maintains persistent connections with clients.
- Presence Service: Tracks user status.
- Push Notification Service: Alerts users who are currently offline through the platform push services (APNs on iOS, FCM on Android).
- Media Service: Handles file uploads and downloads.
3. Persistent Connections: WebSockets
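With plain HTTP, the server can only respond to requests the client makes; it cannot start a conversation on its own. For chat, we need a bidirectional connection so the server can "push" messages to the client the moment they arrive.
- The Solution: Use WebSockets. After a one-time HTTP upgrade handshake, a WebSocket keeps a single TCP connection open in both directions, so each message costs only a few bytes of framing instead of a full HTTP request.
- Scalability: A single well-tuned server can hold hundreds of thousands to millions of concurrent WebSocket connections. The practical limits are memory, file descriptors, and kernel tuning, not the 65,535-port range: every client connects to the same listening port, and each connection is told apart by the client's IP address and port.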
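Once connections are open, each chat server has to know which of its sockets belong to which user. A rough Python sketch of that bookkeeping, under stated assumptions: `ConnectionRegistry` is a hypothetical name, a user may be connected from several devices, and an `asyncio.Queue` stands in for a real WebSocket so the example runs with only the standard library:

```python
import asyncio
from collections import defaultdict


class ConnectionRegistry:
    """Hypothetical per-server map of user ID -> live connections.

    An asyncio.Queue stands in for a WebSocket; a real server would
    write frames to the socket instead of putting items on a queue.
    """

    def __init__(self) -> None:
        # One user may be connected from several devices at once.
        self._conns: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def register(self, user_id: str, conn: asyncio.Queue) -> None:
        self._conns[user_id].add(conn)

    def unregister(self, user_id: str, conn: asyncio.Queue) -> None:
        self._conns[user_id].discard(conn)
        if not self._conns[user_id]:
            del self._conns[user_id]  # no devices left: user is not connected here

    def is_connected(self, user_id: str) -> bool:
        return user_id in self._conns

    async def push(self, user_id: str, message: str) -> int:
        """Send message to every device of user_id; return how many received it."""
        conns = self._conns.get(user_id, set())
        for conn in conns:
            await conn.put(message)
        return len(conns)


async def demo() -> None:
    registry = ConnectionRegistry()
    phone, laptop = asyncio.Queue(), asyncio.Queue()
    registry.register("bob", phone)
    registry.register("bob", laptop)
    delivered = await registry.push("bob", "hi")
    print(delivered, phone.get_nowait(), laptop.get_nowait())  # 2 hi hi


asyncio.run(demo())
```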
4. Message Flow (The Life of a Message)
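- User A sends a message to User B.
- The Chat Server receives it, persists it, and acknowledges it to User A (Sent receipt).
- The server checks with the Presence Service whether User B is online.
- If Online: The server pushes the message to User B over their active WebSocket. When B's device acknowledges it, the server relays a Delivered receipt to User A.
- If Offline: The server stores the message in a Pending Queue (for example, in Cassandra) and triggers a Push Notification.
- When User B opens the app, the client pulls all pending messages and acknowledges them, which produces Delivered receipts for the senders.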
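The message flow can be condensed into a single-process Python sketch. Everything here is a hypothetical stand-in: in-memory lists replace the WebSockets, the Cassandra pending store, and the push provider, and the Delivered receipt is recorded as soon as a message reaches the recipient rather than waiting for a real client acknowledgment:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int
    sender: str
    recipient: str
    body: str


class ChatServer:
    """Hypothetical in-memory model of the online/offline delivery paths."""

    def __init__(self) -> None:
        self.online: dict[str, list[Message]] = {}                   # stands in for open WebSockets
        self.pending: dict[str, list[Message]] = defaultdict(list)   # stands in for the pending store
        self.notifications: list[str] = []                           # stands in for APNs/FCM calls
        self.receipts: list[tuple[int, str]] = []                    # receipts relayed to senders

    def send(self, msg: Message) -> None:
        self.receipts.append((msg.msg_id, "SENT"))  # ack to the sender
        inbox = self.online.get(msg.recipient)
        if inbox is not None:
            # Recipient online: push immediately (assumes an instant client ack).
            inbox.append(msg)
            self.receipts.append((msg.msg_id, "DELIVERED"))
        else:
            # Recipient offline: store for later and send a push notification.
            self.pending[msg.recipient].append(msg)
            self.notifications.append(msg.recipient)

    def connect(self, user: str) -> list[Message]:
        """User opens the app: mark them online and drain their pending messages."""
        self.online[user] = []
        backlog = self.pending.pop(user, [])
        for msg in backlog:
            self.receipts.append((msg.msg_id, "DELIVERED"))
        return backlog


server = ChatServer()
server.send(Message(1, "alice", "bob", "hey"))    # bob is offline
print(server.notifications)                       # ['bob']
print([m.body for m in server.connect("bob")])    # ['hey']
server.send(Message(2, "alice", "bob", "again"))  # bob is online now
print(server.online["bob"][0].body)               # again
```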
5. Handling Group Chats
Group chats are more complex because one message must be delivered to many users.
- For Small Groups: The server simply iterates through all group members and sends the message to each.
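- For Large Groups: Writing a separate copy of every message to each member (fan-out on write) becomes expensive. Instead, store the message once in the group's log and maintain a "read pointer" for each user in the group; each member's client fetches everything past its pointer (fan-out on read).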
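The two group strategies differ mainly in how much work a single post costs. A minimal Python sketch contrasting them; `fan_out` and `GroupLog` are hypothetical names, and plain lists stand in for durable storage:

```python
def fan_out(inboxes: dict[str, list[str]], members: list[str], sender: str, body: str) -> None:
    """Small groups (hypothetical): write one copy into every other member's inbox."""
    for member in members:
        if member != sender:
            inboxes.setdefault(member, []).append(body)


class GroupLog:
    """Large groups (hypothetical): one shared copy plus a read pointer per member."""

    def __init__(self, members: list[str]) -> None:
        self.messages: list[str] = []               # each message stored exactly once
        self.pointers = {m: 0 for m in members}     # index of each member's next unread message

    def post(self, body: str) -> None:
        # One write, regardless of group size.
        self.messages.append(body)

    def fetch_new(self, member: str) -> list[str]:
        start = self.pointers[member]
        new = self.messages[start:]
        self.pointers[member] = len(self.messages)  # advance past what was returned
        return new


inboxes: dict[str, list[str]] = {}
fan_out(inboxes, ["ann", "ben", "cat"], "ann", "hi")
print(inboxes)                 # {'ben': ['hi'], 'cat': ['hi']}

group = GroupLog(["ann", "ben", "cat"])
group.post("hello")
group.post("agenda?")
print(group.fetch_new("ben"))  # ['hello', 'agenda?']
group.post("3pm")
print(group.fetch_new("ben"))  # ['3pm']
print(group.fetch_new("cat"))  # ['hello', 'agenda?', '3pm']
```

With fan-out, a post to an N-member group costs N - 1 writes; with the shared log it costs one write, and each member pays a small read the next time they fetch.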
6. Presence Management (Last Seen)
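Tracking the "online" status of millions of users is write-heavy, because every connected client sends periodic heartbeats.
- The Optimization: Instead of writing to the database on every heartbeat, keep presence in an in-memory store like Redis, where each heartbeat refreshes a short-lived entry. If a user hasn't sent a heartbeat for 30 seconds, they are marked offline, and the time of their last heartbeat becomes their "Last Seen" value.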
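The heartbeat-with-timeout idea can be sketched in Python as follows. `PresenceStore` is a hypothetical in-memory stand-in for Redis; in Redis itself each heartbeat could be a `SET` with the `EX` option (for example, `SET presence:<user_id> <timestamp> EX 30`, where the key format is an assumption), so an expired key means the user is offline. The clock is injectable so the timeout can be demonstrated without waiting:

```python
import time
from typing import Callable, Optional


class PresenceStore:
    """Hypothetical in-memory stand-in for a Redis-backed presence store."""

    def __init__(self, timeout_s: float = 30.0, clock: Callable[[], float] = time.time) -> None:
        self.timeout_s = timeout_s
        self.clock = clock                     # injectable so tests can control time
        self.last_beat: dict[str, float] = {}

    def heartbeat(self, user: str) -> None:
        # Overwrite one in-memory value; no database write per heartbeat.
        self.last_beat[user] = self.clock()

    def is_online(self, user: str) -> bool:
        seen = self.last_beat.get(user)
        return seen is not None and self.clock() - seen < self.timeout_s

    def last_seen(self, user: str) -> Optional[float]:
        # Timestamp of the most recent heartbeat, shown as "Last Seen".
        return self.last_beat.get(user)


now = [1000.0]
presence = PresenceStore(clock=lambda: now[0])
presence.heartbeat("bob")
now[0] += 10
print(presence.is_online("bob"))  # True
now[0] += 25                      # 35 s since the last heartbeat
print(presence.is_online("bob"))  # False
print(presence.last_seen("bob"))  # 1000.0
```

Unlike this sketch, an expiring Redis key disappears along with its timestamp, so a real system would also need to save the final heartbeat time somewhere durable to keep showing Last Seen.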
7. Database Selection
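- Message Store: Cassandra is a strong fit because of its high write throughput and because rows within a partition (for example, one conversation) are stored sorted by their clustering key, so a conversation's messages are laid out sequentially and recent history can be read in one contiguous scan.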
- Metadata/Users: PostgreSQL or MongoDB.
- Presence/Cache: Redis.
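To illustrate why sequential storage per conversation matters, here is a rough Python model of a Cassandra-style layout: the conversation ID plays the partition key, and rows inside a partition stay sorted by a clustering key (a timestamp here). This is not Cassandra's API; `ConversationStore` and the `"alice:bob"` key format are hypothetical, and the model ignores replication and compaction:

```python
import bisect
from collections import defaultdict


class ConversationStore:
    """Hypothetical model: partition key -> rows sorted by clustering key (ts)."""

    def __init__(self) -> None:
        self.partitions: dict[str, list[tuple[int, str]]] = defaultdict(list)

    def insert(self, conversation_id: str, ts: int, body: str) -> None:
        # Keep rows ordered by timestamp, as a clustering column would.
        bisect.insort(self.partitions[conversation_id], (ts, body))

    def latest(self, conversation_id: str, limit: int) -> list[tuple[int, str]]:
        # The newest messages are a contiguous tail of one partition.
        if limit <= 0:
            return []
        rows = self.partitions.get(conversation_id, [])
        return rows[-limit:][::-1]

    def before(self, conversation_id: str, ts: int, limit: int) -> list[tuple[int, str]]:
        # Scroll-back pagination: binary-search the cursor, then slice backwards.
        rows = self.partitions.get(conversation_id, [])
        end = bisect.bisect_left(rows, (ts,))  # first row with timestamp >= ts
        return rows[max(0, end - limit):end][::-1]


store = ConversationStore()
for ts, body in [(3, "c"), (1, "a"), (2, "b"), (4, "d")]:
    store.insert("alice:bob", ts, body)
print(store.latest("alice:bob", 2))     # [(4, 'd'), (3, 'c')]
print(store.before("alice:bob", 3, 5))  # [(2, 'b'), (1, 'a')]
```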
Summary
The key to a WhatsApp-scale system is efficiency per connection and per write. By using WebSockets for real-time delivery, Redis for presence, and Cassandra for massive write volumes, you can build a messaging platform that scales horizontally to a very large global user base.
