System Design: Designing a Pub/Sub Messaging Platform
A Pub/Sub (Publish/Subscribe) system is a fundamental pattern for decoupling services. It allows producers to send messages without knowing who the consumers are, enabling highly flexible and asynchronous architectures. Designing a system like Google Cloud Pub/Sub at scale is an advanced architectural challenge.
1. Core Concepts
- Topic: A named resource to which messages are sent.
- Subscription: A named resource representing the stream of messages from a topic.
- Publisher: Sends messages to a topic.
- Subscriber: Receives messages from a subscription.
2. Decoupling and Scalability
Unlike a point-to-point message queue, where each message is delivered to only one consumer, a Pub/Sub system delivers a copy of each message to every subscription. This requires the system to maintain a separate "read pointer" (for example, an offset into the topic's log) for every subscription.
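A minimal in-memory sketch of the fan-out idea, assuming a single-process log: the topic stores each message once in an append-only list, and every subscription keeps its own read pointer into that list. The names `Topic`, `subscribe`, `publish`, and `pull` are hypothetical, and for brevity the pointer advances on delivery rather than on acknowledgment.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic backed by one append-only log shared by all subscriptions."""
    name: str
    log: list = field(default_factory=list)
    # Hypothetical: subscription name -> index of its next unread message.
    cursors: dict = field(default_factory=dict)

    def subscribe(self, sub_name: str) -> None:
        # A new subscription starts at the end of the log (sees only new messages).
        self.cursors[sub_name] = len(self.log)

    def publish(self, payload: str) -> int:
        # The message is stored once; fan-out happens through the cursors.
        self.log.append(payload)
        return len(self.log) - 1  # offset of the new message

    def pull(self, sub_name: str, max_messages: int = 10) -> list:
        start = self.cursors[sub_name]
        batch = self.log[start:start + max_messages]
        # Advance only this subscription's read pointer.
        self.cursors[sub_name] = start + len(batch)
        return batch


topic = Topic("orders")
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-1")
topic.publish("order-2")
print(topic.pull("billing"))      # ['order-1', 'order-2']
print(topic.pull("billing"))      # []
print(topic.pull("shipping", 1))  # ['order-1']
```

Because each subscription only moves its own cursor, a slow subscription never blocks a fast one, and the payload is not copied per subscriber.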
3. High-Level Architecture
- API/Frontend Service: Authenticates requests and routes them to the correct topic/subscription.
- Metadata Service: Manages topic/subscription configurations (stored in a coordination service such as ZooKeeper or etcd).
- Storage Service: A persistent, distributed log service (like a simplified Kafka or Pulsar).
- Delivery Service: Pushes messages to subscribers (via Webhooks, gRPC streams, or long polling).
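The Delivery Service's long-polling mode can be sketched with a condition variable: a pull request blocks until a message arrives or a timeout expires, so clients do not have to poll in a tight loop. This is a single-process illustration assuming one in-memory buffer per subscription; `LongPollQueue`, `push`, and `long_poll` are hypothetical names, and a real service would hold the request open over HTTP or a gRPC stream.

```python
import threading
from collections import deque


class LongPollQueue:
    """Hypothetical per-subscription buffer inside a Delivery Service."""

    def __init__(self) -> None:
        self._messages = deque()
        self._cond = threading.Condition()

    def push(self, message: str) -> None:
        # Called when the storage layer has a new message for this subscription.
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()  # wake any waiting long-poll requests

    def long_poll(self, max_messages: int, timeout_s: float) -> list:
        # Block until at least one message is available or the timeout expires.
        with self._cond:
            self._cond.wait_for(lambda: len(self._messages) > 0, timeout=timeout_s)
            batch = []
            while self._messages and len(batch) < max_messages:
                batch.append(self._messages.popleft())
            return batch  # may be empty if the timeout expired


q = LongPollQueue()
print(q.long_poll(max_messages=5, timeout_s=0.1))  # [] (nothing arrived in time)
threading.Timer(0.05, q.push, args=("hello",)).start()
print(q.long_poll(max_messages=5, timeout_s=2.0))  # ['hello'], returned on arrival
```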
4. Message Delivery Semantics
- At-Least-Once Delivery: The default for most systems. The server waits for an acknowledgment (ack) from the subscriber. If none is received within a timeout (the ack deadline), the message is redelivered.
- Exactly-Once Delivery: Significantly harder to achieve. It requires coordinating the server's state (which messages were sent and acknowledged) with the client's state (which messages were processed). In practice this is usually built on top of at-least-once delivery: each message carries an Idempotency Key (such as its message ID), and the subscriber persists the keys it has already processed so that redeliveries are skipped.
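Both semantics can be sketched together, assuming a simulated clock passed in as `now`: the server side redelivers any message whose ack deadline has passed (at-least-once), and the client side skips message IDs it has already processed (effectively-once processing). `Subscription`, `IdempotentConsumer`, and their methods are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Server side: at-least-once delivery with a hypothetical ack deadline."""
    ack_deadline_s: float
    pending: list = field(default_factory=list)    # (msg_id, payload) awaiting delivery
    in_flight: dict = field(default_factory=dict)  # msg_id -> (payload, deadline)

    def enqueue(self, msg_id: str, payload) -> None:
        self.pending.append((msg_id, payload))

    def deliver(self, now: float) -> list:
        # Any in-flight message whose ack deadline has passed goes back for redelivery.
        expired = [m for m, (_, deadline) in self.in_flight.items() if deadline <= now]
        for msg_id in expired:
            payload, _ = self.in_flight.pop(msg_id)
            self.pending.append((msg_id, payload))
        batch, self.pending = self.pending, []
        for msg_id, payload in batch:
            self.in_flight[msg_id] = (payload, now + self.ack_deadline_s)
        return batch

    def ack(self, msg_id: str) -> None:
        # Acked messages are never redelivered; a late or repeated ack is a no-op.
        self.in_flight.pop(msg_id, None)


class IdempotentConsumer:
    """Client side: skips messages whose idempotency key was already processed."""

    def __init__(self) -> None:
        # Assumption: in production, this set and the side effect below must be
        # persisted together (e.g. in one database transaction).
        self.processed_ids = set()
        self.total = 0

    def handle(self, msg_id: str, amount: int) -> None:
        if msg_id in self.processed_ids:
            return  # duplicate delivery: do not repeat the side effect
        self.total += amount
        self.processed_ids.add(msg_id)


sub = Subscription(ack_deadline_s=10.0)
sub.enqueue("m1", 5)
consumer = IdempotentConsumer()

for msg_id, amount in sub.deliver(now=0.0):
    consumer.handle(msg_id, amount)  # processed, but suppose the ack is lost
for msg_id, amount in sub.deliver(now=11.0):  # deadline passed: redelivered
    consumer.handle(msg_id, amount)  # duplicate, ignored
    sub.ack(msg_id)
print(consumer.total)  # 5
print(sub.in_flight)   # {}
```

The server alone cannot guarantee exactly-once: it only knows a message was redelivered, not whether the first attempt's side effect happened. The subscriber's durable record of processed IDs closes that gap.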
5. Storage and Retention
- Retention: Messages are usually kept for a fixed time (e.g., 7 days) or until acknowledged by all subscriptions.
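- Deduplication: The server should maintain a short-term cache of recently seen message IDs to discard duplicate messages sent by misbehaving or retrying producers. A Bloom Filter works well as a compact, fast pre-check, but on its own it produces false positives (it can wrongly flag a new message as a duplicate), so a "maybe seen" result should be confirmed against an exact record of recent IDs.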
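Both storage rules can be sketched in a few lines, under stated assumptions. `DedupWindow` uses a Bloom filter as a pre-check (a "no" is definitive) and confirms a "maybe" against an exact, time-windowed map; here the map is in memory, but the pre-check pays off when the exact record lives in slower storage. `first_retained_offset` reads the retention rule as "delete once expired or acknowledged by all subscriptions, whichever comes first" and assumes each subscription tracks an offset below which everything is acked. All names and parameters are hypothetical.

```python
import bisect
import hashlib


class BloomFilter:
    """Compact set membership: false positives possible, false negatives not."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class DedupWindow:
    """Publish-side dedup of message IDs seen within the last window_s seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.bloom = BloomFilter()  # never forgets; a real system would rotate it
        self.seen = {}              # exact record: message ID -> time first seen

    def is_duplicate(self, msg_id: str, now: float) -> bool:
        # Expire old IDs (a full scan, for brevity).
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.window_s}
        # A Bloom "no" is definitive; a "maybe" is confirmed against the exact record.
        if self.bloom.might_contain(msg_id) and msg_id in self.seen:
            return True
        self.bloom.add(msg_id)
        self.seen[msg_id] = now
        return False


def first_retained_offset(publish_times, acked_upto, now, retention_s):
    """Offsets below the returned value may be deleted.

    publish_times: sorted publish timestamps (the log is append-only).
    acked_upto: subscription name -> offset below which all messages are acked.
    """
    first_fresh = bisect.bisect_left(publish_times, now - retention_s)
    first_unacked = min(acked_upto.values(), default=len(publish_times))
    return max(first_fresh, first_unacked)


dedup = DedupWindow(window_s=60.0)
print(dedup.is_duplicate("msg-1", now=0.0))   # False: first sighting
print(dedup.is_duplicate("msg-1", now=5.0))   # True: producer retry within window
print(dedup.is_duplicate("msg-1", now=90.0))  # False: window expired
# Published at t=0,10,20,30; at t=40 with 25 s retention, offsets 0 and 1 have expired.
print(first_retained_offset([0, 10, 20, 30], {"a": 1, "b": 3}, now=40, retention_s=25))  # 2
```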
6. The "Backlog" Problem
If a subscriber is much slower than the publisher, the system must buffer millions of messages.
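- Solution: Use Backpressure. The delivery service can cap the number of outstanding (delivered but unacknowledged) messages per subscriber so a slow consumer is not overwhelmed, and, if the business use case allows it, drop or expire the oldest messages once the backlog exceeds a limit.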
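A sketch of both levers for a single slow subscriber, assuming an in-memory backlog: flow control caps how many unacknowledged messages are outstanding at once, and `collections.deque(maxlen=...)` gives a drop-oldest policy for free, since appending to a full deque discards the item at the opposite end. `FlowControlledSubscriber` and its limits are hypothetical, and silently dropping messages is only acceptable where the use case tolerates loss.

```python
from collections import deque


class FlowControlledSubscriber:
    """Hypothetical delivery-side backpressure for one slow subscriber."""

    def __init__(self, max_outstanding: int, max_backlog: int) -> None:
        self.max_outstanding = max_outstanding
        # A full deque with maxlen evicts its OLDEST item on append (drop-oldest).
        self.backlog = deque(maxlen=max_backlog)
        self.outstanding = set()
        self.dropped = 0

    def enqueue(self, msg_id: str) -> None:
        if len(self.backlog) == self.backlog.maxlen:
            self.dropped += 1  # the append below evicts the oldest message
        self.backlog.append(msg_id)

    def deliverable(self) -> list:
        # Throttle: never exceed max_outstanding unacknowledged messages.
        batch = []
        while self.backlog and len(self.outstanding) < self.max_outstanding:
            msg_id = self.backlog.popleft()
            self.outstanding.add(msg_id)
            batch.append(msg_id)
        return batch

    def ack(self, msg_id: str) -> None:
        self.outstanding.discard(msg_id)  # frees a slot for the next delivery


slow_sub = FlowControlledSubscriber(max_outstanding=2, max_backlog=3)
for i in range(5):
    slow_sub.enqueue(f"m{i}")
print(list(slow_sub.backlog), slow_sub.dropped)  # ['m2', 'm3', 'm4'] 2
print(slow_sub.deliverable())                    # ['m2', 'm3'] (capped at 2)
print(slow_sub.deliverable())                    # [] (waiting for acks)
slow_sub.ack("m2")
print(slow_sub.deliverable())                    # ['m4']
```

Capping outstanding messages protects the subscriber; bounding the backlog protects the server. Systems that cannot lose data would instead spill the backlog to the persistent log and keep only the cap.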
Summary
Building a Pub/Sub platform is an exercise in Asynchronous Coordination. By leveraging a persistent log backend, decoupled delivery agents, and clear delivery semantics, you can provide a reliable foundation for event-driven systems at any scale.
