
System Design: Designing a Pub/Sub Messaging Platform

How to design a scalable Pub/Sub system like Google Cloud Pub/Sub. Deep dive into topics, subscriptions, message persistence, and exactly-once delivery.

Sachin Sarawgi | April 20, 2026 | 2 min read

A Pub/Sub (Publish/Subscribe) system is a fundamental pattern for decoupling services. It allows producers to send messages without knowing who the consumers are, enabling highly flexible and asynchronous architectures. Designing a system like Google Cloud Pub/Sub at scale is an advanced architectural challenge.

1. Core Concepts

  • Topic: A named resource to which messages are sent.
  • Subscription: A named resource representing the stream of messages from a topic.
  • Publisher: Sends messages to a topic.
  • Subscriber: Receives messages from a subscription.

2. Decoupling and Scalability

Unlike a message queue where each message is delivered to only one consumer, a Pub/Sub system delivers a copy of each message to every subscription. This requires the system to maintain a separate "read pointer" for every subscription.
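The read-pointer idea can be sketched with an in-memory append-only log. This is a minimal illustration, not how any particular product stores data: the `Topic` class, its method names, and the choice to advance the pointer on pull (standing in for an acknowledgment) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical topic: one shared log plus one read pointer per subscription."""
    log: list[str] = field(default_factory=list)
    cursors: dict[str, int] = field(default_factory=dict)

    def subscribe(self, subscription: str) -> None:
        # A new subscription starts at the current end of the log,
        # so it only sees messages published after it was created.
        self.cursors[subscription] = len(self.log)

    def publish(self, message: str) -> int:
        self.log.append(message)
        return len(self.log) - 1  # offset of the stored message

    def pull(self, subscription: str, max_messages: int = 10) -> list[str]:
        start = self.cursors[subscription]
        batch = self.log[start:start + max_messages]
        # Advancing the pointer here stands in for an acknowledgment.
        self.cursors[subscription] = start + len(batch)
        return batch


topic = Topic()
topic.subscribe("billing")
topic.subscribe("analytics")
topic.publish("order-1")
topic.publish("order-2")

billing_batch = topic.pull("billing")                       # reads both messages
analytics_first = topic.pull("analytics", max_messages=1)   # reads at its own pace
analytics_second = topic.pull("analytics", max_messages=1)
```

Each message is stored once, and every subscription reads it independently. A slow subscription only holds back its own pointer.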

3. High-Level Architecture

  • API/Frontend Service: Authenticates requests and routes them to the correct topic/subscription.
  • Metadata Service: Manages topic/subscription configurations (stored in ZooKeeper or etcd).
  • Storage Service: A persistent, distributed log service (like a simplified Kafka or Pulsar).
  • Delivery Service: Delivers messages to subscribers, either by pushing them (via webhooks or gRPC streams) or by serving pull requests (long polling).

4. Message Delivery Semantics

  • At-Least-Once Delivery: The default for most systems. The server waits for an acknowledgment (ack) from the subscriber. If none is received within a timeout, the message is redelivered, so subscribers must tolerate duplicates.
  • Exactly-Once Delivery: Significantly harder to achieve. It requires coordination between the server's state (which message was sent) and the client's state (which message was processed). This is usually implemented using Idempotency Keys and persistent state in the subscriber.
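The two semantics above can be sketched together: the server redelivers anything not acknowledged before a deadline, and the subscriber uses the message ID as an idempotency key so a redelivery has no second effect. This is a simplified model under stated assumptions: `Subscription`, `IdempotentHandler`, and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are hypothetical, and `processed_ids` would need to live in persistent storage in a real subscriber.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Hypothetical server side: redelivers messages whose ack deadline passed."""
    ack_deadline: float = 10.0
    pending: dict[str, str] = field(default_factory=dict)  # id -> payload, waiting to be sent
    in_flight: dict[str, tuple[str, float]] = field(default_factory=dict)  # id -> (payload, deadline)

    def enqueue(self, message_id: str, payload: str) -> None:
        self.pending[message_id] = payload

    def deliver(self, now: float) -> list[tuple[str, str]]:
        # Messages that were sent but not acked in time become pending again.
        for message_id, (payload, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[message_id]
                self.pending[message_id] = payload
        batch = list(self.pending.items())
        for message_id, payload in batch:
            self.in_flight[message_id] = (payload, now + self.ack_deadline)
        self.pending.clear()
        return batch

    def ack(self, message_id: str) -> None:
        self.in_flight.pop(message_id, None)


class IdempotentHandler:
    """Hypothetical subscriber side: applies each message ID at most once."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()  # assumption: persistent in practice
        self.applied: list[str] = []

    def handle(self, message_id: str, payload: str) -> bool:
        if message_id in self.processed_ids:
            return False  # duplicate delivery, skip the side effect
        self.applied.append(payload)
        self.processed_ids.add(message_id)
        return True


sub = Subscription(ack_deadline=10.0)
handler = IdempotentHandler()
sub.enqueue("m1", "charge-5")

first = sub.deliver(now=0.0)       # delivered, but the ack never arrives
handler.handle(*first[0])
early = sub.deliver(now=5.0)       # deadline not reached, nothing is resent
second = sub.deliver(now=10.0)     # deadline passed, message is redelivered
applied_again = handler.handle(*second[0])
sub.ack("m1")
final = sub.deliver(now=30.0)      # acked, so nothing is left to deliver
```

The server alone only guarantees at-least-once here. The "exactly-once" effect comes from the subscriber refusing to apply the same ID twice.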

5. Storage and Retention

  • Retention: Messages are usually kept for a fixed time (e.g., 7 days) or until acknowledged by all subscriptions.
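  • Deduplication: The server should maintain a short-term cache of recently seen message IDs to discard duplicates sent by retrying or misbehaving producers. A Bloom Filter keeps this cache compact, but it can report false positives, so a positive hit should be confirmed against an exact store before a message is dropped.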
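Both rules can be sketched in a few lines. The example below uses an exact, time-bounded set of IDs rather than a Bloom Filter, which trades memory for having no false positives. `DedupWindow`, `can_delete`, and the explicit `now` argument are hypothetical names chosen for illustration.

```python
class DedupWindow:
    """Remembers message IDs for ttl seconds and flags repeats inside that window."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        # id -> first-seen time; a dict keeps insertion order, so oldest comes first.
        self.seen: dict[str, float] = {}

    def is_duplicate(self, message_id: str, now: float) -> bool:
        # Evict IDs whose window has expired, starting from the oldest entry.
        while self.seen:
            oldest_id, first_seen = next(iter(self.seen.items()))
            if now - first_seen < self.ttl:
                break
            del self.seen[oldest_id]
        if message_id in self.seen:
            return True
        self.seen[message_id] = now
        return False


SEVEN_DAYS = 7 * 24 * 60 * 60  # retention period in seconds


def can_delete(publish_time: float, now: float, acked_by_all: bool,
               retention: float = SEVEN_DAYS) -> bool:
    """A message may be removed once every subscription acked it or retention passed."""
    return acked_by_all or now - publish_time >= retention


window = DedupWindow(ttl=60.0)
first = window.is_duplicate("m1", now=0.0)     # new ID
retry = window.is_duplicate("m1", now=30.0)    # same ID inside the window
late = window.is_duplicate("m1", now=120.0)    # window expired, treated as new
```

The window length bounds how late a producer retry can arrive and still be caught. Anything later is left for the subscriber's own idempotency checks.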

6. The "Backlog" Problem

If a subscriber is much slower than the publisher, the system must buffer millions of messages.

  • Solution: Use Backpressure. The delivery service can throttle the flow or "drop" older messages if the subscriber is too far behind (if the business use case allows it).
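One way to picture both options is a bounded per-subscription buffer. In this sketch, delivery is throttled by capping the number of unacknowledged messages, and a full buffer either rejects new messages (signaling the publisher to slow down) or drops the oldest one. `BoundedBacklog` and its parameters are hypothetical and chosen only for illustration.

```python
from collections import deque


class BoundedBacklog:
    """Per-subscription buffer with a size cap and a cap on unacked messages."""

    def __init__(self, max_backlog: int, max_outstanding: int,
                 drop_oldest: bool = False) -> None:
        self.max_backlog = max_backlog
        self.max_outstanding = max_outstanding
        self.drop_oldest = drop_oldest
        self.buffer: deque[str] = deque()
        self.outstanding = 0  # delivered but not yet acked
        self.dropped = 0

    def offer(self, message: str) -> bool:
        """Returns False when the backlog is full and the publisher should slow down."""
        if len(self.buffer) >= self.max_backlog:
            if not self.drop_oldest:
                return False
            self.buffer.popleft()  # only acceptable if the use case tolerates loss
            self.dropped += 1
        self.buffer.append(message)
        return True

    def next_batch(self) -> list[str]:
        """Sends only as many messages as the subscriber has capacity for."""
        capacity = self.max_outstanding - self.outstanding
        count = min(capacity, len(self.buffer))
        batch = [self.buffer.popleft() for _ in range(count)]
        self.outstanding += len(batch)
        return batch

    def ack(self, count: int = 1) -> None:
        self.outstanding = max(0, self.outstanding - count)


backlog = BoundedBacklog(max_backlog=3, max_outstanding=2, drop_oldest=True)
accepted = [backlog.offer(f"m{i}") for i in range(5)]  # m0 and m1 get dropped
first = backlog.next_batch()     # limited by max_outstanding
stalled = backlog.next_batch()   # no capacity until the subscriber acks
backlog.ack(2)
rest = backlog.next_batch()

strict = BoundedBacklog(max_backlog=1, max_outstanding=1)
strict_results = [strict.offer("a"), strict.offer("b")]
```

With `drop_oldest=False` nothing is lost, but the pressure moves upstream to the publisher. With `drop_oldest=True` the publisher is never blocked, at the cost of losing old messages.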

Summary

Building a Pub/Sub platform is an exercise in Asynchronous Coordination. By leveraging a persistent log backend, decoupled delivery agents, and clear delivery semantics, you can provide a reliable foundation for event-driven systems at any scale.
