Backpressure Propagation

When your database is slow, your worker is slow. When your worker is slow, your Kafka consumer lags. When the consumer lags, the backlog piles up in the broker, and producers keep sending at full rate unless something tells them to slow down. Backpressure is the signal that carries this state upstream so that no layer is fed more work than it can handle.

1. TCP-Level vs. App-Level

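TCP: Built in. When the receiving application stops reading, its socket receive buffer fills, the advertised receive window shrinks toward zero, and the sender's writes block.
Application: Not automatic. The service must explicitly signal overload to upstream callers, for example with HTTP 429 (Too Many Requests) or 503 (Service Unavailable).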
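
As a minimal sketch of the application-level signal, the function below rejects work once a cap on in-flight requests is reached and attaches a retry hint. The names `Admission` and `admit`, the cap, and the two-second hint are assumptions made for illustration; only the status code and the `Retry-After` header come from HTTP itself.

```python
from dataclasses import dataclass, field


@dataclass
class Admission:
    status: int                                   # HTTP status to return
    headers: dict = field(default_factory=dict)   # extra response headers


def admit(in_flight: int, max_in_flight: int, retry_after_s: int = 2) -> Admission:
    """Accept the request if there is capacity, otherwise answer 503 with a hint."""
    if in_flight >= max_in_flight:
        # Explicit overload signal: the caller learns that it should back off.
        return Admission(503, {"Retry-After": str(retry_after_s)})
    return Admission(200)
```

A real service would read `in_flight` from whatever counter or semaphore guards its handlers; the point is only that the rejection is deliberate and visible to the caller.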

2. Reactive Streams

Libraries such as Project Reactor and Akka Streams implement demand-based flow. The consumer requests up to N messages at a time, and the publisher never sends more than the outstanding demand, so the consumer is not handed more than it asked for.
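
The idea can be illustrated without any library. The sketch below is not the Reactor or Akka Streams API; `DemandSource` and its `request` method are hypothetical names that only mimic the "deliver at most what was requested" contract.

```python
from typing import Callable, Iterable, Iterator


class DemandSource:
    """Hypothetical publisher that emits only as many items as were requested."""

    def __init__(self, items: Iterable[int], on_next: Callable[[int], None]) -> None:
        self._items: Iterator[int] = iter(items)
        self._on_next = on_next

    def request(self, n: int) -> int:
        """Deliver at most n items and return how many were actually delivered."""
        delivered = 0
        for _ in range(n):
            try:
                item = next(self._items)
            except StopIteration:
                break  # source exhausted before the demand was used up
            self._on_next(item)
            delivered += 1
        return delivered


received = []
source = DemandSource(range(5), received.append)
first = source.request(2)    # consumer has room for two items
second = source.request(10)  # asks for ten, but only three remain
```

The consumer controls the pace: nothing arrives between the two `request` calls, no matter how much data the source holds.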

3. Backpressure must cross service boundaries

Many teams implement backpressure inside one process but lose control between services.
Real resilience requires propagation through every layer:

DB pool saturation -> worker concurrency reduction
worker lag -> broker consumer pause or reduced poll volume
queue depth growth -> upstream rate limiting
API pressure -> client-visible 429/503 with retry hints

If any boundary ignores pressure, the system shifts failure rather than absorbing it.
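
The first link in that chain, DB pool saturation driving worker concurrency, might be sketched as follows. The function name, the 50% threshold, and the linear ramp are all assumptions chosen for illustration, not a recommended tuning.

```python
def target_concurrency(pool_in_use: int, pool_size: int,
                       max_workers: int, min_workers: int = 1) -> int:
    """Reduce worker concurrency linearly once the DB pool passes 50% utilization."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    utilization = pool_in_use / pool_size
    if utilization <= 0.5:
        return max_workers
    # Map utilization in (0.5, 1.0] to a scaling factor in [0.0, 1.0).
    factor = max(0.0, (1.0 - utilization) / 0.5)
    return max(min_workers, int(max_workers * factor))
```

With a pool of 20 connections and 8 workers, 15 connections in use yields a target of 4 workers, and a fully saturated pool yields the floor of 1.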

4. Synchronous call chain patterns

For request/response microservices:

set strict per-hop timeouts
cap concurrent in-flight requests
use bounded queues (avoid infinite buffering)
shed non-critical features first

Unbounded queueing hides overload until rising latency turns into a broad outage.
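
One way to cap concurrent in-flight requests without queueing is a non-blocking semaphore. This is a sketch under assumptions: `InFlightLimiter` and `Overloaded` are hypothetical names, and a real handler would translate `Overloaded` into a 429 or 503 response.

```python
import threading
from contextlib import contextmanager


class Overloaded(Exception):
    """Raised when the in-flight cap is reached."""


class InFlightLimiter:
    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    @contextmanager
    def slot(self):
        # Non-blocking acquire: reject immediately instead of waiting in line.
        if not self._slots.acquire(blocking=False):
            raise Overloaded("in-flight cap reached")
        try:
            yield
        finally:
            self._slots.release()


limiter = InFlightLimiter(max_in_flight=1)
outcomes = []
with limiter.slot():
    try:
        with limiter.slot():          # a second concurrent request
            outcomes.append("accepted")
    except Overloaded:
        outcomes.append("rejected")
with limiter.slot():                  # the slot was released, so this is admitted
    outcomes.append("accepted")
```

Rejecting at the door keeps latency bounded for the requests that are admitted, which is the opposite of what an unbounded queue does.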

5. Async pipeline patterns (Kafka/SQS)

For event-driven systems:

dynamic consumer concurrency based on downstream health
pause/resume partitions when processing backlog crosses thresholds
dead-letter poison messages quickly
differentiate retryable vs non-retryable failures

Throughput goals should never exceed safe downstream processing capacity.
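
The pause/resume decision benefits from two thresholds so the consumer does not flap around a single limit. The sketch below shows only that decision logic; `PauseController` and the watermark values are hypothetical, and wiring the result to a broker client's pause and resume calls is left out.

```python
class PauseController:
    """Pause at the high-water mark, resume only after draining to the low-water mark."""

    def __init__(self, high_water: int, low_water: int) -> None:
        if low_water >= high_water:
            raise ValueError("low_water must be below high_water")
        self.high_water = high_water
        self.low_water = low_water
        self.paused = False

    def update(self, backlog: int) -> bool:
        """Feed the current processing backlog; return True while paused."""
        if not self.paused and backlog >= self.high_water:
            self.paused = True
        elif self.paused and backlog <= self.low_water:
            self.paused = False
        return self.paused


controller = PauseController(high_water=100, low_water=20)
states = [controller.update(b) for b in (10, 100, 60, 20, 50)]
```

Note that a backlog of 60 keeps the consumer paused on the way down but would not pause it on the way up; that gap between the two thresholds is what prevents rapid toggling.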

6. Backpressure and priority

Not all workloads are equal.
Introduce priority classes:

Tier 0: payments/login/core writes
Tier 1: standard business operations
Tier 2: analytics/enrichment/non-critical jobs

During overload, shed Tier 2 first, then Tier 1, while preserving Tier 0 as long as possible.
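
That shedding order can be expressed as a small admission rule. The load thresholds (80% and 95% of capacity) and the function names are assumptions for illustration; the tier numbers follow the list above.

```python
def admitted_tiers(load: float) -> set:
    """Return the tiers still served at a given load (fraction of capacity in use)."""
    if load < 0.80:
        return {0, 1, 2}   # normal operation: serve everything
    if load < 0.95:
        return {0, 1}      # first stage: shed analytics and enrichment
    return {0}             # last stage: keep only payments, login, core writes


def should_shed(tier: int, load: float) -> bool:
    return tier not in admitted_tiers(load)
```

In practice `load` could be derived from any of the saturation signals a service already tracks, such as pool utilization or queue depth.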

7. Observability signals

Track these together:

queue depth and age
consumer lag by partition
request rejection rate (429/503)
thread pool and connection pool saturation
end-to-end latency percentiles

Backpressure is healthy when rejection increases in a controlled way while core SLOs stay stable.
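
Reading the signals together, rather than alerting on each in isolation, might look like the check below. The `Snapshot` fields and the limits passed in are hypothetical; they stand in for whatever metrics and SLO targets a team already has.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    rejection_rate: float        # fraction of requests answered with 429/503
    core_p99_ms: float           # p99 latency of the most critical traffic
    oldest_message_age_s: float  # age of the oldest queued message


def backpressure_healthy(s: Snapshot, slo_p99_ms: float,
                         max_rejection_rate: float, max_age_s: float) -> bool:
    """Rejections are acceptable as long as they stay bounded and core SLOs hold."""
    return (
        s.core_p99_ms <= slo_p99_ms
        and s.rejection_rate <= max_rejection_rate
        and s.oldest_message_age_s <= max_age_s
    )
```

A snapshot with 5% rejections and a core p99 inside its SLO counts as healthy here, while the same rejection rate with a blown p99 does not.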

8. Common anti-patterns

retry storms without jitter/backoff
unbounded in-memory buffers
no distinction between overload and functional errors
silently dropping critical messages
autoscaling without load-shedding controls
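
For the first item, a common remedy is exponential backoff with jitter. The sketch below uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling; the function name, base, and cap are assumptions for illustration.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base_s: float = 0.1, cap_s: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a randomized delay in seconds for the given retry attempt (0-based)."""
    if rng is None:
        rng = random.Random()
    # The ceiling doubles with every attempt but never exceeds cap_s.
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Full jitter spreads clients across the whole interval so retries do not align.
    return rng.uniform(0.0, ceiling)
```

Passing a seeded `random.Random` makes the delays reproducible in tests, while production callers can rely on the default.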

Backpressure is not "failing more"; it is failing intentionally to protect system integrity.

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
