What Is Event-Driven Architecture?
Mental Model
A synchronous API call is like a phone call — both parties must be available at the same time. A message queue is like email — the sender sends it and moves on; the receiver processes it when ready. Event-driven architecture is choosing email at the system design level.
In a traditional REST-based architecture, service A directly calls service B. If B is slow, A is slow. If B is down, A fails. As your system grows, this tight coupling becomes your bottleneck.
Event-driven architecture solves this by introducing a message broker between services:
# Synchronous (tight coupling):
OrderService → [HTTP] → PaymentService → [HTTP] → InventoryService
↑ each caller must wait for the downstream service to respond before proceeding
# Event-Driven (loose coupling):
OrderService → [Event: order.placed] → Message Broker
PaymentService ← reads from broker when ready
InventoryService ← reads from broker independently
NotificationService ← reads from broker independently
The result: services can be deployed and scaled independently, and a slow or failed consumer no longer blocks the producer or the other consumers.
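As a rough illustration of the fan-out above, here is a minimal in-memory sketch. `InMemoryBroker` and the three handler functions are hypothetical names invented for this example; a real broker also persists events and delivers them asynchronously, which this toy version does not attempt.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a message broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Hand the event to every subscriber; return how many handled it."""
        delivered = 0
        for handler in self._subscribers[event_type]:
            try:
                handler(payload)
                delivered += 1
            except Exception:
                # A failing consumer affects neither the producer nor its peers.
                pass
        return delivered


received: List[Tuple[str, str]] = []


def payment_service(event: dict) -> None:
    received.append(("payment", event["orderId"]))


def inventory_service(event: dict) -> None:
    raise RuntimeError("inventory is down")


def notification_service(event: dict) -> None:
    received.append(("notification", event["orderId"]))


broker = InMemoryBroker()
broker.subscribe("order.placed", payment_service)
broker.subscribe("order.placed", inventory_service)
broker.subscribe("order.placed", notification_service)

# The producer only knows the broker and the event name, not the consumers.
delivered = broker.publish("order.placed", {"orderId": "123"})
print(delivered, received)
```

The inventory handler fails, yet the order is still published and the other two consumers still receive it. In the synchronous chain, the same failure would have surfaced as an error in OrderService.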
When to Use Messaging (and When Not To)
Messaging is not always the right answer. The decision comes down to your consistency and coupling requirements:
| Scenario | Use REST/gRPC | Use Messaging |
|---|---|---|
| Need an immediate response | ✅ | ❌ |
| Read API (GET requests) | ✅ | ❌ |
| One service talks to one service | ✅ | Consider it |
| Fan-out (one event → many consumers) | ❌ | ✅ |
| Async work (email, notifications) | ❌ | ✅ |
| Handling traffic spikes (buffering) | ❌ | ✅ |
| Audit log / event sourcing | ❌ | ✅ |
| Service B doesn't need to be "real-time" | ❌ | ✅ |
The wrong choice: Using Kafka for a user-facing API that needs to return a result in the same HTTP response. Messaging is for fire-and-forget or asynchronous fan-out, not request-response.
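The buffering row in the table can be sketched with nothing but the standard library. This is only an analogy: `queue.Queue` stands in for the broker, and the names `buffer`, `worker`, and `STOP` are made up for the example.

```python
import queue
import threading

buffer = queue.Queue()   # plays the role of the broker's queue
processed = []
STOP = object()          # sentinel telling the worker to exit


def worker() -> None:
    """Drain the queue at the consumer's own pace."""
    while True:
        job = buffer.get()
        if job is STOP:
            break
        processed.append(job)


# A traffic spike: 1000 requests arrive before any consumer is running.
for request_id in range(1000):
    buffer.put(request_id)   # returns immediately; the producer never waits
backlog_after_burst = buffer.qsize()

# The consumer comes online later and works through the backlog in order.
consumer = threading.Thread(target=worker)
consumer.start()
buffer.put(STOP)
consumer.join()

print(backlog_after_burst, len(processed))
```

The producer finishes its burst without waiting on the consumer. The cost is latency: the work is done later, which is why this pattern does not fit a request that needs its result in the same HTTP response.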
The Three Messaging Systems You Need to Know
Apache Kafka: The Distributed Commit Log
Kafka is not a traditional message queue. It is a distributed, partitioned, replicated commit log designed for high-throughput event streaming.
Mental model: Kafka is a database optimized for sequential reads and writes. Events are stored durably and consumers read them at their own pace. Unlike a queue, messages are not deleted after consumption — they are retained for a configurable period (default: 7 days).
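A small sketch may make the "log, not queue" idea concrete. `EventLog` is a hypothetical toy class, not Kafka's API: it keeps every record after it is read and tracks one read position per consumer group. Retention, partitions, and persistence are deliberately left out.

```python
from typing import Dict, List


class EventLog:
    """Toy append-only log: reading a record does not remove it."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return the next batch for this group and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Move a group's position, for example back to 0 to replay history."""
        self._offsets[group] = offset


log = EventLog()
for status in ("placed", "paid", "shipped"):
    log.append(status)

billing_first = log.poll("billing", max_records=2)  # billing reads slowly
analytics_all = log.poll("analytics")               # analytics reads everything
billing_rest = log.poll("billing")                  # billing resumes where it stopped

log.seek("billing", 0)                              # replay from the beginning
billing_replay = log.poll("billing")
```

Each group moves through the same records independently, and rewinding an offset is all it takes to replay. A traditional queue cannot do this once a message has been acknowledged and deleted.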
Use Kafka when:
- You need to replay events (audit trail, event sourcing)
- Multiple independent consumer groups need the same data
- Throughput > 100K events/second
- Event ordering per entity is required (e.g., all events for user-123 in sequence)
- You are building real-time stream processing pipelines
Do not use Kafka when:
- You need flexible routing (topic-per-message-type doesn't scale to hundreds of types)
- Your messages need priorities (Kafka has no native priority queue)
- You need simple task queues with competing consumers (RabbitMQ is better)
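The per-entity ordering mentioned in the list above comes from key-based partitioning: records with the same key land in the same partition, and each partition is an ordered log. The sketch below assumes a hypothetical `partition_for` helper and uses CRC32 as a stand-in hash. Kafka's default partitioner uses murmur2, so the actual partition numbers would differ.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 6


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # CRC32 is a stand-in hash for illustration; Kafka itself uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user-123", "login"),
    ("user-456", "login"),
    ("user-123", "add_to_cart"),
    ("user-456", "logout"),
    ("user-123", "checkout"),
]

# Append each event to the partition chosen by its key.
partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
for key, action in events:
    partitions[partition_for(key)].append((key, action))

user_123_partition = partition_for("user-123")
user_123_events = [
    action for key, action in partitions[user_123_partition] if key == "user-123"
]
print(user_123_partition, user_123_events)
```

Ordering is only guaranteed within a partition, so events for different users may interleave arbitrarily across partitions. The mapping also changes if the partition count changes, since the hash is taken modulo the number of partitions.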
RabbitMQ: The Message Broker
RabbitMQ implements AMQP 0-9-1 as its core protocol and is designed for flexible routing and task queues. It treats exchanges (routing rules) and queues (buffers) as separate entities.
Mental model: RabbitMQ is a post office with a sophisticated routing system. You tell it the rules (exchanges + bindings), and it routes messages to the right queues automatically.
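To make the post-office picture concrete, here is a simplified sketch of topic-exchange routing. `TopicExchange` and `topic_matches` are hypothetical names for this example, not RabbitMQ's client API. The matching follows the AMQP topic convention, where `*` stands for exactly one dot-separated word and `#` for zero or more.

```python
from typing import Dict, List, Tuple


def _match(pattern: List[str], words: List[str]) -> bool:
    if not pattern:
        return not words
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        # '#' absorbs zero words, or one word while staying active for more.
        return _match(rest, words) or (bool(words) and _match(pattern, words[1:]))
    if not words:
        return False
    return head in ("*", words[0]) and _match(rest, words[1:])


def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic match on dot-separated words."""
    return _match(pattern.split("."), routing_key.split("."))


class TopicExchange:
    """Toy exchange: bindings are the rules, queues are plain lists."""

    def __init__(self) -> None:
        self._bindings: List[Tuple[str, str]] = []  # (pattern, queue name)
        self.queues: Dict[str, List[str]] = {}

    def bind(self, queue_name: str, pattern: str) -> None:
        self.queues.setdefault(queue_name, [])
        self._bindings.append((pattern, queue_name))

    def publish(self, routing_key: str, body: str) -> List[str]:
        # A queue gets one copy even if several of its bindings match.
        targets = {q for p, q in self._bindings if topic_matches(p, routing_key)}
        for queue_name in targets:
            self.queues[queue_name].append(body)
        return sorted(targets)


exchange = TopicExchange()
exchange.bind("payments", "order.placed")
exchange.bind("audit", "order.#")
exchange.bind("eu_reports", "order.eu.*")

placed_targets = exchange.publish("order.placed", "order 123 placed")
refund_targets = exchange.publish("order.eu.refunded", "order 456 refunded")
print(placed_targets, refund_targets)
```

The publisher only picks a routing key. Which queues receive the message is decided entirely by the bindings, so new consumers can be added without touching the producer.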
Use RabbitMQ when:
- You need complex routing (direct, fanout, topic, headers exchanges)
- Messages should be deleted after successful acknowledgment
- You need priority queues
- Task distribution across competing workers is your use case
- Your throughput is < 50K messages/second
AWS SQS + SNS + EventBridge: The Managed Cloud Option
For teams on AWS who don't want to operate their own broker:
- SQS: Simple queue. Standard queues provide at-least-once delivery; FIFO queues add ordering and deduplication. Best for task queues and async processing.
- SNS: Fan-out pub/sub. One message → many SQS queues or Lambda functions.
- EventBridge: Event bus with sophisticated routing rules, schema registry, and 200+ AWS service integrations.
Use the AWS stack when: You're fully on AWS, want zero infrastructure management, and can accept the higher per-message cost.
The Learning Path
Phase 1: Kafka Fundamentals (The Foundation)
Start here. Kafka is the most important messaging system for backend engineers to understand deeply.
→ Kafka Internals Deep Dive
Partitions, offsets, consumer groups, ISR, producer acknowledgments
→ Kafka Exactly-Once Semantics
Idempotent producers, transactional APIs, read-process-write atomicity
→ Kafka Consumer Groups Explained
How group rebalancing works, partition assignment strategies
→ Kafka Zero-Copy Throughput
The OS-level optimization that makes Kafka fast
Estimated time: 4-5 hours
Phase 2: Kafka Operations (Production Skills)
→ Kafka Consumer Lag Playbook
Diagnosing and fixing consumer lag — the most important operational skill
→ Kafka Consumer Rebalancing Playbook
Stop-the-world rebalances, cooperative sticky rebalancing
→ Kafka Partition Skew Management
Hot partitions, key skew, and partition strategies at scale
→ Kafka Rebalance Storms and Cooperative-Sticky Strategy
Advanced consumer group stability patterns
Estimated time: 4 hours
Phase 3: RabbitMQ & Alternatives
→ RabbitMQ Internals Deep Dive
Exchanges, queues, bindings, and the AMQP protocol
→ RabbitMQ Quorum Queues and Raft Consensus
High availability guarantees and when to use them
→ SQS, Kafka, and EventBridge: AWS Comparison
When to choose each AWS messaging service
→ Retry Queues vs Dead Letter Queues (DLQ)
Architecting resilient message consumers with failure handling
Estimated time: 3 hours
Phase 4: Distributed Patterns
→ Event Sourcing & CQRS in Production
Storing state as events, separate read/write models
→ The Transactional Outbox Pattern
Guaranteed at-least-once event delivery without distributed transactions
→ Change Data Capture (CDC) with Debezium
Turning database changes into event streams
→ Kafka Streams for Real-Time Processing
Stateful stream processing without a separate framework
Estimated time: 5 hours
Prerequisites Checklist
Before starting this track, confirm that you:
- Have built at least one REST API that talks to a database
- Understand basic concurrency concepts (threads, async processing)
- Are familiar with Docker (you'll run Kafka/RabbitMQ locally)
- Are comfortable reading Java or Python code (examples use both)
If you're missing any of these, the Backend Systems Mastery track covers the REST API fundamentals you'll need.
Quick Start: Run Kafka Locally in 3 Minutes
# Start Kafka with Docker Compose
cat > docker-compose.yml << 'EOF'
services:
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    # Fixed name so the `docker exec kafka ...` commands below find the container
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      # Single broker: the offsets topic cannot use the default replication factor of 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
EOF
docker compose up -d
# Create a test topic
docker exec kafka kafka-topics --create \
  --bootstrap-server localhost:9092 \
  --topic order-events \
  --partitions 6 \
  --replication-factor 1
# Produce a test message
echo '{"orderId": "123", "status": "placed"}' | \
  docker exec -i kafka kafka-console-producer \
  --bootstrap-server localhost:9092 \
  --topic order-events
# Consume it (press Ctrl+C to stop)
docker exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic order-events \
  --from-beginning
Key Vocabulary
| Term | Definition |
|---|---|
| Producer | Application that writes events to a topic |
| Consumer | Application that reads events from a topic |
| Topic | Named stream of events (like a database table) |
| Partition | Ordered, append-only log within a topic; unit of parallelism |
| Offset | Sequential ID of a record within a partition |
| Consumer Group | Set of consumers that share the work of consuming a topic |
| Broker | A Kafka server node; stores partitions and serves requests |
| ISR | In-Sync Replicas: the leader plus the followers that have kept up with it within the configured lag limit |
| DLQ | Dead Letter Queue: where failed messages are sent after max retries |
| Exactly-once | The effect of each message is applied once and only once, even when failures cause retries |
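The DLQ entry in the table can be sketched as a small retry loop. This is one possible shape, not a prescribed implementation: `consume`, `handle_order`, and `MAX_ATTEMPTS` are hypothetical names, and a production consumer would normally add a delay or backoff between attempts.

```python
from typing import Callable, Dict, List, Tuple

MAX_ATTEMPTS = 3  # assumed retry budget for this example


def consume(
    messages: List[dict],
    handler: Callable[[dict], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """Try each message up to max_attempts times, then park it in the DLQ."""
    processed: List[dict] = []
    dead_letters: List[Tuple[dict, str]] = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                processed.append(message)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Keep the failure reason next to the message for debugging.
                    dead_letters.append((message, repr(exc)))
    return processed, dead_letters


attempts: Dict[str, int] = {}


def handle_order(message: dict) -> None:
    attempts[message["orderId"]] = attempts.get(message["orderId"], 0) + 1
    if "status" not in message:
        raise ValueError("missing status")


messages = [
    {"orderId": "1", "status": "placed"},
    {"orderId": "2"},                      # malformed: will never succeed
    {"orderId": "3", "status": "placed"},
]
processed, dead_letters = consume(messages, handle_order)
print(len(processed), len(dead_letters), attempts)
```

The malformed message is retried three times and then set aside, so it does not block the messages behind it. Someone can inspect the DLQ later and fix or replay the entry.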
Start Here
The first technical deep-dive in this track is Kafka Internals:
→ Kafka Internals Deep Dive: Partitions, Offsets, and Consumer Groups
If you're specifically interested in RabbitMQ:
→ RabbitMQ Internals Deep Dive
For the AWS cloud-managed path:
→ SQS, Kafka, and EventBridge: Choosing the Right AWS Messaging Service
Key Takeaways
- Messaging decouples services in time and space — the producer does not need to know if, when, or how many consumers process its events.
- Kafka is a distributed commit log optimized for high-throughput ordered streams; RabbitMQ is a message broker optimized for routing and task queues.
- The path to mastery moves from basic pub/sub → consumer groups & offsets → exactly-once semantics → production operations.