Event-Driven Architecture: CQRS and Event Sourcing in Practice

Mental Model

In traditional CRUD (Create, Read, Update, Delete) architectures, the same database model is used for both writing and reading data. Under high traffic, this creates locking contention and complex SQL joins that degrade database throughput. CQRS (Command Query Responsibility Segregation) and Event Sourcing resolve this by segregating write commands from read queries, and storing state as a sequence of immutable events rather than mutable rows.

Instead of executing UPDATE accounts SET balance = balance + 100 on a single row, an event-sourced system appends a FundsDeposited event to an immutable log. The current state is a derived value, computed by reading all past events in sequence. By decoupling the write path from the read path, we can optimize both independently, scaling writes to high velocities while serving complex read queries with low latency.

System Requirements

To implement a production-grade CQRS/ES banking system, we define these requirements:

Functional Requirements

Immutable State Ledger: Every state mutation (depositing money, withdrawing, locking account) must be stored as an immutable event record.
Dynamic Projections: The system must build a read-optimized view showing the user's current account balance.
Command Validation: The write path must validate command business rules (e.g. preventing withdrawals exceeding current balance) before committing events.
Auditing and Replay: The system must allow reconstructing the system state at any historical point in time by replaying events up to that specific timestamp.

Non-Functional Requirements

Write Path SLA: Appending an event to the Event Store must maintain a P99 latency of less than 15 milliseconds.
Read Path SLA: Fetching the account balance projection must return a response in less than 5 milliseconds.
Consistency Window: The lag between an event commit and its read-side projection update must be under 100 milliseconds.
Optimistic Concurrency: Prevent simultaneous write command updates from corrupting the transaction sequence under concurrent processing.

API Design and Interface Contracts

To segregate writes and reads, the system exposes decoupled command and query interfaces. Below is a structured JSON API payload representing a Command write request, followed by the corresponding Query read contract:

Command Request Payload (Client to Write API)

Endpoint: POST /v1/commands/accounts/deposit

Request Payload (JSON):

{
  "command_id": "cmd-880099-deposit",
  "command_type": "DEPOSIT_FUNDS",
  "aggregate_id": "acc-998811",
  "payload": {
    "amount_cents": 50000,
    "currency": "USD",
    "reference": "Payroll deposit"
  },
  "created_at": "2026-05-23T10:00:00.123Z"
}

Query API Response (Read API to Client)

Endpoint: GET /v1/queries/accounts/acc-998811/balance

Response Payload (JSON - 200 OK):

{
  "aggregate_id": "acc-998811",
  "current_balance_cents": 150000,
  "currency": "USD",
  "last_updated_sequence": 14,
  "last_updated_at": "2026-05-23T10:00:00.223Z"
}

High-Level Architecture

CQRS and Event Sourcing decouple operations by splitting datastores into Write Models (Event Stores) and Read Models (Projections).

1. Command Write Pipeline

The client submits a command. The Command Handler fetches past events to rebuild the aggregate's current state, validates the business logic, and appends new events atomically to the Event Store. These events are published to a Message Bus.

graph TD
    Client[Client Command] -->|POST /api/v1/commands| Handler[Command Handler]
    Handler -->|1. Fetch past events| Store[(Event Store)]
    Handler -->|2. Validate & Append| Store
    Store -->|3. Publish Event| Bus[Kafka Event Bus]
    
    %% Style annotations
    classDef writeColor fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    class Handler,Store writeColor;

2. Read-Side Projection Pipeline

The Projection Processor consumes events from the Message Bus and writes updates to a read-optimized View Database (such as Elasticsearch or Redis). The Query Service serves client reads directly from this View DB with zero SQL joins.

graph TD
    Bus[Kafka Event Bus] -->|Consume| Projection[Projection Processor]
    Projection -->|1. Write optimized update| ViewDB[(Read-Optimized View DB)]
    ClientQuery[Client Query] -->|GET /api/v1/balance| QueryService[Query Service]
    QueryService -->|2. Fetch| ViewDB
    
    %% Style annotations
    classDef readColor fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    class Projection,ViewDB readColor;

Low-Level Design and Schema

Below is a production-ready, compilable Java class modeling an Event Sourced Aggregate. It handles event replay, state mutation application, and command execution:

package com.codesprintpro.eventdriven;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class EventSourcedAccount {
    private final String aggregateId;
    private int sequence = 0;
    private long balanceCents = 0;
    private final List<AccountEvent> changes = new ArrayList<>();

    public EventSourcedAccount(String aggregateId) {
        this.aggregateId = aggregateId;
    }

    /**
     * Rebuilds aggregate state by replaying a sequence of past events.
     */
    public void replay(List<AccountEvent> history) {
        for (AccountEvent event : history) {
            apply(event, false);
        }
    }

    /**
     * Executes a deposit command. Validates parameters and commits
     * a corresponding event to the state log.
     */
    public void deposit(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("Amount must be positive");
        }
        AccountEvent event = new AccountEvent(
                this.aggregateId, ++this.sequence, "FUNDS_DEPOSITED", amountCents
        );
        apply(event, true);
    }

    private void apply(AccountEvent event, boolean isNew) {
        this.sequence = event.getSequence();
        if ("FUNDS_DEPOSITED".equals(event.getEventType())) {
            this.balanceCents += event.getPayload();
        }
        if (isNew) {
            this.changes.add(event);
        }
    }

    public List<AccountEvent> getChanges() {
        return this.changes;
    }

    public long getBalanceCents() {
        return this.balanceCents;
    }

    public static class AccountEvent {
        private final String aggregateId;
        private final int sequence;
        private final String eventType;
        private final long payload;

        public AccountEvent(String aggregateId, int seq, String type, long val) {
            this.aggregateId = aggregateId;
            this.sequence = seq;
            this.eventType = type;
            this.payload = val;
        }

        public String getAggregateId() { return aggregateId; }
        public int getSequence() { return sequence; }
        public String getEventType() { return eventType; }
        public long getPayload() { return payload; }
    }
}

The write-side event tables are mapped with primary composite keys on (aggregate_id, sequence) to ensure that no two events can share the same offset, enforcing physical partition integrity.

Scaling Challenges and Capacity Estimation

Operating event-sourced architectures at scale exposes distinct low-level system constraints:

1. Long Replay Latency (Large Event Logs)

If an account has been active for 5 years, it can contain 100,000 unique events. Loading and replaying 100,000 events to rebuild state for a single balance check is too slow, degrading write throughput.

Mitigation: Implement Snapshots. Every 100 events, the system serializes the current aggregate state (balance snapshot) and writes it to a snapshot table. When rebuilding state, the system loads the latest snapshot and only replays subsequent events.

2. Sizing Snapshots and Event Growth

Let's conduct a capacity estimation for a 10M active accounts system:

Events Per Day: Assume 5 Million transactions per day.
Raw Event Storage Size: Each event payload is 300 bytes. $$\text{Daily Event Storage} = 5,000,000 \text{ events} \times 300 \text{ bytes} = 1.5 \text{ GB per day}.$$
Snapshot Storage Size: A snapshot is written every 100 events. The size of an account state snapshot is 200 bytes. $$\text{Daily Snapshots} = (5,000,000 / 100) \times 200 \text{ bytes} = 10 \text{ MB per day}.$$ This shows that snapshots compress the active retrieval load significantly, keeping database read buffers extremely small.

3. Projection Sync Lag (Eventual Consistency)

Under write bursts, Kafka brokers can experience latency, increasing the window during which the read-optimized views are stale relative to the Event Store.

Mitigation: Implement Session Consistency Tokens. Return the event sequence number (last_updated_sequence) in the write API response. The client passes this token on reads, and the Query Service blocks until the read view has caught up to that sequence.

Failure Scenarios and Resilience

Resilience loops are required to handle system state drifts:

Scenario A: Projection Database Corruption

If the Elasticsearch read-side cluster crashes or corrupts its indices, all query views will become unavailable or return incorrect state.

Resiliency Mitigation: Execute Re-projection. Since the Event Store is the absolute, immutable source of truth, you can discard the corrupted Elasticsearch index, create a new, empty index, and replay the entire event stream from offset 0 to rebuild the views from scratch.

Scenario B: Concurrent Command Races (Optimistic Locking Failures)

If two commands attempt to mutate the same account concurrently, they will read the same sequence and attempt to commit events with the same sequence number, causing duplicate key conflicts in the Event Store.

Resiliency Mitigation: Enforce an optimistic locking column constraint on (aggregate_id, sequence) in the event table, causing the second write transaction to fail and retry safely.

Scenario C: Out-Of-Order Event Ingestion on Read-Side

Under network partitions, the projection processor may ingest events out of order (e.g. sequence 15 arrives before sequence 14).

Resiliency Mitigation: The projection processor checks the incoming event's sequence. If the event sequence is greater than the current read model's sequence plus one, it buffers the late-arriving event in a local queue and waits for the missing sequence to arrive.

Architectural Trade-offs

The choice of data pattern dictates the coordination boundaries of a platform:

Strategy	Write Speed	Read Speed	Audit Capability	Implementation Complexity
Standard CRUD	Medium (Requires lock checks)	Low (Complex joins)	None	Low
CQRS (Segregated DBs)	Medium	High (Read-optimized Views)	None	Medium
Event Sourcing (No CQRS)	High (Append-only)	Extremely Low (Slow replay)	Absolute	High
CQRS + Event Sourcing	High (Append-only)	High (Read-optimized Views)	Absolute (Complete ledger history)	Extremely High

Selecting the combined CQRS + Event Sourcing pattern yields maximum performance and audit compliance but introduces distributed coordination, eventual consistency lag, and code complexity.

Staff Engineer Perspective

Verbal Script

Interviewer: "What are the benefits of CQRS and Event Sourcing, and how do they differ from a standard database design?"

Candidate: "A standard database design uses a single, mutable record for both reads and writes. Under high traffic, this creates database locking contention and database performance degradation due to indexing and complex queries. CQRS segregates these paths: commands execute writes on a write-optimized database, while queries read from separate, read-optimized views. Event Sourcing takes this further—instead of storing the current mutable state, it records every mutation as an immutable event. State is rebuilt dynamically by replaying the sequence of events. This ensures an absolute, audit-ready ledger with zero write locks."

Interviewer: "Excellent. How does this segregation affect the consistency model of the application?"

Candidate: "It introduces Eventual Consistency. Because the write-optimized Event Store and the read-optimized views are separate databases, there is a delay while events are processed and written to the query views. This means a user might submit a write command, query the view immediately after, and observe stale data. To mitigate this and provide a clean user experience, we implement session consistency tokens. The write API returns the committed event sequence number, and the read path checks if the query index has caught up to that sequence before returning the balance."

Event-Driven Architecture: CQRS and Event Sourcing in Practice

What you will learn

Mental Model

System Requirements

Functional Requirements

Non-Functional Requirements

API Design and Interface Contracts

Command Request Payload (Client to Write API)

Query API Response (Read API to Client)

High-Level Architecture

1. Command Write Pipeline

2. Read-Side Projection Pipeline

Low-Level Design and Schema

Scaling Challenges and Capacity Estimation

1. Long Replay Latency (Large Event Logs)

2. Sizing Snapshots and Event Growth

3. Projection Sync Lag (Eventual Consistency)

Failure Scenarios and Resilience

Scenario A: Projection Database Corruption

Scenario B: Concurrent Command Races (Optimistic Locking Failures)

Scenario C: Out-Of-Order Event Ingestion on Read-Side

Architectural Trade-offs

Staff Engineer Perspective

Verbal Script

Sachin Sarawgi

Keep Learning

The Expand-Contract Pattern: Zero-Downtime Database Schema Changes

DynamoDB Single Table Design: Advanced Modeling Patterns

Related Articles

API Pagination at Scale: Why OFFSET 100,000 is a Database Killer

Bypassing the Kernel: User-Space Networking for Sub-Microsecond Performance

HyperLogLog at Scale: Billion-Cardinality Estimation

gRPC Schema Evolution: Avoiding Breaking Changes

More in System Design

LLD Mastery: The Factory Design Pattern

LLD Mastery: The Singleton Design Pattern

LLD Mastery: The Strategy Design Pattern

Event-Driven Architecture: CQRS and Event Sourcing in Practice

What you will learn

Mental Model

System Requirements

Functional Requirements

Non-Functional Requirements

API Design and Interface Contracts

Command Request Payload (Client to Write API)

Query API Response (Read API to Client)

High-Level Architecture

1. Command Write Pipeline

2. Read-Side Projection Pipeline

Low-Level Design and Schema

Scaling Challenges and Capacity Estimation

1. Long Replay Latency (Large Event Logs)

2. Sizing Snapshots and Event Growth

3. Projection Sync Lag (Eventual Consistency)

Failure Scenarios and Resilience

Scenario A: Projection Database Corruption

Scenario B: Concurrent Command Races (Optimistic Locking Failures)

Scenario C: Out-Of-Order Event Ingestion on Read-Side

Architectural Trade-offs

Staff Engineer Perspective

Verbal Script

Get the next backend guide in your inbox

Sachin Sarawgi

Keep Learning

The Expand-Contract Pattern: Zero-Downtime Database Schema Changes

DynamoDB Single Table Design: Advanced Modeling Patterns

Related Articles

API Pagination at Scale: Why OFFSET 100,000 is a Database Killer

Bypassing the Kernel: User-Space Networking for Sub-Microsecond Performance

HyperLogLog at Scale: Billion-Cardinality Estimation

gRPC Schema Evolution: Avoiding Breaking Changes

More in System Design

LLD Mastery: The Factory Design Pattern

LLD Mastery: The Singleton Design Pattern

LLD Mastery: The Strategy Design Pattern