System Design: Managing Distributed Transactions with the Saga Pattern
In a monolithic architecture, a single database transaction guarantees ACID properties across all operations. In a microservices architecture, a single business process (like "Order Placement") often spans multiple services, each with its own private database. Traditional distributed transactions, such as two-phase commit across services, are slow, fragile, and often impractical at scale. The Saga Pattern is a widely used alternative for maintaining data consistency in this environment.
1. Core Concepts
A Saga is a sequence of local transactions. Each local transaction updates its own service's database and publishes an event or message to trigger the next local transaction in the saga.
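The chain of local transactions described above can be sketched as follows. This is a minimal in-memory illustration, not a production design: the dictionaries stand in for each service's private database, the `deque` stands in for a message broker, and all names (`create_order`, `reserve_stock`, `charge_payment`, the event names) are assumptions made for this example.

```python
from collections import deque

# Hypothetical stand-ins for each service's private database.
order_db = {}
inventory_db = {"widget": 5}
payment_db = {}

queue = deque()  # stand-in for a message broker


def create_order(msg):
    # Local transaction 1: the Order service writes only to its own store.
    order_db[msg["order_id"]] = "CREATED"
    queue.append(("OrderCreated", msg))


def reserve_stock(msg):
    # Local transaction 2: the Inventory service reacts to OrderCreated.
    inventory_db[msg["item"]] -= msg["qty"]
    queue.append(("StockReserved", msg))


def charge_payment(msg):
    # Local transaction 3: the Payment service reacts to StockReserved.
    payment_db[msg["order_id"]] = msg["amount"]
    queue.append(("PaymentCharged", msg))


# Which local transaction each message triggers.
HANDLERS = {
    "PlaceOrder": create_order,
    "OrderCreated": reserve_stock,
    "StockReserved": charge_payment,
}


def run_saga(order):
    """Deliver messages until the queue is empty; return the message trace."""
    queue.append(("PlaceOrder", order))
    trace = []
    while queue:
        event, msg = queue.popleft()
        trace.append(event)
        handler = HANDLERS.get(event)
        if handler is not None:
            handler(msg)
    return trace


trace = run_saga({"order_id": "o-1", "item": "widget", "qty": 2, "amount": 40})
print(trace)
```

Each handler commits its own change before publishing the next message, so there is never a single transaction spanning all three stores.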
2. Handling Failure: Compensating Transactions
Since we don't have a global rollback mechanism (such as the rollback of a traditional SQL transaction), we must use Compensating Transactions.
- If a step in the saga fails, the system executes a series of "undo" operations to revert the changes made by previous successful steps.
- Example: If the "Payment Service" fails after the "Inventory Service" has reserved stock, the system must trigger a compensating transaction in Inventory to release the reserved items.
- Crucial Rule: Compensating transactions must be idempotent, as they might be retried due to network failures.
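One way to make the inventory compensation from the example idempotent is to key each reservation by a saga identifier, so that a retried release finds nothing left to undo. The sketch below assumes a hypothetical `InventoryService` class and a `saga_id` parameter; neither comes from a specific framework.

```python
class InventoryService:
    """Hypothetical inventory service with an idempotent compensation."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.reservations = {}  # saga_id -> (item, qty)

    def reserve(self, saga_id, item, qty):
        """Forward step: reserve stock once per saga."""
        if saga_id in self.reservations:
            return  # duplicate delivery: already reserved
        if self.stock[item] < qty:
            raise ValueError("insufficient stock")
        self.stock[item] -= qty
        self.reservations[saga_id] = (item, qty)

    def release(self, saga_id):
        """Compensating transaction: safe to call any number of times."""
        reservation = self.reservations.pop(saga_id, None)
        if reservation is None:
            return False  # already released, or never reserved: no-op
        item, qty = reservation
        self.stock[item] += qty
        return True


inventory = InventoryService({"widget": 5})
inventory.reserve("saga-1", "widget", 2)
first = inventory.release("saga-1")   # performs the undo
second = inventory.release("saga-1")  # simulated retry: changes nothing
```

Because the reservation record is removed in the same step that restores the stock, a retry after a lost acknowledgement cannot return the items twice.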
3. Two Saga Architectures
Choreography (Event-Based)
Each service emits events and listens to events from others. There is no central controller.
- Pros: Simple to start; loose coupling between services.
- Cons: Extremely difficult to debug; risk of cyclic dependencies; hard to monitor the state of the entire saga.
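A rough sketch of choreography, including the failure path, might look like this. The `EventBus` class is a synchronous in-memory stand-in for a real broker, and the event names, handlers, and the `balance` field are all assumptions made for illustration.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous stand-in for a message broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.log = []

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.subscribers[event_type]:
            handler(payload)


bus = EventBus()
stock = {"widget": 5}  # Inventory service state
orders = {}            # Order service state


def on_order_placed(order):
    # Inventory service: reserve stock, then announce it.
    stock[order["item"]] -= order["qty"]
    bus.publish("StockReserved", order)


def on_stock_reserved(order):
    # Payment service: succeed or fail, then announce the outcome.
    if order["amount"] > order["balance"]:
        bus.publish("PaymentFailed", order)
    else:
        bus.publish("PaymentCompleted", order)


def on_payment_failed(order):
    # Inventory service: compensate by releasing the reservation.
    stock[order["item"]] += order["qty"]
    bus.publish("StockReleased", order)


def on_stock_released(order):
    # Order service: mark the order as cancelled.
    orders[order["order_id"]] = "CANCELLED"


def on_payment_completed(order):
    # Order service: mark the order as confirmed.
    orders[order["order_id"]] = "CONFIRMED"


bus.subscribe("OrderPlaced", on_order_placed)
bus.subscribe("StockReserved", on_stock_reserved)
bus.subscribe("PaymentFailed", on_payment_failed)
bus.subscribe("StockReleased", on_stock_released)
bus.subscribe("PaymentCompleted", on_payment_completed)

orders["o-1"] = "PENDING"
bus.publish("OrderPlaced", {"order_id": "o-1", "item": "widget",
                            "qty": 2, "amount": 40, "balance": 10})
print(bus.log)
```

Note that no single function describes the whole workflow; the sequence only becomes visible by reading the event log, which is the monitoring difficulty listed under the cons.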
Orchestration (Command-Based)
A central "Saga Orchestrator" service manages the entire workflow, sending commands to participants and receiving replies.
- Pros: Easier to understand, debug, and monitor; avoids cyclic dependencies.
- Cons: Business logic tends to accumulate in the orchestrator, making it complex; it can also become a bottleneck.
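For comparison, an orchestrator can be sketched as a loop over steps that runs the compensations of completed steps in reverse order when one fails. The `SagaOrchestrator` class, the step functions, and the shared `ctx` dictionary are hypothetical; a real orchestrator would send commands over the network and persist its progress.

```python
class SagaOrchestrator:
    """Runs steps in order; on failure, compensates completed steps in reverse."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, action, compensation)
        self.log = []

    def execute(self, ctx):
        completed = []
        for name, action, compensation in self.steps:
            try:
                action(ctx)
            except Exception:
                self.log.append(f"{name}:failed")
                for done_name, undo in reversed(completed):
                    undo(ctx)
                    self.log.append(f"{done_name}:compensated")
                return False
            self.log.append(f"{name}:ok")
            completed.append((name, compensation))
        return True


# Hypothetical participants, modelled as functions on a shared context.
def create_order(ctx):
    ctx["order"] = "CREATED"


def cancel_order(ctx):
    ctx["order"] = "CANCELLED"


def reserve_stock(ctx):
    ctx["stock"] -= ctx["qty"]


def release_stock(ctx):
    ctx["stock"] += ctx["qty"]


def charge_payment(ctx):
    if ctx["amount"] > ctx["balance"]:
        raise RuntimeError("payment declined")
    ctx["balance"] -= ctx["amount"]


def refund_payment(ctx):
    ctx["balance"] += ctx["amount"]


saga = SagaOrchestrator([
    ("create_order", create_order, cancel_order),
    ("reserve_stock", reserve_stock, release_stock),
    ("charge_payment", charge_payment, refund_payment),
])

ctx = {"stock": 5, "qty": 2, "amount": 40, "balance": 10}
ok = saga.execute(ctx)
print(ok, saga.log)
```

Here the whole workflow and its current state live in one place, which is what makes this style easier to debug and monitor.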
4. The ACD Problem (Missing Isolation)
Sagas provide Atomicity, Consistency, and Durability, but they lack Isolation.
- Because local transactions are committed immediately, other sagas might see "dirty" or intermediate data.
- Countermeasures:
  - Semantic Locks: Use a state field to mark a record as in progress, so that other processes do not use the data until the saga is complete.
  - Versioned Objects: Always include a version or timestamp so that each step can verify the record is still valid before applying its change.
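Both countermeasures can be combined on a single record, as in the sketch below. The `Account` class, the state values `"AVAILABLE"` and `"PENDING"`, and the function names are assumptions chosen for illustration; the original text does not prescribe specific names.

```python
from dataclasses import dataclass


class LockedError(Exception):
    """Another saga currently holds the semantic lock."""


class StaleVersionError(Exception):
    """The record changed since the caller last read it."""


@dataclass
class Account:
    balance: int
    state: str = "AVAILABLE"  # hypothetical semantic-lock field
    version: int = 0          # bumped on every write


def begin_saga(account):
    """Take the semantic lock; return the version the saga should expect."""
    if account.state != "AVAILABLE":
        raise LockedError("record is in use by another saga")
    account.state = "PENDING"
    account.version += 1
    return account.version


def apply_debit(account, amount, expected_version):
    """Apply the next step only if the record is still at the expected version."""
    if account.version != expected_version:
        raise StaleVersionError("record changed since it was read")
    account.balance -= amount
    account.version += 1
    return account.version


def end_saga(account):
    """Release the semantic lock once the saga has completed."""
    account.state = "AVAILABLE"
    account.version += 1


account = Account(balance=100)
token = begin_saga(account)

try:
    begin_saga(account)  # a second saga tries to use the same record
    second_saga_blocked = False
except LockedError:
    second_saga_blocked = True

apply_debit(account, 30, token)

try:
    apply_debit(account, 30, token)  # replay with an outdated version
    stale_rejected = False
except StaleVersionError:
    stale_rejected = True

end_saga(account)
```

The state field keeps other sagas away from intermediate data, while the version check rejects a step that was computed against an outdated copy of the record.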
5. Summary
The Saga pattern is not about preventing errors; it's about managing failure gracefully. By decomposing a global transaction into a series of local ones with robust compensating logic, you can maintain eventual consistency across complex, distributed microservices.
