A customer places an order. The Order Service reserves inventory, then calls the Payment Service to charge the card. The payment fails — the card is declined. Now what? In a monolith, you'd roll back the database transaction. In a microservices system, the inventory reservation already happened on a different service with a different database. There's no distributed transaction to roll back. The saga pattern is how microservices manage this: a sequence of local transactions, each with a compensating action that undoes it if a later step fails.
How sagas work
The happy path is a chain of forward steps:
- Order Service: create order record (status: PENDING)
- Inventory Service: reserve stock for the product
- Payment Service: charge the customer's card
- Order Service: update order status to CONFIRMED
- Notification Service: send confirmation email
A failure at step 3 triggers compensating transactions in reverse:
- Compensate step 2: Inventory Service releases the reserved stock
- Compensate step 1: Order Service marks the order as FAILED
Each compensation is its own local transaction — independent, idempotent, and retryable.
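The forward-then-reverse flow above can be sketched in plain Java. This is a minimal in-memory illustration, not how a real saga framework is built: the `Saga` and `SagaStep` classes, the step names, and the log messages are all hypothetical, and each "service" is just a lambda writing to a shared list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SagaSketch {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = buildOrderSaga(log, true).run();
        // prints: completed=false log=[order PENDING, stock reserved, stock released, order FAILED]
        System.out.println("completed=" + ok + " log=" + log);
    }

    /** Builds the first three steps of the order saga; the payment step can be forced to fail. */
    static Saga buildOrderSaga(List<String> log, boolean declineCard) {
        return new Saga()
            .step("create-order", () -> log.add("order PENDING"), () -> log.add("order FAILED"))
            .step("reserve-stock", () -> log.add("stock reserved"), () -> log.add("stock released"))
            .step("charge-card", () -> {
                if (declineCard) throw new IllegalStateException("card_declined");
                log.add("card charged");
            }, () -> log.add("charge refunded"));
    }
}

/** One local transaction plus the action that undoes it (hypothetical type). */
final class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class Saga {
    private final List<SagaStep> steps = new ArrayList<>();

    Saga step(String name, Runnable action, Runnable compensation) {
        steps.add(new SagaStep(name, action, compensation));
        return this;
    }

    /** Returns true if every step succeeded, false if the saga was compensated. */
    boolean run() {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step); // the most recent step ends up on top
            } catch (RuntimeException failure) {
                // Undo only the steps that finished, newest first
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

Note that the failed step itself is not compensated: the charge never happened, so only the reservation and the order record are undone.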
Choreography vs orchestration
Choreography sagas: each service publishes events and other services react to them. No central coordinator. Inventory Service publishes inventory.reserved, Payment Service listens and charges the card, then publishes payment.processed, and Order Service listens and confirms the order.
Orchestration sagas: a central saga orchestrator — a dedicated service or a workflow engine like Temporal or Camunda — sends commands to each participant and handles the compensation logic centrally.
Orchestration sagas are easier to test: you can directly query the orchestrator for the saga's current state. Choreography sagas require tracing events across topics, which makes assertions harder and slower to write.
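To make the choreography style concrete, here is a rough sketch using a synchronous in-memory event bus. The `EventBus` class is hypothetical, as are the `order.created` and `order.confirmed` event names (only `inventory.reserved` and `payment.processed` come from the description above). A real system would use a message broker and asynchronous delivery.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class ChoreographySketch {
    public static void main(String[] args) {
        // prints: [order.created, inventory.reserved, payment.processed, order.confirmed]
        System.out.println(runHappyPath());
    }

    /** Wires three stand-in services to a bus and returns the topics published, in order. */
    static List<String> runHappyPath() {
        EventBus bus = new EventBus();
        // Inventory Service reacts to a new order by reserving stock
        bus.subscribe("order.created", orderId -> bus.publish("inventory.reserved", orderId));
        // Payment Service reacts to the reservation by charging the card
        bus.subscribe("inventory.reserved", orderId -> bus.publish("payment.processed", orderId));
        // Order Service reacts to the payment by confirming the order
        bus.subscribe("payment.processed", orderId -> bus.publish("order.confirmed", orderId));
        bus.publish("order.created", "order-1");
        return bus.history();
    }
}

/** Hypothetical synchronous bus: handlers run inline when an event is published. */
class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    private final List<String> history = new ArrayList<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String orderId) {
        history.add(topic); // recorded before handlers run, so the order is publish order
        List<Consumer<String>> handlers =
            subscribers.getOrDefault(topic, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(orderId);
        }
    }

    List<String> history() {
        return Collections.unmodifiableList(history);
    }
}
```

No single object in this sketch knows the saga's overall state. The only way to see it is the bus history, which is the in-memory equivalent of tracing events across topics.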
Testing the happy path
@Test
void shouldCompleteOrderSagaWhenAllServicesSucceed() {
    // Trigger the saga by placing an order
    Response response = orderServiceClient.placeOrder(
        new CreateOrderRequest(testUser.getId(), testProduct.getId(), 1));
    assertThat(response.statusCode()).isEqualTo(202); // Accepted: async processing
    String orderId = response.jsonPath().getString("orderId");
    // Poll until the saga completes
    await().atMost(15, SECONDS).untilAsserted(() -> {
        Order order = orderServiceClient.getOrder(orderId);
        assertThat(order.getStatus()).isEqualTo("CONFIRMED");
    });
    // Verify side effects in each participating service
    Payment payment = paymentServiceClient.getByOrderId(orderId);
    assertThat(payment.getStatus()).isEqualTo("CHARGED");
    // originalStock is assumed to be captured in test setup, before the order is placed
    int currentStock = inventoryServiceClient.getStock(testProduct.getId());
    assertThat(currentStock).isEqualTo(originalStock - 1);
}
The happy path test confirms that every forward step ran and that each service reflects the correct final state. It also serves as a baseline: if this test breaks, every failure-path test is irrelevant until the happy path is restored.
Testing failure and compensation
This is the most important saga test — verify that when a step fails, the compensation chain runs correctly:
@Test
void shouldReleaseInventoryWhenPaymentFails() {
    // Configure Payment Service to decline the card
    paymentServiceMock.stubFor(post("/charge")
        .willReturn(aResponse().withStatus(402).withBody("{\"error\":\"card_declined\"}")));
    int stockBefore = inventoryServiceClient.getStock(testProduct.getId());
    // Trigger the saga and keep the order ID so the order can be polled
    Response response = orderServiceClient.placeOrder(
        new CreateOrderRequest(testUser.getId(), testProduct.getId(), 1));
    String orderId = response.jsonPath().getString("orderId");
    // Wait for compensation to complete
    await().atMost(20, SECONDS).untilAsserted(() -> {
        // Order should be in FAILED or PAYMENT_FAILED state
        Order order = orderServiceClient.getOrder(orderId);
        assertThat(order.getStatus()).isIn("FAILED", "PAYMENT_FAILED");
        // Inventory should be restored; this is the key compensation assertion
        int stockAfter = inventoryServiceClient.getStock(testProduct.getId());
        assertThat(stockAfter).isEqualTo(stockBefore);
    });
}
The assertion that matters most is the final stock count. If it does not match stockBefore, the compensation did not run or did not finish within the timeout. This test catches the most dangerous class of saga bugs: forward steps that succeed but whose compensations are never triggered or are silently failing.
Testing mid-saga network failures
The hardest saga test: what happens when the network fails between two saga steps?
@Test
void shouldCompensateWhenNetworkFailsMidSaga() throws Exception {
    // Let inventory reservation succeed, then cut the network to Payment Service
    // after the inventory step completes
    CompletableFuture<Void> networkCut = CompletableFuture.runAsync(() -> {
        await().until(() -> inventoryServiceClient.isReserved(testProduct.getId()));
        try {
            // A bandwidth toxic with rate 0 stops data flowing through the proxy
            paymentServiceProxy.toxics().bandwidth("cutoff", ToxicDirection.DOWNSTREAM, 0);
        } catch (IOException e) {
            // A Runnable lambda cannot propagate the checked IOException, so wrap it
            throw new UncheckedIOException(e);
        }
    });
    orderServiceClient.placeOrder(orderRequest);
    // After timeout + compensation, inventory should be released
    await().atMost(30, SECONDS).untilAsserted(() -> {
        assertThat(inventoryServiceClient.getStock(testProduct.getId()))
            .isEqualTo(originalStock);
    });
    // Surface any failure from the fault-injection thread
    networkCut.join();
}
Toxiproxy is the right tool here: it lets you inject network conditions (latency, bandwidth limits, connection cuts) between specific services at a specific moment in the saga. A fixed Thread.sleep would be too fragile for this scenario.
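To show why polling beats a fixed sleep, here is a hand-rolled sketch of the idea behind `await().until(...)`. It is not Awaitility's implementation. The `pollUntil` helper and its parameters are hypothetical, written only to show the loop: check, stop as soon as the condition holds, give up at a deadline.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class PollingSketch {
    /**
     * Re-checks the condition every interval until it holds or the timeout elapses.
     * Returns true as soon as the condition holds, false on timeout or interruption.
     */
    static boolean pollUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // finished early: no time wasted waiting out a fixed sleep
            }
            if (System.nanoTime() >= deadline) {
                return false; // upper bound on how long a slow system can stall the test
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "compensation finished": becomes true after roughly 200 ms
        boolean done = pollUntil(() -> System.currentTimeMillis() - start >= 200,
                Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println("condition met: " + done);
    }
}
```

The timeout only costs time when the test is going to fail anyway. A fixed sleep costs its full duration on every run.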
Saga state visibility
Sagas that use an orchestrator such as Temporal or Camunda expose saga status via an API. Assert on that state directly rather than reverse-engineering it from individual service states:
// For orchestrated sagas — query the saga state machine directly
SagaStatus sagaStatus = sagaOrchestrator.getSagaStatus(sagaId);
assertThat(sagaStatus.getCurrentStep()).isEqualTo("COMPENSATION_COMPLETE");
assertThat(sagaStatus.getCompensatedSteps()).containsExactly("PAYMENT", "INVENTORY");
⚠️ Common mistakes
- Not testing each failure point individually. A saga with 5 steps has 5 possible failure points, each with a different compensation chain. Testing only the happy path and one failure leaves 4 untested compensation paths. Write a test for every step that can fail — this is where sagas most commonly break in production.
- Writing compensation logic that is not idempotent. Compensation commands can be delivered more than once (at-least-once delivery). If releasing an inventory reservation is implemented as stock = stock + quantity and the command arrives twice, stock is incremented twice. Compensations must be idempotent: check whether the compensation has already run before applying it.
- Using Thread.sleep to wait for compensation instead of polling. Saga compensation time is highly variable: it depends on message broker lag, service load, and the number of steps to compensate. A fixed sleep will be too short under load and wasteful when the system is fast. Always use await().untilAsserted(...) for saga assertions.
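The idempotency point in the list above can be sketched as follows. This is one possible approach under stated assumptions, not the lesson's Inventory Service: the `InventorySketch` class and the idea of keying each reservation by a `reservationId` are hypothetical. Releasing looks up the reservation and does nothing if it is already gone, so a duplicate command cannot increment stock twice.

```java
import java.util.HashMap;
import java.util.Map;

class InventorySketch {
    private int stock;
    // reservationId -> quantity still held for that reservation
    private final Map<String, Integer> held = new HashMap<>();

    InventorySketch(int initialStock) {
        this.stock = initialStock;
    }

    int stock() {
        return stock;
    }

    /** Forward step: hold stock for a reservation; a repeated command is ignored. */
    void reserve(String reservationId, int quantity) {
        if (held.containsKey(reservationId)) {
            return; // duplicate delivery of the same reserve command
        }
        if (quantity > stock) {
            throw new IllegalStateException("insufficient stock");
        }
        stock -= quantity;
        held.put(reservationId, quantity);
    }

    /** Compensation: return held stock; safe to call any number of times. */
    void release(String reservationId) {
        Integer quantity = held.remove(reservationId);
        if (quantity == null) {
            return; // already released, or never reserved: nothing to undo
        }
        stock += quantity;
    }

    public static void main(String[] args) {
        InventorySketch inventory = new InventorySketch(10);
        inventory.reserve("order-1", 1);
        inventory.release("order-1");
        inventory.release("order-1"); // duplicate delivery of the compensation
        System.out.println(inventory.stock()); // prints 10
    }
}
```

In a real service the check and the stock update would need to happen in the same local database transaction. Otherwise two concurrent deliveries could both pass the check.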
🎯 Practice task
- Design a 3-step saga on paper: Order Service, Payment Service, Inventory Service. Draw the forward steps and the compensation for each step failure. Identify which compensation actions must be idempotent and explain why.
- Write the happy-path saga test: trigger the full saga, poll until the final status is CONFIRMED, and assert on the side effects in each service (payment status, inventory count, order status).
- Write the payment failure test: stub the Payment Service to return 402, trigger the saga, and assert both that the order is marked FAILED and that inventory is restored to its pre-saga value.
- Make the compensation idempotent: send the compensation command twice for the inventory step. Assert the stock value is the same as if the command ran once. If it isn't, fix the compensation logic.
- Research Temporal.io's workflow testing SDK (Java). Write a unit test for a saga workflow using Temporal's TestWorkflowEnvironment. How does this approach differ from the integration test approach used in this lesson?
The next lesson covers contract testing with Pact — how to verify that the stubs you use in component and saga tests actually match the behaviour of the real services they represent.