Eventual Consistency Testing


A customer updates their shipping address on the User Service. The change is saved immediately. Five seconds later, an open order on the Order Service still shows the old address — Order Service caches user data and updates it asynchronously when a user.updated event arrives. Both states are correct at their respective moments. This is eventual consistency: not a bug, but a deliberate architectural trade-off that every QA engineer working with microservices must know how to test.

What eventual consistency means for QA

These four points frame the testing challenge:

  • Strong consistency: every read sees the most recent write immediately. A monolith backed by a single transactional database typically behaves this way.
  • Eventual consistency: reads may see stale data for a brief window. Each microservice database is the source of truth for its own data; other services catch up via events.
  • The QA implication: asserting state immediately after a write in an eventually-consistent system produces intermittent failures — not because the system is broken, but because the propagation window hasn't closed yet.
  • What to test instead: verify that consistency IS eventually achieved within a defined time bound (the SLA), not that it happens instantly.
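The propagation window in the points above can be modelled deterministically. The sketch below assumes hypothetical `UserServiceSim`, `OrderServiceCache` and `UserUpdated` types (none of them come from a real framework). The in-flight queue stands in for the message broker, so the "window" is simply the time between `updateEmail` and `deliverPending`.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical user.updated event published by the source-of-truth service.
record UserUpdated(String userId, String email) {}

// Hypothetical downstream read model: it changes only when an event is delivered.
final class OrderServiceCache {
    private final Map<String, String> emails = new HashMap<>();

    void apply(UserUpdated event) { emails.put(event.userId(), event.email()); }

    String email(String userId) { return emails.get(userId); }
}

// Hypothetical source of truth plus an in-flight queue standing in for the broker.
final class UserServiceSim {
    private final Map<String, String> emails = new HashMap<>();
    private final Queue<UserUpdated> inFlight = new ArrayDeque<>();

    void updateEmail(String userId, String email) {
        emails.put(userId, email);                     // committed and readable immediately
        inFlight.add(new UserUpdated(userId, email));  // reaches consumers later
    }

    String email(String userId) { return emails.get(userId); }

    // Closes the propagation window: delivers every in-flight event, in order.
    void deliverPending(OrderServiceCache consumer) {
        UserUpdated event;
        while ((event = inFlight.poll()) != null) {
            consumer.apply(event);
        }
    }
}
```

An assertion placed between `updateEmail` and `deliverPending` is the "assert immediately after a write" mistake: it sees the old value even though nothing is broken.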

The polling pattern — the right way

On the JVM, the standard tool for this is Awaitility. Here is a complete example that avoids the most common traps:

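import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.SECONDS;
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

import java.time.Duration;
import org.junit.jupiter.api.Test;

// userId, userServiceClient and orderServiceClient are fixtures of the enclosing test class
@Test
void shouldReflectEmailUpdateInOrderServiceWithin10Seconds() {
    String newEmail = "alice.updated@test.com";

    // Update on User Service (committed immediately)
    userServiceClient.updateEmail(userId, newEmail);

    // Order Service sees the change only after the user.updated event propagates
    await()
        .atMost(10, SECONDS)
        .pollInterval(500, MILLISECONDS)
        .pollDelay(Duration.ZERO)           // start polling immediately
        // failFast needs a recent Awaitility 4.x release
        .failFast("User service returned error",
            () -> userServiceClient.getUser(userId).statusCode() != 200)
        .untilAsserted(() -> {
            UserSummary cached = orderServiceClient.getCachedUser(userId);
            assertThat(cached.getEmail()).isEqualTo(newEmail);
        });
}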

Each Awaitility option does specific work:

  • atMost(10, SECONDS): fail the test (Awaitility throws a ConditionTimeoutException) if consistency hasn't been achieved within 10 seconds
  • pollInterval(500, MILLISECONDS): check every half second instead of Awaitility's default 100 ms, which keeps the load on Order Service low
  • pollDelay(Duration.ZERO): make the first check immediately instead of waiting for Awaitility's default initial delay, which only adds latency here
  • failFast(reason, condition): abort early if an upstream precondition fails, instead of waiting the full 10 seconds for a broken dependency
  • untilAsserted: accepts a lambda containing ordinary assertions (AssertJ here), not just a boolean condition, so a timeout reports the last assertion failure and your failure messages stay meaningful
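To make the mechanics of these options concrete, here is a stdlib-only sketch of a poller with the same knobs. `Poll.untilAsserted` is a hypothetical helper written for illustration. It is not Awaitility's implementation or API, and real tests should keep using Awaitility.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical stdlib-only poller mirroring atMost / pollDelay / pollInterval / failFast / untilAsserted.
// Illustration of the technique only; NOT Awaitility's implementation.
final class Poll {
    private Poll() {}

    static void untilAsserted(Duration atMost, Duration pollDelay, Duration pollInterval,
                              BooleanSupplier failFast, Runnable assertion) {
        long deadline = System.nanoTime() + atMost.toNanos();
        pause(pollDelay);                                   // Duration.ZERO means first check right away
        while (true) {
            if (failFast.getAsBoolean()) {                  // broken precondition: stop now
                throw new IllegalStateException("fail-fast condition met; not waiting for " + atMost);
            }
            try {
                assertion.run();                            // throws AssertionError while data is stale
                return;                                     // consistent: return with no further waiting
            } catch (AssertionError stale) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    // Keep the last assertion failure as the cause so the report stays meaningful
                    throw new AssertionError("condition not met within " + atMost, stale);
                }
                // Never sleep past the deadline
                pause(Duration.ofNanos(Math.min(pollInterval.toNanos(), remaining)));
            }
        }
    }

    private static void pause(Duration d) {
        if (d.isZero() || d.isNegative()) return;
        try {
            Thread.sleep(d.toMillis(), (int) (d.toNanos() % 1_000_000));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        }
    }
}
```

The design point is that the loop exits on the first passing check. A successful run therefore costs roughly the real propagation time plus at most one poll interval, while the fail-fast check turns a dead dependency into an immediate, clearly labelled failure.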

Setting SLA-based timeouts by environment

Eventual consistency windows vary by environment: a local Docker setup might propagate in under a second, while a loaded CI cluster can take 20 seconds. Hardcoding one timeout everywhere means either that local failures take needlessly long to report or that CI tests time out intermittently. Parameterise instead:

// Define consistency SLA per environment
static Duration consistencySla() {
    return switch (System.getProperty("test.env", "local")) {
        case "ci"      -> Duration.ofSeconds(20);
        case "staging" -> Duration.ofSeconds(30);
        default        -> Duration.ofSeconds(10);  // local
    };
}
 
@Test
void emailPropagatesWithinSla() {
    userServiceClient.updateEmail(userId, "new@test.com");
    await().atMost(consistencySla()).untilAsserted(() ->
        assertThat(orderServiceClient.getCachedUser(userId).getEmail())
            .isEqualTo("new@test.com")
    );
}

Pass -Dtest.env=ci in your CI Maven command and the longer timeout activates automatically with no test code changes.

Testing the SLA boundary

The test above fails only when propagation exceeds the environment's timeout. To track propagation speed itself, add a test with a tighter, non-functional assertion on the measured latency:

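import static org.junit.jupiter.api.Assumptions.assumeTrue;

@Test
void emailPropagationCompletesWithin5SecondsOnLocal() {
    // The 5-second target applies to local runs only; skip on ci/staging
    assumeTrue("local".equals(System.getProperty("test.env", "local")));

    String newEmail = "sla-test@test.com";
    long start = System.nanoTime();   // monotonic clock, unaffected by wall-clock adjustments

    userServiceClient.updateEmail(userId, newEmail);
    // Generous timeout separates "never consistent" from "consistent but too slow"
    await().atMost(10, SECONDS).untilAsserted(() ->
        assertThat(orderServiceClient.getCachedUser(userId).getEmail())
            .isEqualTo(newEmail)
    );

    // Includes the update call itself and up to one poll interval of overshoot
    Duration propagationTime = Duration.ofNanos(System.nanoTime() - start);
    assertThat(propagationTime).isLessThan(Duration.ofSeconds(5));
}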

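Run regularly, this test can flag propagation slowdowns (for example, from a Kafka partition reassignment, a consumer-group rebalance, or a slower consumer release) before they reach production. A single measurement is noisy, so treat one failure as a signal to investigate rather than proof of a regression.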
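Because one sample is noisy, a latency check is more informative when it looks at the slowest of several independent samples. The sketch below shows that idea with the JDK only. `PropagationTimer`, `measure` and `worstOf` are hypothetical names, and the write and convergence check are passed in as lambdas so the helper stays independent of any real client.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.IntFunction;

// Hypothetical helper for a latency (non-functional) check; names are illustrative.
final class PropagationTimer {
    private PropagationTimer() {}

    // Times one write-to-visible round trip on System.nanoTime, which, unlike
    // Instant.now(), is not affected by wall-clock adjustments.
    static Duration measure(Runnable write, BooleanSupplier converged,
                            Duration pollInterval, Duration timeout) {
        long start = System.nanoTime();
        write.run();
        long deadline = start + timeout.toNanos();
        while (!converged.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                throw new AssertionError("no convergence within " + timeout);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        // Includes the write call itself plus up to one poll interval of overshoot.
        return Duration.ofNanos(System.nanoTime() - start);
    }

    // Runs several independent samples (each should use its own entity) and returns the slowest.
    static Duration worstOf(int samples, IntFunction<Duration> sample) {
        Duration worst = Duration.ZERO;
        for (int i = 0; i < samples; i++) {
            Duration d = sample.apply(i);
            if (d.compareTo(worst) > 0) worst = d;
        }
        return worst;
    }
}
```

Asserting on the worst of, say, five samples against the SLA is stricter than a single lucky run, while still keeping each sample isolated by giving it its own entity key.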

Propagation walkthrough, step 1 of 5 (write to primary service): the customer updates their email in User Service. The change is committed to User Service's database immediately, and the service returns 200 OK to the caller.

Test isolation in eventually-consistent systems

The challenge: if test A updates user 42's email and test B expects user 42 to have the old email, test B will fail intermittently depending on whether propagation has completed. Three strategies eliminate this class of race condition:

  • Use a unique entity per test: generate userId = UUID.randomUUID() in each test's @BeforeEach setup, never a shared fixture user
  • Wait for full propagation at the end of setup: after creating test data, poll until all dependent services have received the data before the assertion phase of any test begins
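  • Use deterministic event ordering: publish a sentinel event after the test data events, then poll for the sentinel before asserting. When the sentinel arrives, all preceding events have been processed, provided the sentinel travels through the same ordered channel (with Kafka, the same partition, for example by using the same message key) and the consumer processes events sequentially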
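The sentinel strategy and the unique-entity strategy from the list above can be combined in one small in-memory sketch. `Event` and `OrderedConsumer` are hypothetical types. A single worker thread draining one FIFO queue stands in for one ordered partition, and that ordering is exactly what makes "sentinel seen" imply "everything before it applied".

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical event shape: either a user update or a sentinel marker.
record Event(String userId, String email, String sentinelId) {
    static Event update(String userId, String email) { return new Event(userId, email, null); }
    static Event sentinel(String sentinelId) { return new Event(null, null, sentinelId); }
}

// Hypothetical consumer: ONE thread drains ONE FIFO queue (a stand-in for a single ordered partition).
final class OrderedConsumer implements AutoCloseable {
    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<String, String> emails = new ConcurrentHashMap<>();
    private final Set<String> seenSentinels = ConcurrentHashMap.newKeySet();
    private final Thread worker = new Thread(this::consume, "ordered-consumer");

    OrderedConsumer() {
        worker.setDaemon(true);   // never keeps the JVM alive if a test forgets close()
        worker.start();
    }

    void publish(Event event) { queue.add(event); }

    String email(String userId) { return emails.get(userId); }

    private void consume() {
        try {
            while (true) {
                Event event = queue.take();             // strictly in publish order
                if (event.sentinelId() != null) {
                    seenSentinels.add(event.sentinelId());
                } else {
                    emails.put(event.userId(), event.email());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();         // close() ends the loop
        }
    }

    // Polls for the sentinel; with sequential processing, seeing it means every
    // event published before it has already been applied.
    boolean awaitSentinel(String sentinelId, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!seenSentinels.contains(sentinelId)) {
            if (System.nanoTime() >= deadline) return false;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    @Override
    public void close() { worker.interrupt(); }
}
```

If the consumer processed events in parallel, or the sentinel landed on a different partition, seeing the sentinel would prove nothing about earlier events. The guarantee comes entirely from the single ordered channel.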

⚠️ Common mistakes

  • Using Thread.sleep instead of polling. Thread.sleep(5000) adds 5 seconds to every run of that test regardless of how fast consistency actually propagated. If propagation takes 500 ms on a fast local machine, 4.5 seconds are wasted; in a loaded CI environment propagation may take 6 seconds and the test still fails. Awaitility with pollInterval returns as soon as the condition holds (within one poll interval) and adapts to the actual environment speed.
  • Sharing entity IDs across tests in eventually-consistent scenarios. If test A and test B both operate on userId = 42, test A's update event may arrive during test B's assertion window and cause a spurious failure. Always generate a unique entity ID per test: it costs almost nothing and eliminates an entire class of race conditions.
  • Not documenting the consistency SLA in test code. When a test has await().atMost(30, SECONDS), future readers cannot tell where the 30-second figure comes from or when it is safe to change it. Document it: add a comment or extract a named constant (ORDER_SERVICE_EMAIL_PROPAGATION_SLA) so teams can find and update the SLA when the system's performance characteristics change.
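The first and third mistakes can be seen side by side in a small JDK-only sketch. `WaitStrategies` is a hypothetical class. The 500 ms value of `ORDER_SERVICE_EMAIL_PROPAGATION_SLA` is an assumption sized for the demo, not a recommended production figure. A fixed sleep always pays its full duration, while a poll returns about as soon as the condition holds.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class WaitStrategies {
    private WaitStrategies() {}

    // Named, documented SLA instead of a bare number (hypothetical value for this demo):
    // Order Service must reflect user.updated emails within this bound.
    static final Duration ORDER_SERVICE_EMAIL_PROPAGATION_SLA = Duration.ofMillis(500);

    // Anti-pattern: always waits the full duration, then checks exactly once.
    static boolean fixedSleep(Duration sleep, BooleanSupplier condition) {
        pause(sleep);
        return condition.getAsBoolean();
    }

    // Polling: true as soon as the condition holds, false once atMost has elapsed.
    static boolean poll(Duration atMost, Duration interval, BooleanSupplier condition) {
        long deadline = System.nanoTime() + atMost.toNanos();
        while (true) {
            if (condition.getAsBoolean()) return true;
            if (System.nanoTime() >= deadline) return false;
            pause(interval);
        }
    }

    private static void pause(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

With a condition that becomes true after roughly 50 ms, `poll` returns after roughly 50 ms while `fixedSleep` always takes the full 500 ms. A sleep shorter than the real propagation time simply returns `false`, which is the CI flake described above.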

🎯 Practice task

  1. Write a test that updates an entity in one service and asserts its propagation to a second service. Use await().atMost(10, SECONDS).pollInterval(500, MILLISECONDS).untilAsserted(...). Run it three times consecutively and confirm it passes consistently.
  2. Replace untilAsserted with a direct assertion (no polling). Run the test 10 times. Count how many times it fails. This is the baseline flakiness rate of asserting on eventually-consistent data without polling.
  3. Add a failFast condition that aborts the poll if the source service is unreachable. Simulate the source service going down (stop the container) and verify the test fails quickly rather than waiting for the full timeout.
  4. Extract the consistency timeout to a consistencySla() method that reads a system property. Parameterise your CI configuration to pass test.env=ci and verify the longer timeout is used.
  5. Design a test for a three-hop consistency chain: Service A → (event) → Service B → (event) → Service C. Write the test, define the SLA for the full chain, and add an assertion that measures and verifies the end-to-end propagation time.
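For task 5, one possible starting point is an in-memory harness you can reason about before pointing the test at real services. `Hop` and `ChainCheck` are hypothetical names. Each `Hop` forwards writes downstream after a fixed delay on its own single-thread scheduler, and the `CHAIN_SLA` value is an assumption roughly equal to the sum of per-hop budgets.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory service: stores a value, then forwards an "event" downstream after a delay.
final class Hop implements AutoCloseable {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final ScheduledExecutorService bus = Executors.newSingleThreadScheduledExecutor();
    private final Duration delay;
    private final Hop next;   // null for the last service in the chain

    Hop(Duration delay, Hop next) {
        this.delay = delay;
        this.next = next;
    }

    void write(String key, String value) {
        store.put(key, value);
        if (next != null) {
            bus.schedule(() -> { next.write(key, value); }, delay.toMillis(), TimeUnit.MILLISECONDS);
        }
    }

    String read(String key) { return store.get(key); }

    @Override
    public void close() { bus.shutdownNow(); }
}

final class ChainCheck {
    private ChainCheck() {}

    // End-to-end SLA for A -> B -> C (hypothetical value for this harness).
    static final Duration CHAIN_SLA = Duration.ofMillis(1000);

    // Writes to the first hop, polls the last hop, and returns the end-to-end time.
    static Duration measure(Hop first, Hop last, String key, String value) {
        long start = System.nanoTime();
        first.write(key, value);
        long deadline = start + CHAIN_SLA.toNanos();
        while (!value.equals(last.read(key))) {
            if (System.nanoTime() >= deadline) {
                throw new AssertionError("chain did not converge within " + CHAIN_SLA);
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        return Duration.ofNanos(System.nanoTime() - start);
    }
}
```

Polling only the last hop keeps the test focused on the end-to-end guarantee. If it fails, reading the middle hop tells you which leg of the chain is slow.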

The next lesson covers contract testing with Pact — how to verify that the event schemas your producer publishes stay compatible with what your consumers expect, without running both services at the same time.
