Observability in Test Environments — Microservices Testing

"Test environment" doesn't mean a stripped-down environment. A test environment without logs, metrics, or traces produces failures you can't diagnose: you know something broke, but not where, why, or what was happening when it did. Bringing the same observability stack you run in production into your test environment can turn a frustrating two-hour debugging session into a five-minute trace inspection.

The three pillars — and what each tells you in tests

Each pillar answers a different question when a test fails.

Logs — discrete, timestamped events. Each log line records that something happened: "order 42 created", "payment service returned 503", "retry attempt 2 of 3". In tests, logs tell you the sequence of events that led to a failure. A failed assertion is the symptom; the logs show the cause.

Metrics — aggregated numeric measurements over time: requests per second, error rate, p99 latency, connection pool utilisation. In tests, metrics tell you whether a service behaved normally under load, or whether a particular code path caused a spike in error rate. Useful for asserting non-functional requirements.

Traces — the complete request journey across services (covered in depth in the previous lesson). In tests, traces tell you where time was spent and which service introduced a failure in a distributed flow.
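
To make a metric such as p99 latency concrete, here is a minimal sketch that computes a percentile from raw latency samples with the nearest-rank method. The class and method names are illustrative assumptions, not part of the lesson's codebase; in practice Prometheus or Micrometer would usually compute this for you.

```java
import java.util.Arrays;

/** Illustrative helper (hypothetical name): nearest-rank percentile over latency samples. */
final class LatencyPercentile {

    /**
     * Returns the smallest sample such that at least {@code percentile} percent
     * of all samples are less than or equal to it.
     */
    static long percentile(long[] samplesMillis, int percentile) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile < 1 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in 1..100");
        }
        long[] sorted = samplesMillis.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        // 1-based rank of the sample that covers the requested share of the data
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

A test could collect response times for each request and assert that percentile(samples, 99) stays under the latency budget agreed for the service.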

Adding observability to docker-compose.test.yml

Add these four services to your existing compose file:

# Add to existing docker-compose.test.yml
 
  jaeger:
    image: jaegertracing/all-in-one:1.52
    ports:
      - "16686:16686"   # UI
      - "4317:4317"     # OTLP gRPC
 
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
 
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
 
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin

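Point each service's log shipper and OTel agent at these addresses, and list each service as a scrape target in prometheus.yml, because Prometheus pulls metrics rather than receiving them. Once Prometheus, Loki, and Jaeger are added as Grafana data sources, a failed test can be investigated in Grafana using the logs and metrics from every service for the time window of the run.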

Asserting on logs in tests

HTTP status codes tell you what your service returned. Log assertions tell you what it was doing internally. Here is a practical pattern using Loki:

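import static io.restassured.RestAssured.given;
import static org.assertj.core.api.Assertions.assertThat;

@Test
void shouldNotProduceErrorLogsDuringSuccessfulOrderPlacement() {
    Instant testStart = Instant.now();

    given()
        .contentType(ContentType.JSON)
        .body(validOrderRequest)
        .post("/orders")
        .then()
        .statusCode(201);

    // LogQL selector for ERROR-level logs from the order-service.
    // Assumes the log shipper attaches "service" and "level" as Loki labels.
    String logQl = "{service=\"order-service\",level=\"ERROR\"}";

    // Loki expects the start of the range in nanoseconds since the Unix epoch
    long startNanos = testStart.toEpochMilli() * 1_000_000L;

    // lokiClient is a project-specific helper (not shown). It is assumed to send
    // the selector and the start time as separate "query" and "start" parameters.
    List<LogLine> errors = lokiClient.query(logQl, startNanos);

    assertThat(errors)
        .as("No ERROR logs expected during successful order placement")
        .isEmpty();
}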

An integration test that passes HTTP assertions but silently logs errors is a test that's lying to you. Explicit log assertions surface those hidden failures.
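
The lokiClient in the test above is not shown in the lesson. As a minimal sketch of what such a helper might build, the class below assembles a request URL for Loki's /loki/api/v1/query_range endpoint, which takes the LogQL expression in the query parameter and the range bounds in start and end as nanosecond Unix timestamps. The class name, method names, and base URL are assumptions for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

/** Illustrative helper (hypothetical name): builds a Loki range-query URL. */
final class LokiQueryUrl {

    /** Returns the full GET URL for a LogQL query over the window [start, end]. */
    static String build(String baseUrl, String logQl, Instant start, Instant end) {
        return baseUrl + "/loki/api/v1/query_range"
            + "?query=" + URLEncoder.encode(logQl, StandardCharsets.UTF_8)
            + "&start=" + toEpochNanos(start)
            + "&end=" + toEpochNanos(end);
    }

    /** Converts an Instant to nanoseconds since the Unix epoch, failing loudly on overflow. */
    static long toEpochNanos(Instant t) {
        long wholeSeconds = Math.multiplyExact(t.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(wholeSeconds, t.getNano());
    }
}
```

Keeping the LogQL selector separate from the time bounds means the selector is URL-encoded once and the timestamps never end up inside the query expression itself.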

Asserting on metrics in tests

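@Test
void shouldMaintainErrorRateBelowThresholdUnderLoad() throws Exception {
    // Raw count of 5xx responses recorded so far. "or vector(0)" yields 0 when no 5xx series exists yet.
    // prometheusClient is a project-specific helper (not shown).
    String errorCountQuery =
        "sum(http_server_requests_seconds_count{status=~\"5..\",service=\"order-service\"}) or vector(0)";
    double errorsBefore = prometheusClient.queryInstant(errorCountQuery);

    // Run 50 requests on a pool of 20 threads
    int totalRequests = 50;
    ExecutorService pool = Executors.newFixedThreadPool(20);
    List<Future<Integer>> futures = IntStream.range(0, totalRequests)
        .mapToObj(i -> pool.submit(() ->
            given().contentType(ContentType.JSON).body(validOrderRequest)
                .post("/orders").statusCode()))
        .collect(Collectors.toList());

    // Client-side view: count responses that were 5xx or failed outright
    long failures = futures.stream()
        .mapToInt(f -> { try { return f.get(); } catch (Exception e) { return 500; }})
        .filter(s -> s >= 500)
        .count();
    pool.shutdown();

    // Assert 5xx rate < 1% as seen by the client
    assertThat((double) failures / totalRequests).isLessThan(0.01);

    // Server-side view: wait for at least one Prometheus scrape so the counter includes this run
    // (20s assumes a 15s scrape_interval in prometheus.yml), then compare against the baseline
    Thread.sleep(20_000);
    double errorsAfter = prometheusClient.queryInstant(errorCountQuery);
    assertThat((errorsAfter - errorsBefore) / totalRequests).isLessThan(0.01);
}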
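
Counters in Prometheus only ever increase, so an error-rate assertion usually compares the values before and after the run rather than reading an absolute number. The following is a minimal sketch of that calculation; the class and method names are hypothetical, and the four inputs are assumed to come from counter queries like the one in the test above.

```java
/** Illustrative helper (hypothetical name): error rate from counter values taken before and after a run. */
final class ErrorRate {

    /**
     * Returns the share of requests that failed between the two readings.
     * Returns 0.0 when no requests were recorded in the window.
     */
    static double between(double errorsBefore, double errorsAfter,
                          double totalBefore, double totalAfter) {
        double errorDelta = errorsAfter - errorsBefore;
        double totalDelta = totalAfter - totalBefore;
        if (errorDelta < 0 || totalDelta < 0) {
            // A counter that went backwards means the service restarted mid-test
            throw new IllegalStateException("counter reset detected during the test window");
        }
        if (totalDelta == 0) {
            return 0.0;
        }
        return errorDelta / totalDelta;
    }
}
```

Failing on a counter reset is a deliberate choice here: a restart during a load test is itself worth investigating, and silently treating it as zero errors would hide it.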

Figure: Observability in Tests

Logs: structured JSON format, Loki aggregation, assert no ERROR logs
Metrics: Prometheus scraping, error rate assertions, latency SLA checks
Traces: Jaeger / OpenTelemetry, service interaction proof, flaky test diagnosis
Grafana: unified dashboard, correlate all three, post-failure inspection

Practical observability setup for QA teams

Four concrete actions you can take this week:

Configure structured logging (JSON format) in all services — plain text logs are hard to query programmatically. Use logback-spring.xml with JsonEncoder or the equivalent in your stack.
Add health check endpoints to all services (/actuator/health) — compose containers wait on these; your tests can also poll them before firing requests.
Export basic Micrometer metrics in Spring Boot with no code changes: with Spring Boot Actuator and the micrometer-registry-prometheus dependency on the classpath, set management.endpoints.web.exposure.include=health,metrics,prometheus in your application properties.
Keep the observability stack in a separate docker-compose.observability.yml — tests can run without it, but it is available when debugging. This keeps your baseline compose file lean and the observability stack opt-in.
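
Polling a health endpoint before firing requests, as suggested in the second action above, can be sketched as a small retry loop. The helper below is an assumption for illustration (the class and method names are hypothetical); it takes the actual check as a BooleanSupplier so the same loop works for an HTTP call to /actuator/health or for any other readiness condition.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Illustrative helper (hypothetical name): polls a readiness check until it passes or time runs out. */
final class Readiness {

    /** Returns true as soon as the check passes, or false once the timeout has elapsed. */
    static boolean awaitTrue(BooleanSupplier check, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;               // timed out without a passing check
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

In a test setup method, the check could be a lambda that sends a GET to the service's /actuator/health endpoint and returns true on an HTTP 200 response, and false on any other status or connection error.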

If you want to go deeper on structured logging patterns and how to standardise log schemas across services, the Test Automation Frameworks course covers this in its Logging Strategy lesson.

⚠️ Common mistakes

Running the test environment without structured logging. Free-text log lines can't be queried reliably. If your service logs "Error processing order #42 for user Alice" as a plain string, there's no way to programmatically extract the order ID or filter by severity. Always configure JSON logging (logback-spring.xml with JsonEncoder or equivalent) before connecting Loki.
Only adding observability when debugging a specific failure. Observability infrastructure should be present from the start, not added reactively. By the time you're trying to debug a flaky test that's been intermittent for two weeks, the absence of traces from those two weeks is a significant gap.
Forgetting that Prometheus scrapes on a pull interval (15s in the stock example config; 1m if scrape_interval is not set at all). If your test finishes within one scrape interval and you assert on metrics immediately after, Prometheus may not have scraped the latest values yet. For short tests, query the service's own /actuator/prometheus endpoint directly instead of going through Prometheus.
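
Reading /actuator/prometheus directly means parsing the Prometheus text exposition format, where each sample line has the shape name{labels} value and comment lines start with #. The following is a deliberately naive sketch, assuming the response body has already been fetched as a string; the class name is hypothetical, labels are matched by plain substring, and special values such as +Inf are not handled.

```java
/** Illustrative helper (hypothetical name): sums samples from Prometheus text exposition output. */
final class PrometheusText {

    /**
     * Sums the values of all samples named {@code metricName} whose label block
     * contains every fragment in {@code requiredLabels}, for example status="500".
     */
    static double sum(String exposition, String metricName, String... requiredLabels) {
        double total = 0.0;
        for (String rawLine : exposition.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                                  // skip blanks, HELP and TYPE lines
            }
            int brace = line.indexOf('{');
            int space = line.indexOf(' ');
            boolean hasLabels = brace >= 0 && (space < 0 || brace < space);
            int nameEnd = hasLabels ? brace : space;
            if (nameEnd < 0 || !line.substring(0, nameEnd).equals(metricName)) {
                continue;                                  // different metric or malformed line
            }
            String labels = "";
            String rest = line.substring(nameEnd + 1).trim();
            if (hasLabels) {
                int close = line.lastIndexOf('}');
                if (close < 0) {
                    continue;                              // unterminated label block
                }
                labels = line.substring(brace + 1, close);
                rest = line.substring(close + 1).trim();
            }
            boolean matches = true;
            for (String required : requiredLabels) {
                if (!labels.contains(required)) {
                    matches = false;
                    break;
                }
            }
            if (matches && !rest.isEmpty()) {
                total += Double.parseDouble(rest.split("\\s+")[0]);   // value precedes optional timestamp
            }
        }
        return total;
    }
}
```

Calling sum for http_server_requests_seconds_count before and after a request, with status="500" as a required label, gives an up-to-date error count without waiting for a scrape.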

🎯 Practice task

Add Jaeger, Prometheus, and Loki to your docker-compose.test.yml. Configure one service to ship logs to Loki (using the Loki Docker log driver or a Promtail sidecar) and export metrics to Prometheus. Start the stack and verify data appears in Grafana.
Write a test that queries Loki for ERROR-level logs from a specific service during a test run. Run a happy-path request and assert no error logs were produced. Then introduce an intentional bug that causes a logged error and confirm the test catches it.
Add a Prometheus metric assertion to an existing load test: assert that the error rate stays below 1% across 50 requests. Run it, observe it passing, then break the service (e.g., shut down its database via Toxiproxy) and confirm the metric assertion fails.
Open Grafana after a test failure and use the "Explore" view to correlate a trace (from Jaeger) with the logs (from Loki) from the same time window. Write down how you linked the two — which field connected them?
Review your existing test docker-compose configuration. List every service that doesn't emit structured JSON logs. Estimate the effort to add structured logging to each. Prioritise by: which service produces the most debugging pain when tests fail?

The next lesson moves into contract testing — how to verify that the WireMock stubs you write in component tests actually match what the real downstream services produce.