End-to-End Tests in Microservices — When and How

Every testing course advises using E2E tests sparingly. In a monolith that advice is sensible but often ignored, because the cost of adding one more Cypress test feels low. In a microservices system the cost is very real: every service is another failure point, every startup adds latency, and debugging a failure can mean correlating logs from a dozen services at once. This lesson makes the case for radical restraint on E2E tests and shows how to write the small number that are truly worth their cost.

Why E2E tests are expensive in microservices

Running E2E tests in a microservices system is not the same as running them against a monolith. The costs compound at every layer:

Infrastructure cost: you need the full system running — 10+ services, 10+ databases, potentially Kafka, Redis, and external API sandboxes. Every new service added to the system adds to the baseline cost of every E2E run.
Startup time: a full staging environment can take 2–5 minutes to stabilise. E2E suites on realistic systems easily run 30–60 minutes, which makes them incompatible with pull request feedback loops.
Brittleness multiplier: each service is an independent failure point. A single unhealthy service — one that has nothing to do with your change — makes every E2E test flaky. Teams stop trusting E2E results when they fail for infrastructure reasons more often than for code reasons.
Debugging difficulty: a test failing with "timeout on checkout" could mean the Order Service, Payment Service, Inventory Service, or the network between them. Finding the culprit requires correlating logs from all of those services, which often takes longer than fixing the underlying bug.
Duplication: most E2E tests verify things that contract tests and integration tests already cover more efficiently. In those cases the E2E test adds runtime, flakiness, and maintenance cost without adding meaningful new coverage.
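
The brittleness multiplier can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every service is healthy independently and with the same probability; allHealthyProbability is a hypothetical helper name. Real failures are often correlated, so treat the result as rough intuition rather than a measurement.

```typescript
// Probability that every service an E2E run depends on is healthy at the same time.
// Assumption (illustration only): failures are independent and equally likely.
function allHealthyProbability(perServiceAvailability: number, serviceCount: number): number {
    if (perServiceAvailability < 0 || perServiceAvailability > 1) {
        throw new RangeError('availability must be between 0 and 1');
    }
    if (!Number.isInteger(serviceCount) || serviceCount < 0) {
        throw new RangeError('serviceCount must be a non-negative integer');
    }
    return Math.pow(perServiceAvailability, serviceCount);
}

// Ten services that are each healthy 99% of the time:
const tenServices = allHealthyProbability(0.99, 10);
console.log(tenServices.toFixed(3)); // prints "0.904"
```

Under these assumptions, roughly one run in ten meets an unhealthy dependency before your change is even exercised, and the odds get worse with every service added.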

What E2E tests are actually for in microservices

Be precise about why you are writing an E2E test before you write it. The justified use cases are narrow:

Verifying critical user-facing journeys that span many services and must work in the same environment where users run — not a local Docker Compose stack, but a real staging environment with production configuration.
Catching cross-cutting concerns that no unit or integration test would catch: for example, auth middleware applied inconsistently across service gateways, or CORS headers that are missing only on certain endpoints.
Regulatory compliance — some industries require documented proof that the full system works end-to-end as part of an audit trail.
Smoke testing after deployment: not a full regression suite, but a targeted check that asks "did this release break checkout?"

If your reason does not fall into one of these categories, you are likely testing something a component test or contract test could catch with less cost.

The five E2E flows worth writing

For an e-commerce system, the entire E2E suite should fit in roughly five tests. These are the flows where failure causes the most user impact and where the interaction between services is genuinely the thing being tested:

Registration to first purchase: new user signs up → verifies email → browses → adds to cart → checkout → order confirmed
Login with expired session: user's session expires → redirected to login → re-authenticates → continues from where they were
Order → payment → confirmation email: place order → payment processed → confirmation email received (verified via email API, not inbox inspection)
Cancel order → refund: place order → cancel within window → refund initiated → refund reflected in balance
Password reset: request reset → receive email link → set new password → log in with new password

If you are tempted to add a sixth test, ask whether an existing component test or contract test already covers the failure mode you are worried about.
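
The order confirmation flow above verifies the email through an email API, which in practice means polling until the message appears. A minimal sketch of such a polling helper follows. The names pollUntil and PollOptions, and the findConfirmationEmail lookup in the usage comment, are hypothetical; they are not part of Playwright or of any particular email-testing service.

```typescript
interface PollOptions {
    timeoutMs: number;   // give up after this long
    intervalMs: number;  // pause between attempts
}

// Calls `probe` repeatedly until it returns a defined value or the timeout elapses.
async function pollUntil<T>(
    probe: () => Promise<T | undefined>,
    { timeoutMs, intervalMs }: PollOptions,
): Promise<T> {
    const deadline = Date.now() + timeoutMs;
    for (;;) {
        const result = await probe();
        if (result !== undefined) {
            return result;
        }
        // Stop if another wait would take us past the deadline.
        if (Date.now() + intervalMs > deadline) {
            throw new Error(`condition not met within ${timeoutMs} ms`);
        }
        await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
}

// Usage inside a test (findConfirmationEmail is a hypothetical wrapper around your email API):
// const email = await pollUntil(() => findConfirmationEmail(user.email),
//                               { timeoutMs: 30000, intervalMs: 1000 });
```

Inside a Playwright test, the built-in expect.poll can serve the same purpose; a standalone helper like this one is mainly useful in setup code that runs outside the test runner.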

An E2E test in Playwright (TypeScript)

The checkout flow in code, with the patterns that make E2E tests in microservices survivable:

import { test, expect } from '@playwright/test';
import { createTestUser, getOrderStatus } from '../helpers/test-api';
 
test('user can complete checkout end to end', async ({ page }) => {
    const user = await createTestUser();  // creates user via API, not UI
 
    await page.goto('/login');
    await page.fill('[data-testid="email"]', user.email);
    await page.fill('[data-testid="password"]', user.password);
    await page.click('[data-testid="login-submit"]');
    await expect(page).toHaveURL('/dashboard');
 
    await page.goto('/products/laptop-pro');
    await page.click('[data-testid="add-to-cart"]');
    await page.goto('/cart');
    await page.click('[data-testid="checkout"]');
 
    // Payment (use Stripe test card)
    await page.fill('[data-testid="card-number"]', '4242424242424242');
    await page.fill('[data-testid="card-expiry"]', '12/28');
    await page.fill('[data-testid="card-cvc"]', '123');
    await page.click('[data-testid="place-order"]');
 
    // Verify order confirmed in UI
    await expect(page.locator('[data-testid="order-status"]'))
        .toHaveText('Order Confirmed', { timeout: 15000 });
 
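    // Verify backend state via API (the UI can show success before the backend settles)
    const orderId = (await page.locator('[data-testid="order-id"]').textContent())?.trim();
    expect(orderId).toBeTruthy();  // fail clearly if the order id is missing from the page
    const order = await getOrderStatus(orderId!);
    expect(order.status).toBe('CONFIRMED');
    expect(order.paymentStatus).toBe('CHARGED');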
});

Four patterns in this test deserve attention. First, createTestUser() creates preconditions via API rather than navigating the registration UI; this is faster, more stable, and keeps the test focused on the scenario being verified rather than the setup. Second, data-testid attributes decouple selectors from layout and styling changes, so when a designer renames a CSS class the test does not break. Third, { timeout: 15000 } on the order confirmation assertion accounts for real asynchronous processing across multiple services: the Payment Service and Order Service do not respond synchronously, and Playwright's default assertion timeout of 5 seconds can produce false failures in production-realistic environments. Fourth, the final API assertion via getOrderStatus checks that the backend has actually processed the order, not just that the UI is showing a success screen before the backend has caught up.
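
The ../helpers/test-api module imported by the test is not shown in this lesson. One possible shape is sketched below. The endpoint paths (/test-support/users and /orders/{id}), the response fields, and the HttpClient interface are all assumptions made for illustration; the HTTP client is injected so the helpers do not depend on any particular library.

```typescript
// Minimal HTTP abstraction (hypothetical) so the helpers work with any client.
interface HttpClient {
    post(path: string, body: unknown): Promise<unknown>;
    get(path: string): Promise<unknown>;
}

interface TestUser { email: string; password: string; }
interface OrderStatus { status: string; paymentStatus: string; }

function createTestApi(http: HttpClient) {
    return {
        // Creates a uniquely named user through an assumed test-support endpoint.
        async createTestUser(): Promise<TestUser> {
            const suffix = `${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
            const user: TestUser = {
                email: `e2e-${suffix}@example.test`,
                password: `Pw-${suffix}`,
            };
            await http.post('/test-support/users', user);
            return user;
        },

        // Reads order state straight from the backend, bypassing the UI.
        async getOrderStatus(orderId: string): Promise<OrderStatus> {
            const path = `/orders/${encodeURIComponent(orderId)}`;
            const body = (await http.get(path)) as Partial<OrderStatus> | null;
            if (!body || typeof body.status !== 'string' || typeof body.paymentStatus !== 'string') {
                throw new Error(`unexpected order payload for ${orderId}`);
            }
            return { status: body.status, paymentStatus: body.paymentStatus };
        },
    };
}
```

A real helpers module could build one instance of this object and export its two functions, so the imports in the test keep working. Generating a unique email on every call keeps repeated runs from colliding with accounts left behind by earlier runs.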

Running E2E tests: when and where

Frequency: nightly, not on every PR. A suite of 10 E2E tests that runs every night is worth more than a suite of 100 that runs weekly because nobody is willing to wait for it. The moment E2E tests slow down your PR cycle, teams find ways to skip them. The one exception is the targeted post-deployment smoke check described earlier.
Environment: a dedicated staging environment with production-mirroring data and configuration. Not a shared environment used by manual QA — test runs that truncate tables or create synthetic orders will interfere with exploratory testing.
Flakiness policy: a flaky E2E test gets three strikes. If it fails intermittently three times without a code change, it is quarantined for investigation. Flaky E2E tests erode team trust faster than any other test type — once a team learns to ignore red E2E runs, the entire suite loses its value as a signal.
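
The three-strikes rule can be expressed as a small check over a test's recent run history. The sketch below is one interpretation rather than a standard algorithm: the RunRecord shape, the function names, and the idea of using the commit SHA to detect "no code change" are all assumptions.

```typescript
interface RunRecord {
    commitSha: string;  // revision under test
    passed: boolean;
}

// Counts failures on revisions that also produced a pass,
// i.e. runs where the outcome changed without a code change.
function countFlakyStrikes(history: RunRecord[]): number {
    const passedShas = new Set(history.filter((r) => r.passed).map((r) => r.commitSha));
    return history.filter((r) => !r.passed && passedShas.has(r.commitSha)).length;
}

function shouldQuarantine(history: RunRecord[], maxStrikes = 3): boolean {
    return countFlakyStrikes(history) >= maxStrikes;
}

const nightlyHistory: RunRecord[] = [
    { commitSha: 'abc123', passed: true },
    { commitSha: 'abc123', passed: false },
    { commitSha: 'abc123', passed: false },
    { commitSha: 'abc123', passed: false },
    { commitSha: 'def456', passed: false }, // never passed on this revision: likely a real bug
];
console.log(shouldQuarantine(nightlyHistory)); // prints true
```

Failures on a revision that has never passed are deliberately not counted as strikes, since they point to a genuine regression rather than flakiness.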


⚠️ Common mistakes

Writing E2E tests as the primary regression suite. If your first instinct when adding a feature is to write an E2E test, you are probably testing at the wrong layer. Ask: could a component test or contract test catch this same bug? If yes, write that instead. E2E tests should be the last line of defence, not the first.
Setting up test state through the UI. Using Playwright to navigate to the registration page and fill in the sign-up form before every test that needs a logged-in user is slow and fragile. Create users and seed data via API calls in test setup, then skip straight to the scenario you are actually testing. The test setup should take seconds, not minutes.
Running E2E tests on every pull request. Teams that do this quickly abandon them when the PR feedback cycle stretches to 45 minutes. Move E2E tests to a nightly schedule and trust the lower layers — component tests, contract tests, integration tests — for PR feedback. Reserve E2E tests for catching what those layers cannot.

🎯 Practice task

Audit your current E2E test suite, or design one from scratch if you do not have one. List every E2E test. For each one, ask: could a contract test, component test, or integration test catch the same bug more efficiently? Mark the duplicates and count how many could be deleted or moved to a lower layer.
Identify the three most business-critical user journeys in your system — the flows where a failure would cause the most user impact or revenue loss. These are your E2E test candidates. Everything else is a candidate for deletion or replacement.
Write one E2E test for your most critical journey using the Playwright pattern shown above. Use data-testid attributes for all selectors. Seed precondition data via API calls in the test setup, not via UI navigation.
Run the test three times and record how long each run takes. Is the variance between runs more than 20%? If yes, identify which step is introducing the inconsistency — it is usually an async operation that needs a higher timeout or a more explicit wait condition.
Add an API-level assertion at the end of the test, similar to getOrderStatus in the example. Run the test without it first, then with it. What does the API assertion catch that the UI assertion alone would not? What scenario would produce a green UI assertion but a failing API assertion?
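
For the timing step in the task above, "variance of more than 20%" needs a definition. One simple option, assumed here, is the gap between the slowest and fastest run relative to the mean duration; relativeSpread is a hypothetical helper name and the durations are made-up numbers.

```typescript
// Gap between the slowest and fastest run, as a fraction of the mean duration.
function relativeSpread(durationsMs: number[]): number {
    if (durationsMs.length === 0) {
        throw new Error('at least one duration is required');
    }
    const mean = durationsMs.reduce((sum, d) => sum + d, 0) / durationsMs.length;
    return (Math.max(...durationsMs) - Math.min(...durationsMs)) / mean;
}

// Three runs of the same test, in milliseconds:
const spread = relativeSpread([41000, 44000, 53000]);
console.log(spread > 0.2); // prints true: (53000 - 41000) / 46000 is about 0.26
```

A spread above the threshold is a prompt to look at per-step timings, not a verdict on its own; three runs is a very small sample.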

The next lesson covers service virtualisation tools — the broader toolkit for simulating dependencies that goes beyond simple HTTP stubs.