How do you reproduce and write a regression test for a production-only race condition?

API testing · Senior · api · race-conditions · concurrency · regression · senior

Short answer: Reproduce locally first with controlled concurrency (many identical concurrent requests, fast iteration). If it won't reproduce locally, instrument production to capture the timing, then reproduce in a staging environment under matching load. The regression test fires N parallel calls and asserts on the invariant that the bug violated.

Race conditions are the classic "works on my machine" bug — they need concurrency to manifest, and tests usually run sequentially. The discipline:

Step 1 — Understand the invariant the bug violated. The bug is "two users created with the same email." The invariant: "email is unique across users." That's what the regression test asserts.
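One way to make the invariant concrete is to write it as a small predicate over stored data, separate from any HTTP status codes. The sketch below is a minimal illustration: the `User` shape and the `duplicateEmails` helper are hypothetical names, and treating emails as case-insensitive is an assumption that may not match the real system.

```typescript
// Hypothetical shape of a stored user; adjust to the real API response.
interface User {
  id: string;
  email: string;
}

// Returns every email that appears more than once.
// Assumption: emails are compared case-insensitively after trimming.
// An empty result means the uniqueness invariant holds.
function duplicateEmails(users: User[]): string[] {
  const counts = new Map<string, number>();
  for (const user of users) {
    const key = user.email.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([email]) => email);
}
```

A regression test can then assert that `duplicateEmails(allUsers)` is empty, which states the invariant directly instead of inferring it from response codes.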

Step 2 — Reproduce locally. Try the simplest first:

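// `request` is Playwright's APIRequestContext fixture.
const calls = Array.from({ length: 50 }, () =>
  request.post('/users', { data: { email: 'race@test.com' } })
);
const results = await Promise.all(calls);
const successes = results.filter((r) => r.status() === 201);
console.log(`${successes.length} of ${results.length} creates returned 201`);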

50 concurrent identical requests; if more than one succeeds, the race is reproduced.

If it doesn't reproduce, increase pressure:

More concurrent requests (200, 1000).
Run the requests while the database is busy with its slowest operation (for example, a deliberately heavy query in another transaction that holds locks).
Reduce available CPU (for example, pin the server process to one core with taskset).
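The escalation in concurrency can be scripted so each level is tried in order and the first level that breaks the invariant is recorded. This is a rough sketch: `escalateUntilReproduced` and the `Attempt` type are hypothetical names, and `attempt` stands in for whatever fires the concurrent requests and counts the 201 responses.

```typescript
// Hypothetical callback: fires `concurrency` identical requests and
// resolves with how many of them succeeded.
type Attempt = (concurrency: number) => Promise<number>;

// Tries each concurrency level in order and returns the first level at
// which more than one request succeeded, or null if none reproduced.
async function escalateUntilReproduced(
  attempt: Attempt,
  levels: number[] = [50, 200, 1000]
): Promise<number | null> {
  for (const level of levels) {
    const successes = await attempt(level);
    if (successes > 1) {
      return level; // invariant violated: the race reproduced at this level
    }
  }
  return null;
}
```

Knowing the lowest level that reproduces the bug keeps the eventual regression test as cheap as possible.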

Step 3 — Add timing precision. If it still won't reproduce, the race needs sub-millisecond alignment. Tools:

Network-level timing — tcpdump traces of production requests, then replay.
Application-level instrumentation — add temporary timing logs around the suspected critical section.
Chaos / latency injection — Toxiproxy adds artificial delay to slow the suspected path, widening the race window.
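To see why injected latency helps, consider a toy check-then-insert handler. The sketch below is an in-memory simulation, not the real service: `makeStore`, `countSuccesses`, and `injectedDelayMs` are hypothetical names, and the delay stands in for whatever a proxy or slow query would add between the uniqueness check and the insert. As a simplification, a delay of zero means no gap at all between check and insert.

```typescript
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// In-memory stand-in for the users table with an unguarded check-then-insert.
function makeStore(injectedDelayMs: number) {
  const emails: string[] = [];

  async function createUser(email: string): Promise<boolean> {
    // Check: reject if the email is already stored.
    if (emails.includes(email)) return false;
    // Injected latency opens a window between the check and the insert.
    if (injectedDelayMs > 0) {
      await sleep(injectedDelayMs);
    }
    // Insert: other callers may have passed the same check in the meantime.
    emails.push(email);
    return true;
  }

  return { emails, createUser };
}

// Fires `callers` identical creates at once and counts the successes.
async function countSuccesses(
  store: ReturnType<typeof makeStore>,
  callers: number
): Promise<number> {
  const results = await Promise.all(
    Array.from({ length: callers }, () => store.createUser('race@test.com'))
  );
  return results.filter(Boolean).length;
}
```

In this simulation a store with no delay admits one create, while a store with a few milliseconds of delay admits every concurrent caller. A real system is less clear-cut, but the principle is the same: slowing the path between check and write makes a rare interleaving common.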

Step 4 — Reproduce in staging. Match production conditions: same database engine and version, similar concurrency profile. Run the test under load (k6, locust) against staging.

Step 5 — Write the regression test. Once you've reproduced, the test should:

Use the same concurrency pattern that triggers the bug.
Run N times in CI (10-100) — races aren't 100% reproducible.
Assert on the invariant, not the symptom.

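import { test, expect } from '@playwright/test';

// Let the attempts in this file run in parallel across workers.
test.describe.configure({ mode: 'parallel' });

test.describe('race regression', () => {
  for (let attempt = 0; attempt < 20; attempt++) {
    test(`attempt ${attempt}: concurrent creates produce one user`, async ({ request }) => {
      // Unique email per attempt so earlier runs cannot mask a duplicate.
      const email = `race-${attempt}-${Date.now()}@test.com`;
      // Fire 25 identical creates at once.
      const calls = Array.from({ length: 25 }, () =>
        request.post('/users', { data: { email } })
      );
      const results = await Promise.all(calls);
      const ok = results.filter((r) => r.status() === 201);
      expect(ok.length).toBe(1);

      // Assert the invariant directly: exactly one stored user with this email.
      // `params` URL-encodes the email for the query string.
      const res = await request.get('/users', { params: { email } });
      const list = await res.json();
      expect(list.data.length).toBe(1);
    });
  }
});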

Step 6 — Fix verification. Run the regression test against the fix. It must pass 50+ times consecutively without flake. Races aren't fixed if the test is "usually" green.
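The "50+ consecutive passes" rule can be expressed as a small loop that stops at the first failure. This is a sketch under assumptions: `passesConsecutively` and `StabilityResult` are hypothetical names, and `check` stands in for one full run of the concurrent scenario that resolves to true when the invariant held.

```typescript
interface StabilityResult {
  passed: boolean;
  failedAtRun: number | null; // 1-based index of the first failing run
  completedRuns: number; // runs that passed before stopping
}

// Runs `check` up to `requiredRuns` times in sequence and stops at the
// first run that returns false or throws.
async function passesConsecutively(
  check: () => Promise<boolean>,
  requiredRuns = 50
): Promise<StabilityResult> {
  for (let run = 1; run <= requiredRuns; run++) {
    let ok = false;
    try {
      ok = await check();
    } catch {
      ok = false; // a thrown error counts as a failed run
    }
    if (!ok) {
      return { passed: false, failedAtRun: run, completedRuns: run - 1 };
    }
  }
  return { passed: true, failedAtRun: null, completedRuns: requiredRuns };
}
```

A single failure anywhere in the sequence means the fix is not verified, however many runs passed before it. With Playwright, the `--repeat-each` command-line option gives a similar effect for an existing test file.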

Long-term:

Retain the regression test forever — races love to come back as code changes.
Tag it (@race, @concurrency) so future engineers know it's load-sensitive.
Don't let it become flake-quarantined. If it fails intermittently, the underlying fix didn't hold; investigate.

The senior signal: invariant-first thinking, structured reproduction (local → instrument → staging-under-load), and refusing to call the bug fixed without a stable regression test.

// WHAT INTERVIEWERS LOOK FOR

Invariant-based assertion (not just 'no error'), reproduction techniques in priority order, retention of the race test, and refusal to accept 'works most of the time' as fixed.

// COMMON PITFALL

Patching the symptom (return 409 sometimes) without the regression test, and patting yourself on the back when the bug 'doesn't reproduce' for a week. Races recur; the test is the receipt.
