Q32 of 37 · Selenium

How do you reduce flakiness in a 500+ test Selenium suite?

SeleniumSeniorseleniumflakinesstest-stabilityprocesssenior

Short answer

Short answer: Measure first (per-test rerun rate), categorise by cause (waits, data, environment, app bugs), then fix in priority: explicit waits everywhere, test-data isolation per test, retry only as a last resort. Quarantine the worst 5% so they don't tank trust while you fix them.

Detail

Flake at 500+ tests is rarely one cause — it's a long tail. The mistake teams make is reaching for retry-on-failure as a fix; that hides flakes rather than removing them, and slowly accumulates until the suite is meaningless.

Step 1 — measure. Wire each test's pass/fail into a database (or use Allure history, or your CI's test result store). For each test, compute: pass rate, median runtime, p95 runtime, rerun-required count. The top 5% of flaky tests usually account for 80% of suite failures.

Step 2 — categorise. For the top-20 flakiest, classify by cause:

  • Waiting — implicit waits, missing explicit waits, sleeps, race after click.
  • Data — tests share users, tenants, or seed data.
  • Environment — CI runner load, network jitter, container memory.
  • App bugs — real timing issues that should fail (rare but real).

Step 3 — fix in priority:

  1. Standardise on explicit waits. Remove Thread.sleep and implicitlyWait. A reusable wait helper (Waits.untilClickable(by)) makes the pattern uniform.

  2. Isolate test data. Each test creates its own user / tenant / records via API calls in @BeforeMethod. Cleanup via @AfterMethod or background reaper. Shared seed data is the #1 cause of mid-flake at scale.

  3. Stable selectors. Replace generated-id locators with data-test attributes. A coordinated push with the dev team — not a one-time fix.

  4. CI-resource sizing. If runners are at 90% memory, browsers thrash. Bump runner specs or reduce parallel count.

  5. Quarantine the worst 5%. Move them to a "flaky" group that doesn't gate the build. Track them as a debt list with weekly reviews. Better than retrying or ignoring.

Step 4 — set a budget. "We will accept a 0.5% per-test flake rate; anything above gets quarantined." Codify this and stick to it. Without a number, flake quietly grows back.

Step 5 — retry as a last resort, scoped narrowly. TestNG IRetryAnalyzer retrying once is acceptable for known-flaky network-dependent tests; retrying 3 times for everything is the failure mode that kills suite trust.

The cultural piece: publish the flake report weekly. Visibility is what keeps it from drifting back up.

// WHAT INTERVIEWERS LOOK FOR

Data-driven approach (measure, categorise, prioritise), explicit waits + test-data isolation as the high-leverage fixes, quarantine over retry, and a numerical flake budget.

// COMMON PITFALL

Retrying every failing test 3 times. Pass rate looks great, but real bugs hide in the retries — and the suite quietly stops being a quality signal.