Test flakiness rate

The percentage of test runs where the same test produces different results without a code change — a test reliability metric.

test, flakiness, reliability

// Formula

flakiness rate = (flaky test runs ÷ total test runs) × 100%

// About this metric

A flaky test is one that passes and fails non-deterministically for the same code. Flakiness erodes trust in the test suite: once engineers learn that a failing test might just be flaky, they start ignoring failures — and real regressions get missed.

Flakiness rate measures the proportion of test runs that produce a flaky result: same test, different outcome, no code change between runs. To calculate it, divide the number of flaky test runs by the total number of test runs and express the result as a percentage.
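
The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the (test name, commit, passed) record format and the `flakiness_rate` function are hypothetical, and it assumes a run counts as flaky when it fails while the same test also passed on the same commit.

```python
from collections import defaultdict


def flakiness_rate(runs):
    """Return the flakiness rate as a percentage of all runs.

    runs: iterable of (test_name, commit_sha, passed) tuples (assumed format).
    Assumption: a run is flaky if it failed while the same test also passed
    on the same commit, so a code change cannot explain the difference.
    """
    runs = list(runs)
    if not runs:
        return 0.0

    # Collect the distinct outcomes seen for each (test, commit) pair.
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)

    # A failed run in a mixed-outcome group counts as a flaky run.
    flaky = sum(
        1
        for test, commit, passed in runs
        if not passed and len(outcomes[(test, commit)]) == 2
    )
    return 100.0 * flaky / len(runs)


runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),  # same commit, different outcome: flaky
    ("test_login", "abc123", True),
    ("test_cart", "abc123", False),   # fails consistently: broken, not flaky
    ("test_cart", "abc123", False),
]
print(flakiness_rate(runs))  # 20.0 (1 flaky run out of 5)
```

Other definitions are possible, for example counting every run in a mixed-outcome group; whichever is chosen, it should be applied consistently so the trend stays comparable over time.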

Google's engineering practices documentation identifies flakiness above 3% as the threshold where engineers begin treating test results as untrustworthy. Below 1% is considered healthy; 1–3% is a watch zone requiring active remediation.
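
As a rough sketch, these bands could be encoded as a small helper for dashboards or CI checks. The function name and the band labels are hypothetical; the boundaries follow the thresholds quoted above, with exactly 1% and exactly 3% assumed to fall in the watch zone.

```python
def flakiness_band(rate_percent):
    """Map a flakiness rate (in percent) to one of the bands described above."""
    if rate_percent < 0:
        raise ValueError("rate_percent must be non-negative")
    if rate_percent < 1.0:
        return "healthy"
    if rate_percent <= 3.0:
        return "watch"
    return "untrustworthy"


print(flakiness_band(0.3))  # healthy
print(flakiness_band(2.0))  # watch
print(flakiness_band(4.5))  # untrustworthy
```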

Common causes of flakiness: timing dependencies (tests that rely on sleep() rather than explicit waits), shared state between tests (one test's side effects breaking another), environment instability (test data, network calls, date/time dependencies), and race conditions in async code. Most flakiness can be eliminated with disciplined test isolation and explicit synchronisation patterns.
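
To illustrate the first cause, here is a minimal sketch of replacing a fixed sleep() with an explicit wait. The `wait_until` helper is hypothetical (many test frameworks ship their own equivalent), and the timer thread merely simulates asynchronous work.

```python
import threading
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated async work: the result appears after a short delay.
result = {}
worker = threading.Timer(0.05, lambda: result.update(status="done"))
worker.start()

# Fragile: time.sleep(0.05), then hope the worker has finished.
# Sturdier: wait for the condition itself, with a generous upper bound.
ok = wait_until(lambda: result.get("status") == "done", timeout=2.0)
worker.join()
print(ok)
```

The explicit wait returns as soon as the condition holds, so the test is both faster on a quick machine and more tolerant of a slow one.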

// Benchmark

Source: Google Engineering Practices; industry consensus

Above 3% flakiness, engineers learn to ignore failures — that's when flakiness causes real bugs to ship.

// When to use this metric

Monitor flakiness rate per test and per suite. The most actionable view identifies the specific tests responsible for most of the flakiness: typically a small number of tests ("the flaky top 10") drive most of the suite's unreliability.
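
A per-test ranking might look like the sketch below. The (test name, commit, passed) record format and the `top_flaky_tests` function are hypothetical, and a flaky run is assumed to be a failure on a commit where the same test also passed.

```python
from collections import Counter, defaultdict


def top_flaky_tests(runs, n=10):
    """Rank tests by flaky runs.

    Returns a list of (test_name, flaky_runs, share_of_all_flaky_runs_percent).
    runs: iterable of (test_name, commit_sha, passed) tuples (assumed format).
    """
    runs = list(runs)

    # Collect the distinct outcomes seen for each (test, commit) pair.
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)

    # Count failed runs that occurred alongside a pass on the same commit.
    flaky_counts = Counter(
        test
        for test, commit, passed in runs
        if not passed and len(outcomes[(test, commit)]) == 2
    )
    total = sum(flaky_counts.values())
    return [
        (test, count, 100.0 * count / total)
        for test, count in flaky_counts.most_common(n)
    ]


history = [
    ("test_login", "c1", True),
    ("test_login", "c1", False),
    ("test_login", "c1", False),
    ("test_login", "c2", False),
    ("test_login", "c2", True),
    ("test_search", "c1", True),
    ("test_search", "c1", False),
    ("test_cart", "c1", True),
]
print(top_flaky_tests(history, n=10))
# [('test_login', 3, 75.0), ('test_search', 1, 25.0)]
```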

Track flakiness rate separately from pass rate so that flaky failures don't contaminate your suite health signal. Some CI systems flag tests as "suspected flaky" based on re-run behaviour and exclude them from the pass rate calculation.
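
One way to keep the two signals apart is sketched below. It is not the behaviour of any particular CI system: the `suite_health` function and its input shape are hypothetical, and a test is treated as "suspected flaky" when its attempts within a single build include both a pass and a failure.

```python
def suite_health(attempts_by_test):
    """Return (pass_rate_percent, suspected_flaky_tests) for one build.

    attempts_by_test: dict mapping test name to a non-empty list of bool
    outcomes, one per attempt (first run plus any re-runs). Assumed format.
    """
    # Mixed outcomes across re-runs of the same build suggest flakiness.
    suspected_flaky = sorted(
        name
        for name, attempts in attempts_by_test.items()
        if True in attempts and False in attempts
    )

    # The pass rate is computed only over tests with consistent outcomes.
    stable = {
        name: attempts
        for name, attempts in attempts_by_test.items()
        if name not in suspected_flaky
    }
    passed = sum(1 for attempts in stable.values() if all(attempts))
    pass_rate = 100.0 * passed / len(stable) if stable else None
    return pass_rate, suspected_flaky


build = {
    "test_login": [False, True],     # failed, then passed on re-run
    "test_cart": [True],
    "test_search": [False, False],   # consistent failure: a real signal
    "test_profile": [True],
    "test_checkout": [True],
}
print(suite_health(build))  # (75.0, ['test_login'])
```

Reporting the suspected-flaky list alongside the pass rate keeps excluded tests visible, so they are fixed rather than forgotten.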

// Common pitfall

Once flakiness passes the 3% mark, the danger is not the flaky tests themselves but the cultural shift in which "the build failed" stops meaning anything. When engineers rerun builds until they pass without investigating the failure, the entire test suite has effectively become optional. Treat flakiness above 1% as a first-class bug, not an inconvenience.