Q41 of 42 · Playwright

How would you set quality bars and SLA targets for a Playwright suite owned by a QA team?

PlaywrightLeadplaywrightmetricsleadershipquality-barslead

Short answer

Short answer: Track flake rate (<1%), suite duration (<30 min for full, <10 min for smoke), escape rate (<10% of prod bugs), and critical-journey coverage (100%). Publish weekly. Set 7-day quarantine SLA for flakes. Tie investments to numbers — e.g., 'cycle time over 15 min triggers a sprint goal to investigate.'

Detail

Without explicit bars, a Playwright team measures success by "we ran tests" — which says nothing about quality. Explicit bars create accountability and a shared definition of "the suite is healthy."

The four bars I'd set:

1. Flake rate. % of test runs where a test fails on first attempt and passes on retry. Target <1% per spec. Compute weekly. A spec exceeding 1% goes into quarantine within 7 days. Playwright's HTML report tracks this per-test; aggregate via JUnit.

2. Suite duration. Wall time from PR push to all-shards-green. Targets:

  • Smoke (every PR): <10 minutes.
  • Full regression (PR-to-main): <30 minutes.
  • Cross-browser nightly: <60 minutes.

Track p50 and p95. A 20% regression in p95 triggers investigation in the next sprint.

3. Escape rate. Production bugs that the automation should have caught, divided by total prod bugs. Target <10%. Each escape gets a "missing test" ticket; closing the ticket adds the test that would have caught it.

4. Critical-journey coverage. % of the team's defined critical user journeys (typically 10-15) that have working automated coverage. Target: 100%. Drift below triggers a fix, not new feature work.

Process around the bars:

  • Weekly health post with the four numbers, trend arrows, quarantined tests, and any escape root causes from the past week. Visible to the whole eng org.
  • 7-day quarantine SLA. Every quarantined test has a named owner and a deadline; past 7 days, the owner is paged or the test is deleted.
  • PR gates. Smoke must pass on PR. Full regression must pass on PR-to-main. Skipping a test (test.skip) requires lead approval.
  • Quarterly retros. Bars reviewed; numbers comfortably hit get tightened, numbers not improving get explicit attention.

Cultural framing:

  • Publish failure modes, not just success. "We had 3 escapes last quarter; here's what we added" builds credibility.
  • Reward fixing flakes, not just shipping tests. A flaky test that catches everything is worse than a stable one with narrower coverage.
  • Cost transparency. Show the CI bill for the suite; tie investment to value (escapes prevented, hours saved). Leadership cares about ROI.

Avoid the lead trap of too many metrics. Four numbers everyone knows beats twenty nobody looks at.

// WHAT INTERVIEWERS LOOK FOR

Specific numerical targets, publication / accountability mechanisms, escape rate as the outcome metric, and the discipline of culturally framing flake fixes alongside new tests.

// COMMON PITFALL

Setting only 'all tests pass' as the bar — a binary that says nothing about flake or escape.