Q41 of 42 · Playwright
How would you set quality bars and SLA targets for a Playwright suite owned by a QA team?
Short answer
Short answer: Track flake rate (<1%), suite duration (<30 min for full, <10 min for smoke), escape rate (<10% of prod bugs), and critical-journey coverage (100%). Publish weekly. Set 7-day quarantine SLA for flakes. Tie investments to numbers — e.g., 'cycle time over 15 min triggers a sprint goal to investigate.'
Detail
Without explicit bars, a Playwright team measures success by "we ran tests" — which says nothing about quality. Explicit bars create accountability and a shared definition of "the suite is healthy."
The four bars I'd set:
1. Flake rate. % of test runs where a test fails on first attempt and passes on retry. Target <1% per spec. Compute weekly. A spec exceeding 1% goes into quarantine within 7 days. Playwright's HTML report tracks this per-test; aggregate via JUnit.
2. Suite duration. Wall time from PR push to all-shards-green. Targets:
- Smoke (every PR): <10 minutes.
- Full regression (PR-to-main): <30 minutes.
- Cross-browser nightly: <60 minutes.
Track p50 and p95. A 20% regression in p95 triggers investigation in the next sprint.
3. Escape rate. Production bugs that the automation should have caught, divided by total prod bugs. Target <10%. Each escape gets a "missing test" ticket; closing the ticket adds the test that would have caught it.
4. Critical-journey coverage. % of the team's defined critical user journeys (typically 10-15) that have working automated coverage. Target: 100%. Drift below triggers a fix, not new feature work.
Process around the bars:
- Weekly health post with the four numbers, trend arrows, quarantined tests, and any escape root causes from the past week. Visible to the whole eng org.
- 7-day quarantine SLA. Every quarantined test has a named owner and a deadline; past 7 days, the owner is paged or the test is deleted.
- PR gates. Smoke must pass on PR. Full regression must pass on PR-to-main. Skipping a test (
test.skip) requires lead approval. - Quarterly retros. Bars reviewed; numbers comfortably hit get tightened, numbers not improving get explicit attention.
Cultural framing:
- Publish failure modes, not just success. "We had 3 escapes last quarter; here's what we added" builds credibility.
- Reward fixing flakes, not just shipping tests. A flaky test that catches everything is worse than a stable one with narrower coverage.
- Cost transparency. Show the CI bill for the suite; tie investment to value (escapes prevented, hours saved). Leadership cares about ROI.
Avoid the lead trap of too many metrics. Four numbers everyone knows beats twenty nobody looks at.
// WHAT INTERVIEWERS LOOK FOR
// COMMON PITFALL
// Related questions
How would you justify the choice of Playwright over Cypress to a director skeptical of changing tools?
Playwright
How do you set quality bars and metrics for an automation team owning Cypress?
Cypress
How do you set quality bars and metrics that actually motivate a team without burning them out?
Behavioural