Risk-Based Testing
A test strategy that allocates testing effort in proportion to risk, concentrating the deepest testing on areas with the highest probability of failure and the greatest impact if they fail.
What it is
Risk-Based Testing is a strategy for prioritising test effort when there is never enough time to test everything exhaustively. Instead of testing all features with equal depth, RBT scores each feature area or test item on two dimensions: Likelihood (how probable is a failure here?) and Impact (how severe would that failure be?). The product of the two scores produces a Risk Score that determines test depth. High-scoring areas get exhaustive testing and exploratory sessions; low-scoring areas may receive only a smoke test or be explicitly deferred. RBT makes the implicit trade-off visible — teams can show stakeholders exactly why testing was concentrated in specific areas, and which risks were consciously accepted.
When to use it
When to use
- Sprint or release planning: when 20+ features must be tested in a fixed time window and you need a defensible prioritisation
- Regression scope selection: to decide which areas get full regression vs sanity-check vs skip
- New code risk assessment: new or recently modified code has higher Likelihood — RBT captures this explicitly
- Stakeholder communication: the heat-grid is a visual argument for why QA effort was concentrated in specific areas
- Test automation prioritisation: use risk scores to decide which paths to automate first
When NOT to use
- Safety-critical or regulated systems that require exhaustive coverage by mandate — RBT is a risk-reduction strategy, not a substitute for statutory completeness requirements
- When risk scores are purely subjective and no domain experts are available to validate them — garbage-in garbage-out; the matrix is only as good as the scoring input
- As the only test strategy: RBT tells you WHERE to test deeply, not WHAT to test — it must be combined with other techniques (BVA, decision tables, exploratory) for actual test design
How it works
Score each item on two 1–3 scales: Likelihood (1 = rare, 2 = possible, 3 = likely) and Impact (1 = cosmetic, 2 = degraded UX, 3 = data loss/security/revenue). Multiply the two to get a Risk Score between 1 and 9; the only possible products are 1, 2, 3, 4, 6 and 9. Place each item in the heat-grid and apply the depth rule for its cell. The worked example below uses five e-commerce features.
E-commerce release — 5 features scored for Likelihood × Impact
| Feature | Likelihood | Impact | Score | Risk level | Test depth | Result |
|---|---|---|---|---|---|---|
| Payment processing | 2 | 3 | 6 | High | Full regression + negative paths | Reject |
| Auth / login | 2 | 3 | 6 | High | Full regression + security checks | Reject |
| Cart & checkout | 2 | 2 | 4 | Medium | Happy path + boundary values | Accept |
| Search | 1 | 2 | 2 | Low | Smoke test only | Accept |
| Static content | 1 | 1 | 1 | Low | Smoke test only | Accept |
⚠ The Result column indicates focus (Reject = full test attention; Accept = lighter coverage). Adapt the labelling to your team's vocabulary.
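The scoring in the table above can be reproduced with a few lines of Python. This is a minimal sketch, not a prescribed tool: the `RiskItem` class and the `build_test_queue` helper are hypothetical names introduced for illustration, and the likelihood and impact values are copied from the worked example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskItem:
    """One feature area scored on the two 1-3 scales (hypothetical helper)."""
    name: str
    likelihood: int  # 1 = rare, 2 = possible, 3 = likely
    impact: int      # 1 = cosmetic, 2 = degraded UX, 3 = data loss/security/revenue

    def __post_init__(self) -> None:
        # Reject values outside the 1-3 scales described in the text.
        for field_name in ("likelihood", "impact"):
            value = getattr(self, field_name)
            if value not in (1, 2, 3):
                raise ValueError(f"{field_name} must be 1, 2 or 3, got {value!r}")

    @property
    def score(self) -> int:
        # Risk Score = Likelihood x Impact
        return self.likelihood * self.impact


# Values taken from the e-commerce worked example.
FEATURES = [
    RiskItem("Payment processing", 2, 3),
    RiskItem("Auth / login", 2, 3),
    RiskItem("Cart & checkout", 2, 2),
    RiskItem("Search", 1, 2),
    RiskItem("Static content", 1, 1),
]


def build_test_queue(items):
    """Order items by descending risk score; ties keep their input order."""
    return sorted(items, key=lambda item: item.score, reverse=True)


if __name__ == "__main__":
    for item in build_test_queue(FEATURES):
        print(f"{item.score}  {item.name}")
```

Keeping the two inputs separate, rather than storing only the product, preserves the information needed to explain a score to stakeholders later.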
Step by step
List all features, test areas, or user stories in scope
Work from the sprint board, release notes, or change log. Include both changed and unchanged areas — unchanged areas can still fail when dependencies change.
Score Likelihood (1–3) for each item
1 = this area rarely fails (stable, well-tested, no recent changes). 2 = failures are plausible (recent changes, complex logic, integration points). 3 = failures are likely (new feature, known fragility, third-party dependency, DST/timezone logic, financial calculation).
Score Impact (1–3) for each item
1 = cosmetic or minor UX issue, user can work around it. 2 = degraded experience, reduced functionality, partial data issue. 3 = data loss, security vulnerability, revenue impact, complete feature failure, regulatory risk.
Calculate Risk Score and place on the heat-grid
Score = Likelihood × Impact. Place each item in the corresponding cell of the 3×3 grid. Items scoring 9 (Critical, 3×3) or 6 (High, 2×3 or 3×2) go to the top of the test queue, items scoring 3–4 (Medium) come next, and items scoring 2 or less (Low) receive only smoke coverage.
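Placing items on the grid can be sketched as a mapping from (likelihood, impact) cells to item names. The function names `build_heat_grid` and `render_heat_grid` and the text layout are assumptions made for this example; the sample data comes from the e-commerce table.

```python
from collections import defaultdict


def build_heat_grid(scored_items):
    """Group item names by their (likelihood, impact) cell in the 3x3 grid."""
    grid = defaultdict(list)
    for name, likelihood, impact in scored_items:
        if likelihood not in (1, 2, 3) or impact not in (1, 2, 3):
            raise ValueError(f"{name}: scores must be 1, 2 or 3")
        grid[(likelihood, impact)].append(name)
    return dict(grid)


def render_heat_grid(grid):
    """Render one text row per impact level (3 at the top), likelihood 1-3 left to right."""
    lines = []
    for impact in (3, 2, 1):
        cells = []
        for likelihood in (1, 2, 3):
            names = ", ".join(grid.get((likelihood, impact), [])) or "-"
            # The bracketed number is the cell's risk score.
            cells.append(f"[{likelihood * impact}] {names}")
        lines.append(f"I={impact} | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    items = [
        ("Payment processing", 2, 3),
        ("Auth / login", 2, 3),
        ("Cart & checkout", 2, 2),
        ("Search", 1, 2),
        ("Static content", 1, 1),
    ]
    print(render_heat_grid(build_heat_grid(items)))
```

A plain-text rendering like this is enough for a planning note; a spreadsheet or chart would serve the same purpose for a stakeholder review.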
Assign test depth to each risk level
Critical (9): exhaustive testing + exploratory session + regression. High (6): full functional coverage + negative paths + security-relevant checks. Medium (3–4): happy path + key boundary values + one negative path. Low (1–2): smoke test — does it launch, does the core path work.
Review with the team and document consciously accepted risks
Present the matrix to developers, product, and stakeholders. Explicitly call out any Low items that are being consciously deprioritised. This converts an implicit decision ('we didn't get to it') into an explicit risk acceptance.
Pitfalls & what it misses
Impact scoring without domain input
A QA engineer rating payment processing at Impact=1 because 'bugs are always caught before production' misunderstands the scale. Impact measures business consequence IF it escapes to production, not probability of escape. Always calibrate Impact ratings with a product owner or domain expert.
Treating Low-risk items as zero-test
Low risk means LOW test effort, not NO test effort. A static content page with L=1/I=1 still needs a smoke test — broken links and missing images are real defects even if the score is low.
RBT as a one-time activity
Risk profiles change with every sprint. A feature that was Low-risk last sprint may become High-risk after a refactor. Re-score at the start of each sprint or release cycle, not once per year.
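One way to make re-scoring routine is to compare the previous snapshot of scores with the current one and flag every item whose level moved. The sketch below assumes scores are kept as simple name-to-(likelihood, impact) mappings; `level_changes` and the sample data are hypothetical and only illustrate the refactor scenario described above.

```python
from typing import Dict, Optional, Tuple

Scores = Dict[str, Tuple[int, int]]  # name -> (likelihood, impact)


def risk_level(score: int) -> str:
    """Map a risk score to the level bands used in this article."""
    if score == 9:
        return "Critical"
    if score == 6:
        return "High"
    if score in (3, 4):
        return "Medium"
    if score in (1, 2):
        return "Low"
    raise ValueError(f"{score} is not a valid product of two 1-3 scores")


def level_changes(previous: Scores, current: Scores) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {name: (old_level, new_level)} for items whose level changed.

    Items that are new in `current` are reported with old_level None.
    """
    changes = {}
    for name, (likelihood, impact) in current.items():
        new_level = risk_level(likelihood * impact)
        old = previous.get(name)
        old_level = risk_level(old[0] * old[1]) if old is not None else None
        if old_level != new_level:
            changes[name] = (old_level, new_level)
    return changes


if __name__ == "__main__":
    # Hypothetical data: Search was refactored, so its Likelihood rises from 1 to 3.
    last_sprint = {"Search": (1, 2), "Static content": (1, 1)}
    this_sprint = {"Search": (3, 2), "Static content": (1, 1)}
    print(level_changes(last_sprint, this_sprint))
```

Reviewing only the items that changed level keeps the re-scoring session short while still surfacing the features that need a different test depth.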
Using RBT to justify skipping mandatory test types
In regulated environments (healthcare, finance, aviation), some tests are legally required regardless of risk score. RBT does not override regulatory test obligations — it supplements them by determining depth within the mandated coverage.