TEST DESIGN

Risk-Based Testing.

A test strategy that allocates testing effort in proportion to risk, concentrating the deepest testing on areas with the highest probability of failure and the greatest impact if they fail.

6
steps
4
pitfalls
8 min
read
Risk-BasedRBTintermediate

What it is

Risk-Based Testing is a strategy for prioritising test effort when there is never enough time to test everything exhaustively. Instead of testing all features with equal depth, RBT scores each feature area or test item on two dimensions: Likelihood (how probable is a failure here?) and Impact (how severe would that failure be?). The product of the two scores produces a Risk Score that determines test depth. High-scoring areas get exhaustive testing and exploratory sessions; low-scoring areas may receive only a smoke test or be explicitly deferred. RBT makes the implicit trade-off visible — teams can show stakeholders exactly why testing was concentrated in specific areas, and which risks were consciously accepted.

When to use it

When to use

  • Sprint or release planning: when 20+ features must be tested in a fixed time window and you need a defensible prioritisation
  • Regression scope selection: to decide which areas get full regression vs sanity-check vs skip
  • New code risk assessment: new or recently modified code has higher Likelihood — RBT captures this explicitly
  • Stakeholder communication: the heat-grid is a visual argument for why QA effort was concentrated in specific areas
  • Test automation prioritisation: use risk scores to decide which paths to automate first
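For the fixed-time-window case, one simple way to turn scores into a plan is to split the available hours in proportion to each area's risk score. The sketch below is only an illustration: the function name `allocate_hours`, the 38-hour budget, and the strictly proportional split are assumptions rather than part of the technique, and the scores mirror the five e-commerce features used in the worked example later in this article.

```python
def allocate_hours(risk_scores: dict[str, int], total_hours: float) -> dict[str, float]:
    """Split a fixed test budget across areas in proportion to their risk score."""
    total_risk = sum(risk_scores.values())
    if total_risk <= 0:
        raise ValueError("at least one area needs a positive risk score")
    return {
        area: round(total_hours * score / total_risk, 1)
        for area, score in risk_scores.items()
    }


# Hypothetical budget; scores are Likelihood x Impact products.
scores = {
    "Payment processing": 6,
    "Auth / login": 6,
    "Cart & checkout": 4,
    "Search": 2,
    "Static content": 1,
}
hours = allocate_hours(scores, total_hours=38)

for area, h in hours.items():
    print(f"{area:<20} {h:>5} h")
```

A proportional split is a starting point for discussion, not a rule: teams often round the result into depth tiers (full regression, sanity check, smoke) instead of using raw hours.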

When NOT to use

  • Safety-critical or regulated systems that require exhaustive coverage by mandate — RBT is a risk-reduction strategy, not a substitute for statutory completeness requirements
  • When risk scores are purely subjective and no domain experts are available to validate them — garbage-in garbage-out; the matrix is only as good as the scoring input
  • As the only test strategy: RBT tells you WHERE to test deeply, not WHAT to test — it must be combined with other techniques (BVA, decision tables, exploratory) for actual test design

How it works

Score each item on two 1–3 scales: Likelihood (1 = rare, 2 = possible, 3 = likely) and Impact (1 = cosmetic, 2 = degraded UX, 3 = data loss/security/revenue). Multiply the two to get a Risk Score between 1 and 9; because both factors are whole numbers from 1 to 3, only the values 1, 2, 3, 4, 6 and 9 can occur. Place each item in the heat-grid and apply the depth rule for its cell. The worked example below uses five e-commerce features.

E-commerce release — 5 features scored for Likelihood × Impact

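| Feature | Likelihood | Impact | Score | Risk level | Test depth | Result |
| --- | --- | --- | --- | --- | --- | --- |
| Payment processing | 2 | 3 | 6 | High | Full regression + negative paths | Reject |
| Auth / login | 2 | 3 | 6 | High | Full regression + security checks | Reject |
| Cart & checkout | 2 | 2 | 4 | Medium | Happy path + boundary values | Accept |
| Search | 1 | 2 | 2 | Low | Smoke test only | Accept |
| Static content | 1 | 1 | 1 | Low | Smoke test only | Accept |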

The Result column indicates focus (Reject = full test attention; Accept = lighter coverage). Adapt the labelling to your team's vocabulary.
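The scoring in the worked example can be reproduced with a few lines of Python. This is a minimal sketch, not a reference implementation: the `RiskItem` class and `risk_level` function are names invented for illustration, and the level thresholds (9 = Critical, 6 = High, 3–4 = Medium, 1–2 = Low) are the ones this article uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskItem:
    name: str
    likelihood: int  # 1 = rare, 2 = possible, 3 = likely
    impact: int      # 1 = cosmetic, 2 = degraded UX, 3 = data loss/security/revenue

    def __post_init__(self) -> None:
        # Reject ratings outside the 1-3 scale early.
        for field_name in ("likelihood", "impact"):
            value = getattr(self, field_name)
            if value not in (1, 2, 3):
                raise ValueError(f"{field_name} must be 1, 2 or 3, got {value!r}")

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def risk_level(score: int) -> str:
    """Map a Likelihood x Impact product to the article's risk levels."""
    if score == 9:
        return "Critical"
    if score == 6:
        return "High"
    if score in (3, 4):
        return "Medium"
    if score in (1, 2):
        return "Low"
    raise ValueError(f"{score} is not a product of two 1-3 ratings")


items = [
    RiskItem("Payment processing", 2, 3),
    RiskItem("Auth / login", 2, 3),
    RiskItem("Cart & checkout", 2, 2),
    RiskItem("Search", 1, 2),
    RiskItem("Static content", 1, 1),
]

for item in items:
    print(f"{item.name:<20} {item.score}  {risk_level(item.score)}")
```

Validating the ratings in the constructor keeps a mistyped 0 or 4 from silently producing a score that falls outside the grid.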

Step by step

  1. List all features, test areas, or user stories in scope

    Work from the sprint board, release notes, or change log. Include both changed and unchanged areas — unchanged areas can still fail when dependencies change.

  2. Score Likelihood (1–3) for each item

    1 = this area rarely fails (stable, well-tested, no recent changes). 2 = failures are plausible (recent changes, complex logic, integration points). 3 = failures are likely (new feature, known fragility, third-party dependency, DST/timezone logic, financial calculation).

  3. Score Impact (1–3) for each item

    1 = cosmetic or minor UX issue, user can work around it. 2 = degraded experience, reduced functionality, partial data issue. 3 = data loss, security vulnerability, revenue impact, complete feature failure, regulatory risk.

  4. Calculate Risk Score and place on the heat-grid

    Score = Likelihood × Impact. Place each item in the corresponding cell of the 3×3 grid. Items scoring 9 (Critical, 3×3) or 6 (High, 2×3 or 3×2) go to the top of the test queue, items scoring 3–4 (Medium) follow, and items scoring 2 or less (Low) receive only smoke coverage.

  5. Assign test depth to each risk level

    Critical (9): exhaustive testing + exploratory session + regression. High (6): full functional coverage + negative paths + security-relevant checks. Medium (3–4): happy path + key boundary values + one negative path. Low (1–2): smoke test — does it launch, does the core path work.

  6. Review with the team and document consciously accepted risks

    Present the matrix to developers, product, and stakeholders. Explicitly call out any Low items that are being consciously deprioritised. This converts an implicit decision ('we didn't get to it') into an explicit risk acceptance.
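Steps 4 to 6 can be sketched in code as follows: place items on the 3×3 grid, order the test queue by score, look up the depth rule per level, and list the Low items that need an explicit risk acceptance. All function and variable names here are hypothetical, and the input reuses the five e-commerce features from the worked example as (Likelihood, Impact) pairs.

```python
# Depth rules per risk level, as listed in step 5.
DEPTH_BY_LEVEL = {
    "Critical": "exhaustive testing + exploratory session + regression",
    "High": "full functional coverage + negative paths + security-relevant checks",
    "Medium": "happy path + key boundary values + one negative path",
    "Low": "smoke test",
}


def level_for(score: int) -> str:
    """Map a Likelihood x Impact product to its risk level."""
    if score == 9:
        return "Critical"
    if score == 6:
        return "High"
    if score in (3, 4):
        return "Medium"
    if score in (1, 2):
        return "Low"
    raise ValueError(f"{score} is not a product of two 1-3 ratings")


def build_heat_grid(items: dict[str, tuple[int, int]]) -> dict[tuple[int, int], list[str]]:
    """Return a 3x3 grid keyed by (likelihood, impact) with item names per cell."""
    grid = {(l, i): [] for l in (1, 2, 3) for i in (1, 2, 3)}
    for name, (likelihood, impact) in items.items():
        grid[(likelihood, impact)].append(name)
    return grid


def build_test_queue(items: dict[str, tuple[int, int]]) -> list[str]:
    """Order items by descending risk score; ties keep their input order."""
    return sorted(items, key=lambda n: items[n][0] * items[n][1], reverse=True)


items = {
    "Payment processing": (2, 3),
    "Auth / login": (2, 3),
    "Cart & checkout": (2, 2),
    "Search": (1, 2),
    "Static content": (1, 1),
}

grid = build_heat_grid(items)
queue = build_test_queue(items)

# Print the grid with the highest impact on the top row.
for impact in (3, 2, 1):
    cells = [", ".join(grid[(l, impact)]) or "-" for l in (1, 2, 3)]
    print(f"I={impact} | " + " | ".join(cells))

# Depth per item, in queue order.
for name in queue:
    level = level_for(items[name][0] * items[name][1])
    print(f"{name}: {level} -> {DEPTH_BY_LEVEL[level]}")

# Low items to call out explicitly in the review (step 6).
accepted_risks = [n for n in queue if level_for(items[n][0] * items[n][1]) == "Low"]
```

Keeping `accepted_risks` as a separate list makes it easy to paste into the review notes, so the deprioritised items are recorded rather than silently dropped.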

Pitfalls & what it misses

Impact scoring without domain input

A QA engineer rating payment processing at Impact=1 because 'bugs are always caught before production' misunderstands the scale. Impact measures business consequence IF it escapes to production, not probability of escape. Always calibrate Impact ratings with a product owner or domain expert.

Treating Low-risk items as zero-test

Low risk means LOW test effort, not NO test effort. A static content page with L=1/I=1 still needs a smoke test — broken links and missing images are real defects even if the score is low.

RBT as a one-time activity

Risk profiles change with every sprint. A feature that was Low-risk last sprint may become High-risk after a refactor. Re-score at the start of each release cycle, not once per year.

Using RBT to justify skipping mandatory test types

In regulated environments (healthcare, finance, aviation), some tests are legally required regardless of risk score. RBT does not override regulatory test obligations — it supplements them by determining depth within the mandated coverage.
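Two of the pitfalls above, treating Low as zero-test and letting risk scores override mandated tests, can be guarded against mechanically when the plan is assembled. The following is a rough sketch under stated assumptions: `build_plan`, the area names, and the "audit trail verification" test type are all made up for illustration and do not come from any regulation or tool.

```python
SMOKE = "smoke test"


def build_plan(
    risk_based: dict[str, set[str]],
    mandated: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Merge risk-based test selections with mandated tests.

    Mandated tests are always kept regardless of risk score, and every
    area gets at least a smoke test.
    """
    plan = {}
    for area in sorted(risk_based.keys() | mandated.keys()):
        tests = set(risk_based.get(area, ())) | set(mandated.get(area, ()))
        tests.add(SMOKE)  # Low risk means low effort, never zero
        plan[area] = tests
    return plan


# Hypothetical input: both areas scored Low, one has a mandated test type.
risk_based = {
    "Report layout": set(),          # someone planned no tests at all
    "Patient export": {SMOKE},
}
mandated = {"Patient export": {"audit trail verification"}}

plan = build_plan(risk_based, mandated)
for area, tests in plan.items():
    print(area, "->", sorted(tests))
```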
