Guided Walkthrough — From Bug Report to Generated Test

This lesson walks through one complete bug-to-test cycle as a worked example; it is the single highest-ROI workflow in the course. The bug arrives as a realistic, vague ticket. The assistant reproduces it and emits a regression test, then you review and commit. Total elapsed time: roughly fifteen minutes for a flow that traditionally consumes one to two hours of triage and authoring. Read this with your own real bug from the previous lesson's preparation in mind: the example here is a stand-in for whatever is actually sitting in your tracker.

The seven parts below are deliberately granular. Each is a real moment in the loop where it would have been tempting to cut a corner and pay for it later. Following the structure produces an artefact you can paste into a code review and have it taken seriously.

Part 1 — The bug report (vague)

The ticket lands in the queue:

PETMART-2049
Title: Cart total is wrong sometimes
Reporter: support@petmart
Description: "A customer reports the cart total is wrong sometimes when they have multiple of the same item. Couldn't repro myself. Could be related to the SAVE15 promo. Help?"

That's everything. No steps, no environment, no exact totals. "Cannot reproduce, please add steps" is exactly the loop AI triage is designed to break.

Part 2 — Hand it to the assistant

Open Claude Desktop. Paste:

A customer reported this bug on PetMart staging:
 
"Cart total is wrong sometimes when I have multiple of the same item.
Possibly related to SAVE15 promo."
 
Investigate on https://demo.petmart.com using the test account
qa-mcp-01@petmart.test / Demo123!. Reproduce or rule out the bug. Try:
 
- Adding the same product 2, 5, 10 times
- Adding via the "+" button vs typing the quantity directly
- Different products at different price points (a $5 toy, a $25 collar, a $80 bag)
- Mixing same-product-multiples with one-off items in the same cart
- With and without the SAVE15 coupon applied
 
For every cart shape, capture the displayed subtotal, discount, and total.
Then propose a hypothesis for the trigger condition and capture a Playwright
trace of one reproducing run.
 
Format the result as a triageable bug report (title, steps, expected, actual,
environment, attachments).

A few things made that prompt good: a specific test account (no production creds), an explicit variation grid, "reproduce or rule out" (so a "can't repro" finding is also a valid outcome), and a structured output format. Specificity in is specificity out.
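To see how much ground the variation grid covers, it can help to enumerate it. The sketch below is illustrative only: the type names, the placeholder product names, and the `buildGrid` function are assumptions, not part of the course material. It covers the single-product axes from the prompt and leaves out the mixed-cart bullet.

```typescript
// Hypothetical types for the variation grid; all names are illustrative.
type InputMethod = 'plus-button' | 'typed';

interface Product {
  name: string;
  priceUsd: number;
}

interface CartShape {
  product: Product;
  quantity: number;
  inputMethod: InputMethod;
  withCoupon: boolean;
}

const quantities = [2, 5, 10];
const inputMethods: InputMethod[] = ['plus-button', 'typed'];
// Price points come from the prompt; the product names are placeholders.
const products: Product[] = [
  { name: 'toy', priceUsd: 5 },
  { name: 'collar', priceUsd: 25 },
  { name: 'bag', priceUsd: 80 },
];
const couponStates = [false, true];

// Cartesian product of every axis: one entry per single-product cart shape.
function buildGrid(): CartShape[] {
  const grid: CartShape[] = [];
  for (const product of products) {
    for (const quantity of quantities) {
      for (const inputMethod of inputMethods) {
        for (const withCoupon of couponStates) {
          grid.push({ product, quantity, inputMethod, withCoupon });
        }
      }
    }
  }
  return grid;
}

console.log(`${buildGrid().length} single-product cart shapes`); // 36 single-product cart shapes
```

Thirty-six shapes is tedious for a person to walk by hand and trivial for the assistant, which is why spelling out the axes in the prompt pays off.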

Part 3 — Claude's findings

Eight minutes of tool calls later, the assistant returns:

Reproduces with 5+ of the same item added via the "+" button and the SAVE15 coupon applied. Subtotal is computed correctly; the discount line is computed against (quantity - 1) × price instead of quantity × price. The displayed total comes out below the expected value and, for sufficiently large carts, shows as a negative number on screen.

Cart shapes that reproduced:

5× $5 Squeaky Mouse + SAVE15 → displayed total: $4 (expected $21.25)

10× $5 Squeaky Mouse + SAVE15 → displayed total: $-3 (expected $42.50)

Cart shapes that didn't:

2× same-item via "+" + SAVE15 → discount appears correct (off-by-one rounds away)

5× same-item without SAVE15 → no discount applied, subtotal correct

5× same-item via typed-quantity + SAVE15 → discount correct (different code path)

Hypothesis: the "+" button increments a quantity counter that the discount calculation reads from, but the calculation runs before the increment is committed — so it always discounts against quantity - 1. The typed-quantity path commits before the discount runs and is unaffected.

Trace saved to /tmp/playwright-traces/petmart-2049-repro.zip. Screenshot saved to /tmp/playwright-screenshots/petmart-2049-cart.png.

That hypothesis is gold — it's not just "the bug reproduces", it's "the bug fires on this code path and not that one". The developer can go straight to the increment handler.
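The expected totals in the report can be checked by hand. A minimal sketch, assuming SAVE15 means 15% off the subtotal (the same assumption the generated test makes); the function names are hypothetical, and the arithmetic is done in integer cents to keep floating-point rounding out of the picture.

```typescript
// Expected total in cents for `quantity` units at `unitPriceCents`,
// with a percentage coupon applied to the whole subtotal.
function expectedTotalCents(quantity: number, unitPriceCents: number, percentOff: number): number {
  const subtotal = quantity * unitPriceCents;
  const discount = Math.round((subtotal * percentOff) / 100);
  return subtotal - discount;
}

// Render cents as a dollar string for comparison with the report.
function toDollars(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

console.log(toDollars(expectedTotalCents(5, 500, 15)));  // $21.25
console.log(toDollars(expectedTotalCents(10, 500, 15))); // $42.50
```

Both values match the "expected" figures in the assistant's report, so the assertions in the regression test have a verified target.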

Part 4 — Generate the regression test

The assistant has an open MCP session, the staging app, and the verified repro. The next prompt:

Generate a Playwright regression test that reproduces this bug deterministically.
Should fail until the underlying calculation is fixed, and pass once it is.
 
Conventions:
- File: tests/regression/cart-quantity-discount.spec.ts
- Use the existing CartPage POM at tests/pages/CartPage.ts (pasted below)
- Use the formatPrice helper from tests/utils/format.ts
- Use the authenticatedPage fixture from tests/fixtures/auth.ts
- Test data: read SAVE15 from tests/fixtures/coupons.ts (already exists)
- Use getByRole / getByLabel locators; no CSS or XPath
- No waitForTimeout — use web-first assertions
- Include one happy-path control test (typed-quantity, same coupon, correct total)
  and one bug-reproducing test (clicked-quantity, same coupon, currently wrong total)
 
[paste tests/pages/CartPage.ts here]

The pasted POM file is non-negotiable. Without it the assistant invents method names; with it, the generated test reads like the team wrote it.
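One rough way to catch invented method names is to compare every `cart.<name>` reference in the generated test against the members the Page Object actually exposes. The sketch below is an assumption-laden illustration: the member list mirrors the calls used in this lesson's generated spec rather than being read from CartPage.ts, and `findUnknownCartMembers` is a hypothetical helper, not an existing tool.

```typescript
// Members the generated test is allowed to use. This list mirrors the calls in
// this lesson's spec; a real check would derive it from CartPage.ts.
const cartPageMembers = new Set<string>([
  'addItemBySku',
  'setQuantityByTyping',
  'incrementQuantityByButton',
  'applyCoupon',
  'subtotalDisplay',
  'totalDisplay',
]);

// Returns every `cart.<name>` reference in the source that is not a known member.
function findUnknownCartMembers(source: string, known: Set<string>): string[] {
  const pattern = /\bcart\.([A-Za-z_$][\w$]*)/g;
  const unknown = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1];
    if (name !== undefined && !known.has(name)) {
      unknown.add(name);
    }
  }
  return Array.from(unknown);
}

// Sample output from an assistant that was NOT given the POM.
const generated = `
  await cart.addItemBySku(sku);
  await cart.clickPlus(sku);
  await cart.applyCoupon(code);
`;
console.log(findUnknownCartMembers(generated, cartPageMembers)); // reports clickPlus
```

The TypeScript compiler would flag the same problem once the spec is type-checked; a text scan like this is only a quick pre-check before the file reaches the repository.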

Part 5 — Claude's generated test

// Import paths are relative to tests/regression/, where this spec lives.
import { test, expect } from '../fixtures/auth';
import { CartPage } from '../pages/CartPage';
import { formatPrice } from '../utils/format';
import { COUPONS } from '../fixtures/coupons';
 
const SQUEAKY_MOUSE = { sku: 'TOY-001', name: 'Squeaky Mouse', price: 5.00 };
 
test.describe('Cart quantity × coupon discount', () => {
  test('typed-quantity path — discount is correct', async ({ authenticatedPage: page }) => {
    const cart = new CartPage(page);
    await cart.addItemBySku(SQUEAKY_MOUSE.sku);
    await cart.setQuantityByTyping(SQUEAKY_MOUSE.sku, 5);
    await cart.applyCoupon(COUPONS.SAVE15);
 
    const subtotal = SQUEAKY_MOUSE.price * 5;
    const expectedTotal = subtotal * (1 - 0.15);
 
    await expect(cart.subtotalDisplay).toHaveText(formatPrice(subtotal));
    await expect(cart.totalDisplay).toHaveText(formatPrice(expectedTotal));
  });
 
  test('clicked-quantity path — discount must match (regression for PETMART-2049)', async ({ authenticatedPage: page }) => {
    const cart = new CartPage(page);
    await cart.addItemBySku(SQUEAKY_MOUSE.sku);
    // The item is added with quantity 1, so four "+" clicks bring it to 5.
    for (let i = 0; i < 4; i += 1) {
      await cart.incrementQuantityByButton(SQUEAKY_MOUSE.sku);
    }
    await cart.applyCoupon(COUPONS.SAVE15);
 
    const subtotal = SQUEAKY_MOUSE.price * 5;
    const expectedTotal = subtotal * (1 - 0.15);
 
    await expect(cart.subtotalDisplay).toHaveText(formatPrice(subtotal));
    await expect(cart.totalDisplay).toHaveText(formatPrice(expectedTotal));
    await expect(cart.totalDisplay).not.toContainText('-');
  });
});

Two tests, one control and one regression, sharing the same fixture and POM. The negative-total guard at the end is a small touch the assistant added on its own — exactly the kind of "this could regress in a new way" assertion humans often forget.
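Because toHaveText compares against an exact string, the assertions are only as good as the formatPrice helper they rely on. The real helper in tests/utils/format.ts is not shown in this lesson; the sketch below is a hypothetical stand-in that assumes US-dollar formatting with two decimal places, built on the standard Intl.NumberFormat API.

```typescript
// Hypothetical stand-in for tests/utils/format.ts; the real helper may differ.
const usd = new Intl.NumberFormat('en-US', { style: 'currency', currency: 'USD' });

function formatPrice(amount: number): string {
  return usd.format(amount);
}

// Same arithmetic as the generated test: 5 units at $5.00 with 15% off.
const subtotal = 5.0 * 5;
const expectedTotal = subtotal * (1 - 0.15);

console.log(formatPrice(subtotal));      // $25.00
console.log(formatPrice(expectedTotal)); // $21.25
```

If the storefront renders prices differently (no trailing zeros, a different currency symbol position), the helper and the page have to agree, or the control test will fail for reasons unrelated to the bug.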

Part 6 — Review and integrate

Don't merge yet. Walk it through the same checklist you use for any AI-generated diff:

Selectors stable? All locators come through CartPage, which uses getByRole-based methods. Yes.
Assertions meaningful? Subtotal, total, and a no-negative guard. Yes, and the negative guard explicitly tests the surface symptom of the bug.
Test data isolated? SKU is hard-coded but references seeded fixture data; the cart is fresh per test via the auth fixture. Yes.
Naming consistent? Test names describe behaviour, not implementation. The (regression for PETMART-2049) annotation matches the team's existing convention. Yes.
Self-cleaning? The authenticatedPage fixture handles cookie reset between tests. Yes.
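Two of the conventions from the generation prompt (no waitForTimeout, no raw CSS or XPath locators) are mechanical enough to pre-screen before the human review. The sketch below is a rough, hypothetical line scanner, not an existing lint rule; its patterns are deliberately simple and only flag candidates for a person to look at.

```typescript
interface ConventionViolation {
  rule: string;
  line: number; // 1-based line number within the scanned source
}

// Each rule is a regular expression applied to one source line at a time.
const rules: { rule: string; pattern: RegExp }[] = [
  { rule: 'no waitForTimeout', pattern: /\.waitForTimeout\s*\(/ },
  { rule: 'no raw CSS/XPath locator', pattern: /\.locator\s*\(/ },
];

function checkConventions(source: string): ConventionViolation[] {
  const violations: ConventionViolation[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, pattern } of rules) {
      if (pattern.test(text)) {
        violations.push({ rule, line: index + 1 });
      }
    }
  });
  return violations;
}

const sample = [
  "await page.getByRole('button', { name: 'Apply' }).click();",
  'await page.waitForTimeout(2000);',
  "await page.locator('.cart-total').click();",
].join('\n');

console.log(checkConventions(sample)); // flags line 2 and line 3
```

A scan like this complements the checklist; it cannot judge whether an assertion is meaningful or whether test data is isolated.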

Run locally:

npx playwright test tests/regression/cart-quantity-discount.spec.ts --headed

Outcomes you want to see:

The control test (typed quantity) passes. This proves the test infrastructure works against the live app.
The regression test (clicked quantity) fails, with an assertion error showing the incorrect total. This confirms the test is actually testing the bug.

If both pass, the test isn't capturing the bug: re-prompt the assistant with the exact repro steps and the totals it observed, and push for a tighter assertion. If both fail, the control assumptions are wrong; debug those first.
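The pass/fail combinations above amount to a small decision table. A minimal sketch follows; the `NextStep` labels and `interpretRun` are hypothetical names, and the fourth combination (control fails, regression passes) is not discussed in this lesson, so its handling here is an assumption.

```typescript
type NextStep =
  | 'open-pr'                      // control green, regression red: the result you want
  | 'tighten-regression-assertion' // both green: the test is not capturing the bug
  | 'debug-control-first'          // both red: control assumptions are wrong
  | 'investigate-manually';        // assumption: combination not covered by the lesson

function interpretRun(controlPassed: boolean, regressionPassed: boolean): NextStep {
  if (controlPassed && !regressionPassed) return 'open-pr';
  if (controlPassed && regressionPassed) return 'tighten-regression-assertion';
  if (!controlPassed && !regressionPassed) return 'debug-control-first';
  // Control red but regression green: treated as needing a manual look.
  return 'investigate-manually';
}

console.log(interpretRun(true, false)); // open-pr
```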

Part 7 — Submit, fix, close the loop

Open a PR with the generated test, the trace path, and a description that names the workflow:

PR: Add regression test for PETMART-2049 (cart quantity × coupon discount)

Generated via Playwright MCP from the support ticket. The clicked-quantity test currently fails (reproduces the reported bug); the typed-quantity test passes (control). Both will pass once the increment handler is fixed (likely in src/cart/quantity.ts).

Trace: attached. Hypothesis: discount calculation reads quantity - 1 because it runs before the click increment is committed.

Reviewed for: locator stability, assertion strength, fixture cleanliness. AI-assisted; full diff reviewed line-by-line.

The reviewer accepts on green-control / red-regression. The bug ticket links to the failing CI run as evidence. The developer fixes the underlying bug, CI turns the regression test from red to green, and both PRs land. The bug is closed and the suite has a permanent guard against the same regression returning.

Where the time actually went

Step 1 of 6: Triage prompt + AI repro, ≈ 8 min. One prompt with the variation grid; the assistant runs the matrix and writes the structured report.

What this changes about the bug-triage meeting

If even half of the "cannot reproduce" tickets in your backlog can be resolved this way, the daily triage meeting changes shape. Instead of "please add steps and we'll look at it next sprint", it's "let's run the prompt now". The user gets a real answer the same day, the developer gets a regression test attached, and the QA engineer's time goes into harder bugs and new feature coverage instead of the vague-ticket loop.

The compounding effect is the real story. Every cycle adds one regression test to the suite. After thirty cycles, thirty real bugs from real users have permanent guards. The suite stops being a snapshot of what someone thought to test and starts being a record of every regression the product has ever had. That is the long-term shape of AI-augmented QA.

Reference: where each step came from

Reproduction prompt — Chapter 4, lesson 2 (bug reproduction).
Variation grid — same lesson; the "vary across cart sizes and price points" sentence is what turns a yes/no answer into a precise repro.
Test generation prompt — Chapter 3, lesson 1 (generating tests from natural language).
POM-paste-as-style-anchor — Chapter 3, lesson 3 (reverse-engineering POMs).
Review checklist — Chapter 5, lesson 1 (combining with existing suites).
Security defaults — Chapter 5, lesson 3 (test account, staging only).

You're not learning anything new in this lesson; you're seeing every previous lesson play together in a single workflow. That composition is what makes the capstone the capstone.

What to do with this for your real bug

Take the vague ticket you wrote down at the end of the previous lesson and run this exact loop against it. Use your team's real conventions, your real test account, your real Page Objects, your real PR template. The bug you pick should be:

Vague enough that the support team has been unable to reproduce it.
Stable in staging (i.e., not a transient outage from yesterday).
Not in a sensitive area — keep your first run on something low-risk.

When the run lands a merged regression test, you've completed the central exercise of the capstone. The next lesson reflects on what worked, what didn't, and where to take the skill next.