Prompt patterns for test authoring

10 min read · Reviewed May 2026 · prompting

Prompts are test code. A throwaway prompt written once in a chat window produces a throwaway test that degrades the moment the feature changes. A version-controlled prompt pattern, refined over six months of daily use and shared across the team, produces a test suite that ages the way the prompt library ages — systematically. Six pattern categories have proved their value: set-up, role, constraints, output shape, examples, and anti-examples. The last category is the one most teams skip and the highest-return investment of the six.

Difficulty: intermediate

You'll learn: six prompt-pattern categories proven to produce maintainable test code, with real examples of what works and what doesn't.

Six categories of prompt patterns

Not a laundry list — a covering set for the decisions an agent makes on your behalf every time it authors a test.

The six categories cover the decisions an AI agent makes when authoring a test: what context to operate in, what role to assume, what to avoid, what format to produce, what good looks like, and what bad looks like. Give the agent all six and the output quality is measurably higher than giving it a scenario description alone — not because the model is smarter, but because the ambiguity that generates low-quality output has been resolved before the model starts.
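One way to make the covering set concrete is to give each category its own slot. A prompt that is missing a category then fails loudly instead of silently falling back to the agent's defaults. This is a minimal sketch: `PromptSpec` and `buildPrompt` are hypothetical names that are not part of any agent SDK, and the section wording is illustrative.

```typescript
// Hypothetical shape: one field per prompt-pattern category.
interface PromptSpec {
  setup: string;          // project context: framework, paths, conventions
  role: string;           // behavioural expectations for the agent
  constraints: string[];  // what to avoid
  outputShape: string;    // format of the response
  examples: string[];     // tests that show what good looks like
  antiExamples: string[]; // snippets that show what bad looks like
}

// Assemble all six categories, then the scenario, into one prompt.
// Throws if any category is empty, so a gap is caught before the agent runs.
function buildPrompt(spec: PromptSpec, scenario: string): string {
  const missing = (Object.keys(spec) as (keyof PromptSpec)[]).filter((key) => {
    const value = spec[key];
    return Array.isArray(value) ? value.length === 0 : value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`Prompt spec is missing: ${missing.join(", ")}`);
  }
  return [
    spec.role,
    spec.setup,
    "Constraints:\n" + spec.constraints.map((c) => `- ${c}`).join("\n"),
    "Output format:\n" + spec.outputShape,
    "Examples of good tests:\n" + spec.examples.join("\n\n"),
    "Do NOT write code like this:\n" + spec.antiExamples.join("\n\n"),
    "Now write a test for the following scenario:\n" + scenario,
  ].join("\n\n");
}
```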

Most teams begin with scenario description only — "write a test for the login flow" — and then iterate by correcting the output. That iteration loop is a prompt pattern you are not capturing. Moving it from reactive correction into a proactive prompt library replaces one-off fixes with systematic quality.

Six prompt-pattern categories for maintainable test authoring

Set-up and role patterns

Most prompt failures come from the agent not knowing what "good" looks like in your project — set-up and role patterns fix that.

The set-up pattern establishes project context before any test scenario is described. It tells the agent which framework is in use, what the naming conventions are, where test files live, and what the page-object structure looks like. Without this context, the agent guesses — and its guesses reflect training data from the broader ecosystem, not your codebase.

The role pattern layers on behavioural expectations: the agent is acting as a senior SDET who prioritises maintainability over coverage speed, optimises for selector resilience, and reviews every assertion for specificity before marking the test complete. This is not cosmetic framing. It shifts the agent's optimisation target from "produces valid test code" to "produces test code worth keeping".

A combined set-up + role prompt prefix is the highest-return starting point for a team prompt library. Write it once, version-control it in a `.prompts/` directory alongside the test suite, and every engineer on the team starts from the same baseline — including new joiners on their first day.

You are a senior SDET working on a TypeScript + Playwright project.

Project conventions:
- Tests live in tests/e2e/, organised by feature area
- Page objects live in tests/e2e/pages/, one file per page or domain
- Use explicit waits only (waitForSelector, expect().toBeVisible()) — never waitForTimeout
- Selector preference: getByRole > getByTestId > getByLabel > getByText
- Assertions must be specific enough that a real defect would cause them to fail
- Describe blocks use feature names; test names describe the expected outcome

The test framework is Playwright 1.45+ with @playwright/test runner.
Existing test for reference: [paste one representative test here]

Now write a test for the following scenario:

Set-up + role prompt prefix — version-control this in your repo as .prompts/sdet-prefix.md
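Because the prefix is a file in the repository, scripts can load it instead of each engineer pasting it by hand. Here is a minimal sketch. It assumes Node.js and the `.prompts/sdet-prefix.md` path from the caption. `composeTestPrompt` is a hypothetical helper, not part of Playwright or any agent tooling.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: read the version-controlled prefix and append a scenario.
// Assumes the prefix file ends with "Now write a test for the following scenario:".
function composeTestPrompt(repoRoot: string, scenario: string): string {
  const prefixPath = join(repoRoot, ".prompts", "sdet-prefix.md");
  // Drop trailing blank lines so the scenario follows the prefix directly.
  const prefix = readFileSync(prefixPath, "utf8").trimEnd();
  return `${prefix}\n${scenario.trim()}\n`;
}
```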

Constraint patterns — the high-leverage ones

Two explicit negative constraints eliminate entire categories of output you would otherwise spend time correcting.

Constraints are the gap between what the agent produces by default and what your team accepts. Two categories produce the highest return: timing constraints (eliminating arbitrary sleeps and delays) and selector-hierarchy constraints (enforcing resilient locator patterns). Both are cases where the agent will reliably produce the wrong pattern unless told not to — because both patterns exist at high frequency in the training data.

For timing, the default agent behaviour is to add `waitForTimeout` or an equivalent wherever the test interacts with async UI. This produces tests that pass in fast environments and flake in slow ones, the failure mode that is hardest to diagnose after the fact. The constraint "use only explicit waits; never use waitForTimeout, page.waitFor, or arbitrary time delays" is short and specific, and it keeps arbitrary delays, along with the flakiness they cause, out of the generated code.
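The same constraint can also be checked after generation, before a file is committed. Below is a rough sketch of such a gate. `findTimingViolations` and its patterns are hypothetical. Because it matches text line by line, it will also flag these names inside comments or strings.

```typescript
// Hypothetical post-generation gate for the timing constraint.
const BANNED_TIMING_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "waitForTimeout", pattern: /\.waitForTimeout\s*\(/ },
  // Requires "(" right after "waitFor", so waitForSelector( is not flagged.
  { name: "page.waitFor", pattern: /\bpage\.waitFor\s*\(/ },
  // Hand-rolled sleeps such as: await new Promise((r) => setTimeout(r, 2000))
  { name: "setTimeout sleep", pattern: /new Promise\s*\(\s*\(?\s*\w+\s*\)?\s*=>\s*setTimeout\s*\(/ },
];

interface TimingViolation {
  line: number; // 1-based line number in the generated file
  rule: string; // which banned pattern matched
  text: string; // the offending line, trimmed
}

function findTimingViolations(source: string): TimingViolation[] {
  const violations: TimingViolation[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { name, pattern } of BANNED_TIMING_PATTERNS) {
      if (pattern.test(text)) {
        violations.push({ line: index + 1, rule: name, text: text.trim() });
      }
    }
  });
  return violations;
}
```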

For selectors, the default is to use whatever locator the current DOM snapshot provides — often a deeply nested CSS selector or a text match on user-facing strings. Both patterns break under refactoring. The constraint "prefer getByRole > getByTestId > getByLabel > getByText, and never use CSS path selectors" produces a resilient hierarchy with one instruction.
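Because the hierarchy is an ordering, it maps naturally onto a rank. The sketch below assumes locators appear as Playwright method calls in the generated source. `rankLocator` is hypothetical. Treating any raw string passed to `locator()` or `click()` as a CSS path is a deliberate simplification, since Playwright's `locator()` also accepts other selector engines.

```typescript
// Rank 0 is most preferred; mirrors "getByRole > getByTestId > getByLabel > getByText".
const SELECTOR_HIERARCHY = ["getByRole", "getByTestId", "getByLabel", "getByText"] as const;

type LocatorVerdict =
  | { kind: "preferred"; method: string; rank: number }
  | { kind: "css"; selector: string }
  | { kind: "unknown" };

// Classify one locator expression, e.g. "page.getByRole('button', { name: 'Submit' })".
function rankLocator(expression: string): LocatorVerdict {
  // Check in hierarchy order, so a chained expression reports its best locator.
  for (let rank = 0; rank < SELECTOR_HIERARCHY.length; rank++) {
    const method = SELECTOR_HIERARCHY[rank];
    if (expression.includes(`.${method}(`)) {
      return { kind: "preferred", method, rank };
    }
  }
  // Simplification: a raw quoted string passed to locator() or click() counts as CSS.
  const raw = expression.match(/\.(?:locator|click)\(\s*(['"`])(.+?)\1/);
  if (raw) {
    return { kind: "css", selector: raw[2] };
  }
  return { kind: "unknown" };
}
```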

// ❌ Without constraint — agent default output
await page.click('[data-cy="submit-button"]');
await page.waitForTimeout(2000); // arbitrary delay; fails under load
await expect(page.locator('.success-message')).toBeVisible();

Without timing constraint — agent produces fragile, flake-prone output

// ✓ With constraint: "use only explicit waits; never waitForTimeout"
await page.getByRole('button', { name: 'Submit' }).click();
await expect(page.getByRole('alert')).toContainText(/order confirmed/i);

With timing constraint — agent produces explicit, event-driven wait

Output shape and example patterns

Tell the agent what form its output should take, and give it two tests that already work; the quality gain can rival a model upgrade.

Output shape constraints specify the format of the response: produce a complete TypeScript file, not a code snippet; use the file path `tests/e2e/[feature]/[scenario].spec.ts`; include a describe block matching the feature name; do not include imports already covered by the test fixture file. Each constraint reduces paste-friction and makes the generated file closer to a direct commit.
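These output-shape constraints are also cheap to verify mechanically. The sketch below makes two assumptions. First, folder names are kebab-case under the `tests/e2e/[feature]/[scenario].spec.ts` convention. Second, the first `test.describe` title should read as the feature folder name with hyphens turned into spaces. `checkOutputShape` is a hypothetical helper.

```typescript
// Hypothetical check for the output-shape constraints.
interface ShapeResult {
  ok: boolean;
  problems: string[];
}

// Assumes kebab-case feature and scenario names.
const SPEC_PATH = /^tests\/e2e\/([a-z0-9-]+)\/([a-z0-9-]+)\.spec\.ts$/;

function checkOutputShape(filePath: string, source: string): ShapeResult {
  const problems: string[] = [];
  const pathMatch = filePath.match(SPEC_PATH);
  if (!pathMatch) {
    problems.push(`path "${filePath}" does not match tests/e2e/[feature]/[scenario].spec.ts`);
  }
  // Title of the first test.describe('...') in the file, if any.
  const describeMatch = source.match(/test\.describe\(\s*(['"`])(.+?)\1/);
  if (!describeMatch) {
    problems.push("no test.describe block found");
  } else if (pathMatch && describeMatch[2].toLowerCase() !== pathMatch[1].replace(/-/g, " ")) {
    problems.push(`describe title "${describeMatch[2]}" does not match feature "${pathMatch[1]}"`);
  }
  return { ok: problems.length === 0, problems };
}
```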

The example pattern is widely underused. When you give the agent two existing tests from the repository and ask it to write a third matching their style, output quality improves noticeably, because the ambiguity in "matches repo conventions" has been resolved into something concrete. The agent reads the examples and infers naming rules, assertion style, and fixture usage.
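Here is a minimal sketch of the example pattern. Existing tests are pasted verbatim, labelled with their repository paths, ahead of the new scenario. `buildFewShotPrompt` and the instruction wording are assumptions. The two-example minimum mirrors the advice in this section rather than any hard limit.

```typescript
interface ExampleTest {
  path: string;   // repository path, e.g. "tests/e2e/checkout/guest-checkout.spec.ts"
  source: string; // file contents, pasted verbatim
}

// Hypothetical helper: show concrete tests first so the agent can infer
// naming rules, assertion style, and fixture usage from real code.
function buildFewShotPrompt(examples: ExampleTest[], scenario: string): string {
  if (examples.length < 2) {
    throw new Error("Provide at least two existing tests as examples");
  }
  const shown = examples
    .map((ex, i) => `Example ${i + 1} (${ex.path}):\n${ex.source.trim()}`)
    .join("\n\n");
  return [
    "Here are existing tests from this repository. Match their style exactly.",
    shown,
    `Write a new test in the same style for this scenario:\n${scenario}`,
  ].join("\n\n");
}
```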

Combine output shape with examples and you have eliminated the two most common reasons a generated test needs heavy editing before it can be committed: incorrect file structure and inconsistent code style. Two paragraphs of prompt context produce a materially cleaner first draft.

Anti-examples are underrated

Showing the agent what wrong looks like is more effective than describing what right looks like — and most teams skip this entirely.

Anti-examples are the highest-return pattern that most prompt libraries omit. The intuition against them is understandable: it feels redundant to show the agent what to avoid when you have already told it what to do. But the agent's training data contains thousands of examples of `waitForTimeout`, fragile CSS selectors, and tautological assertions. An abstract rule has to compete with that training signal; an anti-example that shows the exact pattern to avoid leaves no ambiguity about what is being ruled out.

"Do not use waitForTimeout" is a constraint. "Do not use waitForTimeout — see this anti-pattern: [example with the exact call you want eliminated]" is an anti-example. The latter is consistently more effective because the agent can pattern-match against its output before returning it. The concrete anti-example is more discriminating than the abstract rule.

This applies to selector quality, assertion specificity, and file structure. One well-chosen bad example per constraint category is worth more than two paragraphs of positive instruction. For a team building a prompt library from scratch, collecting the most frequent errors from recent code reviews and converting each into an anti-example is the highest-leverage first step.
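The review-mining step can start as a simple tally. The sketch below assumes review comments have already been exported and tagged by category by hand. The `ReviewComment` shape, the tags, and both helpers are hypothetical and not tied to any code-review platform's API. Each selected category is rendered as its rule followed by a verbatim bad snippet, which is the rule-plus-anti-example form described above.

```typescript
// Hypothetical export format: one entry per review comment, tagged by hand.
interface ReviewComment {
  tag: string;     // error category, e.g. "hard-wait" or "css-selector"
  snippet: string; // the exact code the reviewer flagged
}

interface AntiExample {
  tag: string;
  count: number;   // how often reviewers flagged this category
  snippet: string; // first flagged snippet, kept verbatim as the anti-example
}

// Count comments per category and keep the `limit` most frequent (ties broken by tag name).
function topAntiExamples(comments: ReviewComment[], limit: number): AntiExample[] {
  const byTag = new Map<string, AntiExample>();
  for (const { tag, snippet } of comments) {
    const existing = byTag.get(tag);
    if (existing) {
      existing.count += 1;
    } else {
      byTag.set(tag, { tag, count: 1, snippet });
    }
  }
  return Array.from(byTag.values())
    .sort((a, b) => b.count - a.count || a.tag.localeCompare(b.tag))
    .slice(0, limit);
}

// Pair each selected category with its written rule and render both as prompt text.
function renderAntiExamples(selected: AntiExample[], rules: Record<string, string>): string {
  return selected
    .map((item) => {
      const rule = rules[item.tag] ?? `Avoid ${item.tag}`;
      const indented = item.snippet
        .split("\n")
        .map((line) => `    ${line}`)
        .join("\n");
      return `- ${rule}\n  Do NOT produce code like this:\n${indented}`;
    })
    .join("\n");
}
```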

Treat prompts as test code

A prompt library that improves over six months is worth more than any single model upgrade.

Prompt patterns decay at the same rate as the codebase they describe. When the test framework version changes, when a major refactor shifts the page-object structure, when the team adopts a new assertion approach — the prompt set needs updating. A prompt library that is not version-controlled drifts silently from the codebase it was written for, becoming less useful without anyone noticing.

Concretely: store prompt files in the repository under a `prompts/` or `.prompts/` directory. Treat changes to the prompt library as code-review-worthy events, the same as changes to `eslint.config.js` or `playwright.config.ts`. Use the platform.claude.com/docs prompt engineering guide as a reference for pattern categories, but the specific constraints for your project are yours to maintain and your team's to own.
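The set-up prefix names a framework version ("Playwright 1.45+"), so a CI step can catch one kind of silent drift: the prompt library still describing a version the project has moved past. Here is a rough sketch. It assumes the prompt states the version in that `Playwright X.Y+` form and that `package.json` declares `@playwright/test` with a range such as `^1.47.0`. `checkPromptDrift` is hypothetical and deliberately strict: any change in major.minor flags the prompt for review. A production check would more likely use a semver library than these regexes.

```typescript
interface DriftReport {
  promptVersion: string | null;   // "major.minor" named in the prompt, if found
  declaredVersion: string | null; // "major.minor" from package.json, if found
  drifted: boolean;
}

// Hypothetical CI check: does the prompt library still name the declared Playwright version?
function checkPromptDrift(promptText: string, packageJsonText: string): DriftReport {
  const promptMatch = promptText.match(/Playwright (\d+)\.(\d+)\+/);
  const pkg = JSON.parse(packageJsonText) as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  const range = pkg.devDependencies?.["@playwright/test"] ?? pkg.dependencies?.["@playwright/test"];
  // Take the first "major.minor" in a range like "^1.47.0" or "~1.45.2".
  const declaredMatch = range?.match(/(\d+)\.(\d+)/);

  const promptVersion = promptMatch ? `${promptMatch[1]}.${promptMatch[2]}` : null;
  const declaredVersion = declaredMatch ? `${declaredMatch[1]}.${declaredMatch[2]}` : null;
  // A missing version on either side also counts as drift, so the check fails loudly.
  const drifted = promptVersion === null || declaredVersion === null || promptVersion !== declaredVersion;
  return { promptVersion, declaredVersion, drifted };
}
```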

// PRODUCTION

Persist your prompt patterns in version control alongside the test suite. Treat changes to the prompt library as code-review-worthy events. A team prompt library that improves systematically over six months is worth more than any single model upgrade.