Series

Testing AI products.

Evaluating AI features when there's no single correct answer — and using AI on the test side without fooling yourself. Testing AI breaks the usual playbook: outputs vary, so you test properties instead of equality. This series covers evaluating AI features, reviewing AI-written tests, and where AI genuinely helps a QA workflow versus where it's a trap.

Who it's for: QA engineers testing AI features · Automation engineers using AI

// overview

Testing AI breaks the playbook QA grew up on. There's no single correct output, so “does it equal the expected value?” stops working — and a lot of teams either freeze or wave the feature through. This series is about testing AI features anyway, by checking properties and boundaries instead of exact strings.
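A minimal sketch of what checking properties instead of exact strings can look like, in Python. Everything named here is a hypothetical illustration rather than something taken from the series: the `check_reply_properties` helper, the three properties it checks (a length boundary, a grounded fact that must appear, out-of-scope phrases that must not), and the sample replies.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyReport:
    """Collects a description of every property the reply failed."""
    failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def check_reply_properties(reply, must_mention, must_not_mention, max_chars=600):
    """Check properties of a reply instead of comparing it to one expected string.

    All thresholds and phrase lists are assumptions supplied by the tester.
    Matching is a case-insensitive substring test, which is deliberately crude.
    """
    report = PropertyReport()
    text = reply.lower()

    # Boundary checks: the reply must exist and stay within a length limit.
    if not reply.strip():
        report.failures.append("empty reply")
    if len(reply) > max_chars:
        report.failures.append(f"reply longer than {max_chars} characters")

    # Grounding check: facts the answer must contain, however it is worded.
    for fact in must_mention:
        if fact.lower() not in text:
            report.failures.append(f"missing grounded fact: {fact}")

    # Scope check: content that must never appear.
    for phrase in must_not_mention:
        if phrase.lower() in text:
            report.failures.append(f"out-of-scope content: {phrase}")

    return report


# Two differently worded replies can both satisfy the same properties.
first = check_reply_properties(
    "You can return items within 30 days of delivery.",
    must_mention=["30 days"],
    must_not_mention=["legal advice"],
)
second = check_reply_properties(
    "Our policy: 30 DAYS from delivery, no questions asked.",
    must_mention=["30 days"],
    must_not_mention=["legal advice"],
)
print(first.passed, second.passed)
```

Substring matching is the simplest possible property check and will miss paraphrases such as "one month", so a real suite would likely need something more tolerant. The point of the sketch is only the shape of the assertion: a list of named properties, not one expected string.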

It covers both sides of AI in QA: evaluating AI products (hallucinations, refusals, scope, grounded facts), and using AI on the test side without fooling yourself — reviewing AI-written tests that pass for the wrong reasons, and knowing where AI genuinely saves time versus where it's a trap.

The throughline: AI changes what you assert and what you log, not whether you test. The judgement is still the job.
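On the logging side, the reading list below names four things to capture: the full assembled prompt, the retrieved context, the model version, and the parameters. A rough sketch of such a record in Python follows. The class name `AiRunRecord`, its field names, the `input_fingerprint` helper, and the sample values are all assumptions made for illustration, not a format the series prescribes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AiRunRecord:
    """One logged model call: the inputs needed to reproduce it, plus the output."""
    assembled_prompt: str         # the full prompt as sent, after templating
    retrieved_context: list[str]  # documents or snippets injected into the prompt
    model_version: str            # whatever identifier the provider reports
    parameters: dict              # e.g. temperature; must be JSON-serializable
    output: str

    def to_json(self) -> str:
        """Serialize the whole record so it can be attached to a bug report."""
        return json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)

    def input_fingerprint(self) -> str:
        """Short hash of the inputs only, with the output excluded.

        Two runs with the same fingerprint but different outputs point to
        model variance; different fingerprints mean the inputs changed.
        """
        inputs = asdict(self)
        del inputs["output"]
        payload = json.dumps(inputs, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Hypothetical sample values, used only to show the record in use.
run = AiRunRecord(
    assembled_prompt="System: answer from the context only.\nUser: What is the refund window?",
    retrieved_context=["Refunds are accepted within 30 days of delivery."],
    model_version="example-model-2026-01",
    parameters={"temperature": 0.2, "max_tokens": 256},
    output="You can request a refund within 30 days of delivery.",
)
print(run.input_fingerprint())
print(run.to_json())
```

Hashing the inputs separately from the output is a design choice, not a requirement. It gives a quick way to tell "same inputs, different answer" apart from "the prompt or context changed between runs" when triaging a report.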

// reading order

Tutorials · 13 June 2026 · 9 min read
How I evaluate an AI chatbot before release
A practical evaluation pass for AI chat features: hallucinations, refusals, prompt injection, and the cases with no single right answer.
ai-testing · llm · evaluation

Deep dives · 13 June 2026 · 9 min read
Prompt injection testing for QA engineers
LLMs can't reliably separate instructions from data, so user input can hijack the model. Direct and indirect injection, what to check for, and how to report it QA-safe.
ai-testing · security-testing · prompt-injection · llm

Tutorials · 13 June 2026 · 8 min read
What QA should log when testing AI features
A screenshot isn't a repro when outputs vary. Capture the full assembled prompt, retrieved context, model version, and parameters so an AI bug is actually reproducible.
ai-testing · observability · llm

Tutorials · 13 June 2026 · 9 min read
How to review AI-written Playwright tests
AI writes plausible Playwright tests that pass for the wrong reasons. Here is the review checklist that catches them.
ai-testing · playwright · review

Opinions · 8 January 2026 · 9 min read
AI-generated tests are useful — but not for the reason you think
AI writes 80% of a test 80% of the way, and the remaining 20% is exactly the part that makes it a test. Where AI saves time, where it's a trap, and the distinction that separates the two.
ai · copilot · testing

Tutorials · 30 December 2025 · 10 min read
Using Claude and Copilot for test writing: a practical playbook
The practical playbook for AI-assisted test writing in 2026. The prompts that work, the prompts that don't, and the human-in-the-loop checkpoints that keep AI from writing tests that pass for the wrong reasons.
ai · claude · copilot · workflow

Tutorials · 13 June 2026 · 9 min read
The hallucination test cases I run on AI features
Concrete test cases for AI hallucination — unanswerable questions, false premises, invented entities, citations — and how to judge answers with no 'correct' value.
ai-testing · llm · hallucination · test-cases

// RELATED QA.CODES RESOURCES

Course: AI for QA hub

Tool: Prompt library

Next series: Security testing for QA
