AI Tools for QA
// Definition
The growing category of AI-powered tools QA engineers use day to day — coding copilots (GitHub Copilot, Cursor), AI test generators, self-healing locator engines, visual-AI diffing, and LLM evaluation harnesses. The common thread is that they accelerate or automate parts of the testing workflow, but each shifts effort rather than removing it: the QA skill becomes choosing the right tool, prompting it well, and critically reviewing its output rather than trusting it blindly.
// Why it matters
AI tooling is reshaping what a QA day looks like — test scaffolding, locator maintenance, and exploratory ideas that took hours now take minutes. But the tools fail in confident, plausible ways (a generated test that asserts nothing, a healed locator that hides a real regression), so the engineer who can evaluate and supervise them is far more valuable than one who either ignores them or trusts them uncritically. Knowing the landscape is how you pick the few that fit your stack instead of chasing every demo.
// How to test
These are tools to evaluate and adopt, not a single thing to assert on. The QA approach:
- Categorise the tool: does it generate, maintain, review, or evaluate? Each needs different oversight.
- Trial it on your own codebase, not the vendor demo, and measure time saved against review cost.
- Keep a human gate: AI-generated tests still need a review confirming that they assert the right thing.
- For LLM-backed tools, build an eval set so you can tell a harmless model or prompt change from a regression.
- Adopt the few that pay back; resist tool sprawl that adds maintenance without removing it.
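The eval-set idea above can be sketched roughly as follows. This is a minimal illustration, not a real harness: `EvalCase`, `pass_rate`, `is_regression`, the substring-based pass criterion, the 5% tolerance, and both stub tools are assumptions made up for the example. In practice the callable would wrap the LLM-backed tool under evaluation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    must_contain: tuple[str, ...]  # substrings a good answer should include


def case_passes(tool: Callable[[str], str], case: EvalCase) -> bool:
    """Run the tool once and check every required substring is present."""
    output = tool(case.prompt)
    return all(s in output for s in case.must_contain)


def pass_rate(tool: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of eval cases the tool passes."""
    if not cases:
        raise ValueError("eval set is empty")
    return sum(case_passes(tool, c) for c in cases) / len(cases)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.05) -> bool:
    """Flag a drop in pass rate larger than the (assumed) tolerance."""
    return candidate < baseline - tolerance


# Hypothetical eval set: fixed prompts with known-good properties.
EVAL_SET = [
    EvalCase("login", "Write a test for login", ("assert", "login")),
    EvalCase("cart", "Write a test for an empty cart", ("assert", "cart")),
]


def baseline_tool(prompt: str) -> str:
    # Stand-in for the current model/prompt version.
    topic = "login" if "login" in prompt else "cart"
    return f"def test_{topic}():\n    assert {topic}_works()"


def candidate_tool(prompt: str) -> str:
    # Stand-in for a new version that silently drops the assertion.
    topic = "login" if "login" in prompt else "cart"
    return f"def test_{topic}():\n    {topic}_works()"


baseline = pass_rate(baseline_tool, EVAL_SET)
candidate = pass_rate(candidate_tool, EVAL_SET)
print(baseline, candidate, is_regression(baseline, candidate))
```

Substring matching is deliberately crude; the point is that the same fixed cases run against every model or prompt version, so a drop in pass rate is attributable to the change rather than to a different set of inputs.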
// Common mistakes
- Trusting AI-generated tests without checking they actually assert meaningful behaviour
- Adopting tools from demos without measuring real time-saved-vs-review-cost on your code
- Letting self-healing or auto-generation mask genuine regressions instead of surfacing them
// Related terms
AI Testing
The use of AI — language models, machine-learning classifiers, and AI-powered platforms — to accelerate testing tasks: generating test code from descriptions, analysing logs and stack traces, suggesting edge cases, healing broken locators, comparing screenshots intelligently, and triaging failures. AI augments QA engineers; it does not replace the judgement, exploration, and risk-modelling work that humans still do best.
Large Language Model (LLM)
A neural network trained on massive text datasets to predict the next token (roughly, the next word or word fragment) in a sequence. Modern LLMs like Claude, GPT-4, and Gemini can answer questions, write code, summarise documents, and follow multi-step instructions, but they don't 'know' anything: they predict plausible continuations from patterns in training data. This is why they sometimes produce confident-sounding falsehoods (hallucinations) and why prompt design matters so much. In QA, LLMs are useful for generating test scaffolding, summarising bug reports, and drafting documentation, but their output always needs human review before it ships.
Prompt Engineering
The craft of writing inputs to AI tools — language models, chat assistants, coding assistants — so that the output is useful, specific, and aligned with the task. Core principles include being specific about format, providing project context (existing patterns, conventions, examples), asking for chain-of-thought reasoning, enumerating edge cases up front, and iterating across multiple turns rather than expecting a perfect first response.
Learn more · AI Tools for QA
Chapter 1 · Lesson 1: How AI Is Changing QA — A Realistic View