The previous lessons treated Claude Code as a tool that responds to individual prompts. Agentic workflows flip that model: you describe a high-level goal, Claude Code breaks it into steps, executes them in sequence, handles the results of each step, and reports back when done. For large, well-defined QA tasks, this is where AI assistance moves from "faster typing" to something qualitatively different.
What makes a good agentic task
Not every task should be handed to an agent. Good candidates share three properties:
- Decomposable — the goal can be broken into clear sequential steps
- Bounded — the scope is well-defined and the files involved are known
- Reviewable — you can verify the output at the end (or at checkpoints)
Poor candidates are tasks that require judgement calls Claude Code cannot make: "decide which tests are worth writing," "determine what the right architecture is," "figure out why the product is slow." Those need a human in the loop at every step, not at the end.
A well-structured agentic prompt
Add test coverage for the new payment refund flow.
Steps:
1. Read docs/features/refunds.md to understand the feature spec
2. Identify all scenarios that need testing (happy path, edge cases, errors)
3. Check tests/ to see which scenarios already have coverage
4. For uncovered scenarios, generate Playwright tests
5. Save new tests to tests/refunds/
6. Run the new tests — list any that need fixes
7. Report: what was added, what passed, what needs attention
Get my approval before running any tests. Pause after step 3 and show me
the coverage gap analysis before writing any code.
The explicit pause in the middle is not just politeness; it is a checkpoint that lets you catch a wrong direction before Claude Code writes files you will have to undo.
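If your team writes prompts like this often, it can help to assemble them from structured data so the numbering and the pause instructions stay consistent. The following is a minimal sketch, not part of Claude Code: `AgenticTask` and `buildAgenticPrompt` are hypothetical names introduced here for illustration, and the output is simply text you would paste into a session.

```typescript
// Hypothetical helper for illustration; none of these names come from Claude Code.
interface AgenticTask {
  goal: string;
  steps: string[];
  // 1-based step numbers after which the agent should stop and wait.
  pauseAfter: number[];
  // Extra approval gates, written in plain language.
  approvals: string[];
}

function buildAgenticPrompt(task: AgenticTask): string {
  // Reject checkpoints that point at steps which do not exist.
  for (const n of task.pauseAfter) {
    if (!Number.isInteger(n) || n < 1 || n > task.steps.length) {
      throw new RangeError(`pauseAfter refers to step ${n}, which does not exist`);
    }
  }
  const numbered = task.steps.map((step, i) => `${i + 1}. ${step}`);
  const pauses = task.pauseAfter.map(
    (n) => `Pause after step ${n} and wait for my approval before continuing.`
  );
  return [task.goal, "Steps:", ...numbered, ...task.approvals, ...pauses].join("\n");
}

const refundPrompt = buildAgenticPrompt({
  goal: "Add test coverage for the new payment refund flow.",
  steps: [
    "Read docs/features/refunds.md to understand the feature spec",
    "Identify all scenarios that need testing (happy path, edge cases, errors)",
    "Check tests/ to see which scenarios already have coverage",
  ],
  pauseAfter: [3],
  approvals: ["Get my approval before running any tests."],
});

console.log(refundPrompt);
```

The range check is the useful part: a checkpoint that refers to a step that is not in the list fails loudly instead of being silently ignored.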
The plan-first mode
For complex workflows, start with /plan before executing:
> /plan
> Audit our entire test suite for these three anti-patterns:
> 1. page.waitForTimeout usage
> 2. Hardcoded credentials in test files
> 3. Assertions without descriptive error messages
>
> For each, list affected files and propose a fix strategy.
Claude Code outlines the investigation and the proposed changes without touching any files. You review the plan, redirect if something is wrong, then approve execution. This is particularly valuable when "start over" would be expensive.
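To make the three anti-patterns concrete, here is a rough sketch of the kind of line-by-line check such an audit amounts to. It is a heuristic written for this lesson, not what Claude Code runs internally: the names (`scanSource`, `Finding`, `CHECKS`) are hypothetical, the credential regex is a guess at common variable names, and the assertion check relies on Playwright's `expect` accepting a custom message as its second argument. Expect false positives and misses, for example when a comma appears inside a string literal.

```typescript
type AntiPattern = "waitForTimeout" | "hardcodedCredential" | "assertionWithoutMessage";

interface Finding {
  line: number; // 1-based line number
  pattern: AntiPattern;
  text: string;
}

// True when an expect(...) call on this line closes without a second argument.
// Heuristic only: it ignores string contents and calls that span several lines.
function hasMessagelessExpect(line: string): boolean {
  const start = line.indexOf("expect(");
  if (start === -1) return false;
  let depth = 0;
  for (let i = start + "expect".length; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === "(" || ch === "[" || ch === "{") {
      depth++;
    } else if (ch === ")" || ch === "]" || ch === "}") {
      depth--;
      if (depth === 0) return true; // call closed with no top-level comma
    } else if (ch === "," && depth === 1) {
      return false; // a second argument (the message) is present
    }
  }
  return false; // call continues on another line: undecided, so not flagged
}

const CHECKS: Array<{ pattern: AntiPattern; test: (line: string) => boolean }> = [
  { pattern: "waitForTimeout", test: (l) => /\.waitForTimeout\s*\(/.test(l) },
  {
    pattern: "hardcodedCredential",
    // Assumed naming convention: a secret-like identifier assigned a string literal.
    test: (l) => /(password|passwd|secret|token|api_?key)\w*\s*[:=]\s*["'][^"']+["']/i.test(l),
  },
  { pattern: "assertionWithoutMessage", test: hasMessagelessExpect },
];

function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const check of CHECKS) {
      if (check.test(text)) {
        findings.push({ line: index + 1, pattern: check.pattern, text: text.trim() });
      }
    }
  });
  return findings;
}

// Made-up sample input, not taken from a real test suite.
const sample = [
  "await page.waitForTimeout(3000);",
  'const password = "hunter2";',
  'await expect(page.locator("h1")).toBeVisible();',
  'await expect(page.locator("h1"), "heading should render").toBeVisible();',
].join("\n");

const findings = scanSource(sample);
console.log(findings);
```

A static scan like this is a useful cross-check on the plan Claude Code proposes: if the agent's list of affected files differs widely from a simple pattern search, ask why before approving execution.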
Delegating investigation tasks
Agentic workflows are excellent for investigation tasks where the answer is not known upfront:
Investigate why our CI test runs have been getting slower over the past month.
1. Read our GitHub Actions workflow files
2. Look at the test run timing data in the last 20 CI runs (use GitHub MCP if available)
3. Identify the top contributors to slow runtime
4. For each one, suggest a specific optimisation
5. Estimate the time saving for each optimisation
Claude works through this systematically (reading workflow files, analysing timing data, cross-referencing slow tests with the test code) and produces a prioritised recommendation. An investigation that could take a person a couple of hours by hand can often be finished in minutes.
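Step 3 of that prompt is, at its core, an aggregation over per-step timings. A minimal sketch of the calculation follows, assuming you already have one record per CI run that maps step name to duration in seconds. The `RunTimings` shape, the `topContributors` name, and the sample numbers are all invented for illustration; the GitHub API does not return data in this form.

```typescript
// Hypothetical shape: one record per CI run, mapping step name to seconds.
type RunTimings = Record<string, number>;

interface Contributor {
  step: string;
  meanSeconds: number; // averaged over all runs; a step missing from a run counts as 0
  shareOfTotal: number; // fraction of all recorded time spent in this step
}

function topContributors(runs: RunTimings[], limit: number = 3): Contributor[] {
  const totals: Record<string, number> = {};
  for (const run of runs) {
    for (const step of Object.keys(run)) {
      totals[step] = (totals[step] || 0) + run[step];
    }
  }
  const steps = Object.keys(totals);
  const grandTotal = steps.reduce((sum, step) => sum + totals[step], 0);
  if (runs.length === 0 || grandTotal === 0) return [];
  return steps
    .map((step) => ({
      step,
      meanSeconds: totals[step] / runs.length,
      shareOfTotal: totals[step] / grandTotal,
    }))
    .sort((a, b) => b.meanSeconds - a.meanSeconds)
    .slice(0, limit);
}

// Made-up timings for two runs.
const ciRuns: RunTimings[] = [
  { install: 60, e2e: 400, lint: 20 },
  { install: 80, e2e: 500, lint: 20 },
];

const slowest = topContributors(ciRuns, 2);
console.log(slowest);
```

Knowing what the calculation should look like makes the agent's report easier to review: you can spot-check one or two of its numbers against the raw run data rather than trusting the ranking outright.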
Generation-and-verify loops
Agentic workflows can self-correct within a defined scope:
Generate Playwright tests for every endpoint documented in docs/api/.
For each test: write it, run it, and if it fails, attempt one fix.
If the fix does not resolve the failure, mark it as "needs attention"
and move to the next endpoint.
Show me a summary table at the end: endpoint, test status, any needing attention.
Claude generates, runs, observes the output, fixes if needed, and continues, all without prompting for each endpoint individually.
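The control flow that prompt asks for is small enough to write down. The sketch below models it with three injected functions standing in for what the agent does at each stage; `AgentActions`, `generateAndVerify`, and the stub behaviour are hypothetical and kept synchronous for brevity, whereas real generation and test runs would be asynchronous.

```typescript
type Status = "passed" | "passed after fix" | "needs attention";

// Stand-ins for the agent's work; in a real session Claude Code performs these steps.
interface AgentActions {
  generate(endpoint: string): string; // returns test source
  run(testSource: string): boolean; // true when the test passes
  fix(testSource: string): string; // returns revised test source
}

interface Row {
  endpoint: string;
  status: Status;
}

function generateAndVerify(endpoints: string[], actions: AgentActions): Row[] {
  return endpoints.map((endpoint): Row => {
    const first = actions.generate(endpoint);
    if (actions.run(first)) return { endpoint, status: "passed" };
    // Exactly one fix attempt, then move on regardless of the outcome.
    const second = actions.fix(first);
    const status: Status = actions.run(second) ? "passed after fix" : "needs attention";
    return { endpoint, status };
  });
}

// Stub actions with contrived pass/fail rules, purely to exercise the loop.
const stubActions: AgentActions = {
  generate: (endpoint) => `test for ${endpoint}`,
  run: (src) => src.includes("/health") || src.includes("fixed /refunds"),
  fix: (src) => src.replace("test for", "fixed"),
};

const summary = generateAndVerify(["/health", "/refunds", "/payouts"], stubActions);
console.log(summary);
```

The single-retry cap is what keeps the loop bounded: without it, a test that can never pass would consume tokens indefinitely.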
Managing cost and scope
Agentic sessions consume more tokens than single prompts. Practical controls:
- Use checkpoints ("pause after step X and wait for approval") for workflows with major phases
- Use Sonnet for most steps; only switch to Opus when reasoning quality has been insufficient
- Commit after each major phase — you can revert to the last checkpoint if a phase goes wrong
- Use /cost periodically in long sessions to track spend
For team workflows, set token budget expectations in CLAUDE.md so everyone operates with the same awareness.
Define the goal clearly
Write a prompt that names the high-level objective, the steps, and where to pause for approval. Vague goals produce wandering agents.
⚠️ Common Mistakes
- Fully autonomous runs without checkpoints on large tasks. A 50-step agentic run that goes wrong at step 10 creates 40 steps of consequences to untangle. Build in checkpoints for any workflow that touches more than 10 files.
- Handing investigation tasks to agents without scope bounds. "Investigate all the problems in our test suite" can spiral into an hours-long, expensive session. Bound the scope: specific directories, specific anti-patterns, specific time range.
- Not reviewing generated tests because the agent ran them and they passed. A passing test is not a correct test. Agentic generation still produces the same assertion-quality issues as single-prompt generation. Review is non-negotiable.
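The last point is easy to demonstrate. In this sketch a deliberately broken refund function returns success with the wrong amount; a weak check passes against it, while a stricter one fails. Everything here (`RefundResponse`, `buggyRefund`, both check functions) is invented for illustration and is not tied to any real payment API or to Playwright.

```typescript
interface RefundResponse {
  status: number;
  body: { refundedAmount: number; currency: string };
}

// Hypothetical buggy implementation: reports success but refunds nothing.
function buggyRefund(_requestedAmount: number): RefundResponse {
  return { status: 200, body: { refundedAmount: 0, currency: "USD" } };
}

// Weak check: typical of generated tests that only confirm "something came back".
function weakCheck(res: RefundResponse): boolean {
  return res.status === 200 && res.body !== undefined;
}

// Stricter check: asserts on the values the feature spec actually cares about.
function strictCheck(res: RefundResponse, expectedAmount: number, expectedCurrency: string): boolean {
  return (
    res.status === 200 &&
    res.body.refundedAmount === expectedAmount &&
    res.body.currency === expectedCurrency
  );
}

const response = buggyRefund(25);
const weakResult = weakCheck(response);
const strictResult = strictCheck(response, 25, "USD");

console.log("weak check passes:", weakResult); // true, despite the bug
console.log("strict check passes:", strictResult); // false, the bug is caught
```

When reviewing agent-generated tests, this is the question to ask of each assertion: would it still pass if the feature returned the wrong value?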
🎯 Practice Task
Run a bounded agentic task. 20–30 minutes.
- Pick a well-defined QA task: coverage audit for a specific feature, a specific type of anti-pattern search, or a set of tests to generate for a documented feature.
- Write a prompt that breaks it into numbered steps with at least one explicit pause for your approval.
- Use /plan first if the task involves more than five files.
- Run it, review at the checkpoint, approve continuation.
- At the end, check the output: what did the agent get right autonomously? What required a correction after the checkpoint?
Chapter 5 moves from Claude Code's internal capabilities to how it fits into your wider QA workflow — CI pipelines, code review, and keeping costs sustainable.