Claude Code in Your Test CI/CD Pipeline

Claude Code is an interactive tool by default, but it also has a non-interactive mode designed for CI/CD use. The --print flag runs a single prompt and exits: no TTY required, no interactive approval prompts, just output. This lesson covers the CI use cases worth setting up, a working GitHub Actions example, and the cost and security considerations that matter before you put an LLM call in your pipeline.

The non-interactive mode

claude --print "Run the smoke suite and summarise any failures" \
       --model claude-sonnet-4-6

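--print accepts a prompt, executes it, writes the output to stdout, and exits with a non-zero code if the run itself errors out. A run that completes and reports failing tests is still a successful run, so treat the exit code as a signal about the Claude call rather than as a test verdict. It works from the current directory like an interactive session, so it can read your test files and logs, subject to the tool permissions you grant. The ANTHROPIC_API_KEY environment variable provides the credentials, so no interactive OAuth is needed in CI.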
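
A CI step usually wants to fail loudly when the call cannot run at all. The wrapper below is a minimal sketch of that idea, not part of Claude Code itself: run_claude_prompt and the CLAUDE_BIN override are hypothetical names introduced here so the function can be exercised with a stub binary.

```shell
#!/usr/bin/env bash
# Sketch: wrap a non-interactive Claude Code call so CI fails loudly.
# run_claude_prompt and CLAUDE_BIN are hypothetical names for illustration.

run_claude_prompt() {
  local prompt="$1" out_file="$2"

  # Fail early with a distinct code when credentials are missing
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    return 2
  fi

  # Propagate a failed CLI run to the calling CI step
  if ! "${CLAUDE_BIN:-claude}" --print "$prompt" > "$out_file"; then
    echo "claude exited with a non-zero status" >&2
    return 1
  fi

  # Treat an empty report as a failure as well
  if [ ! -s "$out_file" ]; then
    echo "claude produced no output" >&2
    return 1
  fi
}
```

In a workflow step this might be called as run_claude_prompt "Summarise the failures in test-results/" failure-analysis.md, with the step failing on any non-zero return.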

CI use cases that pay for themselves

Auto-investigate test failures. When a test run fails, Claude analyses the failure log and recent commits and produces a diagnosis comment on the PR. Developers get a starting hypothesis before they even open the CI logs.
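
One way to keep this analysis focused is to assemble the failure log and recent commits into a single size-capped file before calling Claude. The sketch below assumes a plain-text log; build_context, MAX_LOG_BYTES, and the file names in the comments are illustrative, not anything Claude Code requires.

```shell
#!/usr/bin/env bash
# Sketch: build a size-capped context file for the failure analysis.
# build_context and MAX_LOG_BYTES are hypothetical names for illustration.

build_context() {
  local log_file="$1" commits_file="$2" out_file="$3"
  local max_bytes="${MAX_LOG_BYTES:-20000}"
  {
    echo "## Recent commits"
    cat "$commits_file"
    echo
    echo "## Failure log (last ${max_bytes} bytes)"
    # Failures usually surface near the end of a log, so keep only the tail
    tail -c "$max_bytes" "$log_file"
  } > "$out_file"
}

# Example use in CI (not executed here; e2e.log is an assumed file name):
#   git log -10 --oneline > commits.txt
#   build_context test-results/e2e.log commits.txt context.md
#   claude --print "Diagnose the failure described on stdin" < context.md
```

Capping the log by bytes rather than lines gives a predictable upper bound on how much text is sent with each call.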

Nightly flake triage. A scheduled workflow collects that day's intermittent failures, passes them to Claude, and outputs a ranked list of flake candidates with suggested root causes. Run it at midnight, read the results in the morning.
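
The collection step can be done with ordinary shell tools before Claude is involved. As a rough sketch, assume each test result has been flattened to one line of the form "<test_name> <pass|fail>" (a made-up format, with no spaces in test names); a test that shows both outcomes in a day is a flake candidate.

```shell
#!/usr/bin/env bash
# Sketch: list tests that both passed and failed, ranked by failure count.
# The "<test_name> <pass|fail>" input format is an assumption for illustration.

flake_candidates() {
  awk '
    $2 == "pass" { pass[$1]++ }
    $2 == "fail" { fail[$1]++ }
    END {
      for (t in fail)
        if (pass[t] > 0)            # both outcomes seen: intermittent
          printf "%d %s\n", fail[t], t
    }
  ' | sort -k1,1nr -k2,2
}
```

The output, one "<fail_count> <test_name>" line per candidate, is small enough to pass to Claude along with the relevant log excerpts. Tests that fail on every run are left out on purpose, since those are broken rather than flaky.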

Release notes from test results. Before cutting a release, Claude scans the test run results and recent commits and generates a test-focused changelog — what was covered, what was added, what changed.

Test coverage drift detection. On every PR, Claude reads the diff and the existing test files and flags features that changed without any corresponding test changes.
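
A cheap pre-filter can decide whether a PR is worth sending to Claude at all. The sketch below is deliberately coarse and rests on assumed conventions: source code under src/, and tests either under a test/ or tests/ directory or named *.test.* or *.spec.*. Adjust the patterns to your repository.

```shell
#!/usr/bin/env bash
# Sketch: flag changed source files when a PR touches no test files.
# The src/ layout and test naming patterns are assumptions for illustration.

untested_changes() {
  # stdin: changed file paths, one per line (e.g. from `git diff --name-only`)
  local files
  files=$(cat)

  # If any test file changed, there is nothing to flag at this coarse level
  if printf '%s\n' "$files" | grep -Eq '(^|/)tests?/|\.(test|spec)\.'; then
    return 0
  fi

  # No test file touched: print the changed source files
  printf '%s\n' "$files" | grep -E '^src/' || true
}
```

Only when this prints something does the workflow need to ask Claude which of the listed changes actually alter behaviour.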

A working GitHub Actions workflow

name: Test Failure Investigation
on:
  workflow_run:
    workflows: ["E2E Tests"]
    types: [completed]
 
jobs:
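  investigate-failure:
    if: github.event.workflow_run.conclusion == 'failure'
    runs-on: ubuntu-latest
    permissions:
      contents: read   # check out the repository
      actions: read    # read artefacts from the triggering run
    steps:
      - uses: actions/checkout@v4
        with:
          # workflow_run checks out the default branch unless told otherwise
          ref: ${{ github.event.workflow_run.head_sha }}
          # the default depth of 1 leaves no history for "recent git log"
          fetch-depth: 20

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Download test results
        uses: actions/download-artifact@v4
        with:
          name: test-results
          path: test-results/
          # artefacts from another workflow run need an explicit run ID and token
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}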
 
      - name: Analyse failure
        run: |
          claude --print "
            The E2E test run just failed.
            Read the test results in test-results/ and the recent git log.
            Write a 3-paragraph analysis:
            1. Which tests failed and the likely root cause
            2. Which recent commits are most likely responsible
            3. Recommended next steps for the developer
          " > failure-analysis.md
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
 
      - name: Upload analysis
        uses: actions/upload-artifact@v4
        with:
          name: failure-analysis
          path: failure-analysis.md

The analysis appears as a downloadable CI artefact. A further step can post it as a PR comment using the GitHub CLI, but that requires a GITHUB_TOKEN with write access to pull requests, which is worth treating as a separate permission decision.
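
If you do decide to grant that permission, the posting step could look roughly like the sketch below. It uses gh pr comment with --body-file; post_analysis_comment and the GH_BIN override are hypothetical names introduced for illustration, and the PR number is assumed to be passed in by the workflow.

```shell
#!/usr/bin/env bash
# Sketch: post the analysis file as a PR comment via the GitHub CLI.
# post_analysis_comment and GH_BIN are hypothetical names for illustration.

post_analysis_comment() {
  local pr_number="$1" analysis_file="$2"

  # Runs on the default branch have no PR to comment on
  if [ -z "$pr_number" ]; then
    echo "No pull request associated with this run; skipping comment" >&2
    return 0
  fi

  # Do not post an empty comment if the analysis step produced nothing
  if [ ! -s "$analysis_file" ]; then
    echo "Analysis file is missing or empty; skipping comment" >&2
    return 0
  fi

  # In CI, gh reads its credentials from the GH_TOKEN environment variable
  "${GH_BIN:-gh}" pr comment "$pr_number" --body-file "$analysis_file"
}
```

Skipping quietly when there is no PR or no analysis keeps the comment step from turning an informational job into a second failure.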

Cost considerations for CI

Claude Code calls made with an API key are billed per token, in CI just as in interactive sessions. A CI call analysing a failure log might cost $0.05–$0.20. At high PR volume, that adds up. Practical controls:

  • Use Sonnet (claude-sonnet-4-6) for all CI tasks — it is faster and cheaper than Opus
  • Gate on failure — only trigger Claude analysis when the workflow conclusion is failure, not on every run
  • Set scope limits in the prompt — "read only test-results/ and the last 10 git commits" is cheaper than "look at everything"
  • Monitor the Anthropic Console for the first two weeks after enabling any CI integration
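
To see how quickly per-call costs add up, a back-of-the-envelope projection helps. The sketch below multiplies failed runs per day by cost per call over a 30-day month; the figure of 40 failed runs a day is a hypothetical volume, and the per-call costs are the range quoted above.

```shell
#!/usr/bin/env bash
# Sketch: rough monthly cost projection for failure-gated CI analysis.
# project_monthly_cost is a hypothetical helper; 40 runs/day is an assumed volume.

project_monthly_cost() {
  local failed_runs_per_day="$1" cost_per_call="$2" days="${3:-30}"
  awk -v n="$failed_runs_per_day" -v c="$cost_per_call" -v d="$days" \
    'BEGIN { printf "%.2f\n", n * c * d }'
}

project_monthly_cost 40 0.05   # low end of the per-call range
project_monthly_cost 40 0.20   # high end of the per-call range
```

With these inputs the projection comes to 60.00 and 240.00 dollars a month, which is the kind of range to compare against actual spend in the Anthropic Console.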

Security practices

Claude Code in CI runs with whatever credentials and network access the runner has. Treat this carefully:

  • Store the API key in GitHub Secrets, never in the workflow file
  • Grant Claude Code read-only shell permissions in CI: it should read logs and write analysis files, not git push or deploy anything
  • Avoid passing production secrets or PII into the prompt — Claude Code sends context to the Anthropic API
  • Review the analysis workflow's scope annually, especially if the runner has access to sensitive infrastructure
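
The read-only guidance above can be expressed on the command line with Claude Code's --allowedTools and --disallowedTools flags. The sketch below is an assumption-laden example rather than a recommended policy: run_restricted_analysis and CLAUDE_BIN are hypothetical names, and the exact rule strings should be checked against the CLI version you install.

```shell
#!/usr/bin/env bash
# Sketch: run the analysis with an explicit, narrow tool allowlist.
# run_restricted_analysis and CLAUDE_BIN are hypothetical names for illustration.
# Verify the rule strings against the Claude Code version installed in CI.

run_restricted_analysis() {
  local prompt="$1"
  # Allow reading files and running `git log` only; deny pushes explicitly
  "${CLAUDE_BIN:-claude}" --print "$prompt" \
    --allowedTools "Read" "Bash(git log:*)" \
    --disallowedTools "Bash(git push:*)"
}
```

Redirecting the function's stdout to a file, as the workflow does, means Claude itself never needs a file-writing tool for the analysis to be saved.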

⚠️ Common Mistakes

  • Giving Claude Code write permissions in CI without careful review. Analysis tasks need read access and the ability to write artefact files. Git push, deployment commands, and anything touching shared infrastructure should be denied explicitly.
  • Running on every push without cost controls. Trigger only on failure, scope the prompt to the relevant artefacts, and use Sonnet. Run cost projections before enabling for a high-volume repository.
  • Trusting CI analysis as definitive. Claude Code in CI produces a starting hypothesis, not a verified root cause. Developers should treat the analysis as a useful first read, not a closed investigation.

🎯 Practice Task

Set up a failure analysis workflow for a real project. 20–30 minutes.

  1. Create a GitHub Actions workflow that triggers when your test workflow fails.
  2. Add a step that installs Claude Code and calls claude --print with a prompt that reads the failure logs and recent commits.
  3. Upload the output as a CI artefact.
  4. Trigger a test failure (comment out an assertion or break a selector) and verify the analysis artefact is generated and useful.

The next lesson covers what to look for when reviewing the test code Claude Code produces — in CI and in normal development.
