Docker Containers and CI Best Practices

8 min read

The most commonly reported "weird CI bug" with browser tests: it works locally, it fails on CI, and the diff is one pixel of font anti-aliasing or a missing locale package. The cause is environment drift: your laptop runs macOS with one set of fonts, and the CI runner is Ubuntu with a different set. Docker is how serious teams eliminate the variable. Bake the OS, the fonts, the browser binaries, and the Node version into one image, then run tests inside that image both locally and in CI. Identical bytes everywhere. This lesson covers the official Playwright Docker image, the patterns for using it, and the broader CI best-practices checklist that turns "tests are flaky on CI" into "tests just work."

The official Playwright Docker image

Microsoft publishes Docker images that match each Playwright release exactly:

mcr.microsoft.com/playwright:v1.44.0-jammy   # Ubuntu 22.04 (Jammy)
mcr.microsoft.com/playwright:v1.44.0-noble   # Ubuntu 24.04 (Noble)

Each image contains:

  • A pinned Ubuntu LTS
  • A pinned Node LTS
  • All three browser binaries (Chromium, Firefox, WebKit)
  • All system dependencies the browsers need (fonts, codecs, GTK, fontconfig)
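  • Browser builds that correspond to that exact Playwright release (the @playwright/test package itself is not bundled; your project's npm ci installs it)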

You don't pull latest — you pull a specific version that matches the @playwright/test in your package.json. When you upgrade Playwright, you upgrade the image tag in lockstep. Same version local and remote = same renderer = same screenshots = same test results.
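
One way to keep the tag in lockstep is to derive it from the version instead of typing it twice. The snippet below is a minimal sketch in POSIX shell, not something Playwright ships: playwright_image is a hypothetical helper name, and the jammy default is an assumption you would change if you run the noble image.

```shell
# Hypothetical helper: build the image reference for a Playwright version.
#   $1  version as written in package.json (a leading ^ or ~ is dropped)
#   $2  Ubuntu codename suffix; defaults to jammy (an assumption, adjust as needed)
playwright_image() {
  pw_version=$(printf '%s' "$1" | sed 's/^[~^]//')
  pw_distro=${2:-jammy}
  printf 'mcr.microsoft.com/playwright:v%s-%s\n' "$pw_version" "$pw_distro"
}
```

For example, docker pull "$(playwright_image 1.44.0)" pulls the jammy image for 1.44.0. Keep in mind that a range such as ^1.44.0 can resolve to a newer installed version, so the version recorded in package-lock.json is the safer input.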

Running tests inside Docker locally

docker run --rm -it \
  -v $(pwd):/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.44.0-jammy \
  npx playwright test

The flags:

  • -v $(pwd):/work mounts your project into the container at /work.
  • -w /work sets the working directory.
  • --rm removes the container after the test run finishes.
  • -it attaches an interactive terminal, so you see colored output and can stop the run with Ctrl+C.

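The first run pulls the image (~2 GB; a one-time cost). Subsequent runs reuse the cached layers. Inside the container, npx playwright test runs in the pinned environment, the same one CI uses. If node_modules is missing, or was installed on a different OS, run npm ci inside the container first.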

Run a specific test:

docker run --rm -it -v $(pwd):/work -w /work \
  mcr.microsoft.com/playwright:v1.44.0-jammy \
  npx playwright test login.spec.ts
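
Typing the full command for every run gets tedious, so a small wrapper can forward whatever arguments you pass to npx playwright test. This is a sketch under stated assumptions, not part of Playwright: pw_docker and the PW_IMAGE variable are hypothetical names. The only behavioral change from the command above is that -it is added only when a terminal is attached, so the function also works from scripts.

```shell
# Hypothetical wrapper: run `npx playwright test` inside the pinned image,
# forwarding all arguments. Set PW_IMAGE to override the default tag.
pw_docker() {
  # Allocate an interactive TTY only when stdin is a terminal.
  if [ -t 0 ]; then tty_flags="-it"; else tty_flags=""; fi
  # tty_flags is intentionally unquoted so an empty value disappears.
  docker run --rm $tty_flags \
    -v "$(pwd)":/work -w /work \
    "${PW_IMAGE:-mcr.microsoft.com/playwright:v1.44.0-jammy}" \
    npx playwright test "$@"
}
```

With the function loaded in your shell, pw_docker runs the whole suite and pw_docker login.spec.ts runs one file.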

Updating visual baselines inside Docker

The single biggest reason to use Docker locally is regenerating snapshots:

docker run --rm -it -v $(pwd):/work -w /work \
  mcr.microsoft.com/playwright:v1.44.0-jammy \
  npx playwright test --update-snapshots

The new baselines render in the Docker environment — exactly the bytes CI will produce. Commit them. CI now passes byte-for-byte because the renderer is identical. No more "passed locally, failed in CI" on visual tests.

A custom Dockerfile for your project

For more control (specific Node version, custom deps, CI-ready image), build your own based on Playwright's:

FROM mcr.microsoft.com/playwright:v1.44.0-jammy
 
WORKDIR /app
 
COPY package*.json ./
RUN npm ci
 
COPY . .
 
CMD ["npx", "playwright", "test"]

Build and run:

docker build -t my-playwright-tests .
docker run --rm -it my-playwright-tests

This pre-installs your project's dependencies into the image. Pushing the image to a registry (GitHub Container Registry, Docker Hub) means CI can docker pull it instead of running npm ci on every run — another speedup.
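
Publishing the image could look roughly like the sketch below. The registry path ghcr.io/your-org/your-repo/playwright-tests is a placeholder, publish_test_image is a hypothetical helper name, and the sketch assumes you have already authenticated with docker login ghcr.io.

```shell
# Hypothetical helper: build the test image and push it to a registry.
#   $1  full image reference, e.g. ghcr.io/your-org/your-repo/playwright-tests:v1.44.0
publish_test_image() {
  image_ref=$1
  # Build from the Dockerfile in the current directory; push only if the build succeeded.
  docker build -t "$image_ref" . && docker push "$image_ref"
}
```

Usage would be publish_test_image ghcr.io/your-org/your-repo/playwright-tests:v1.44.0, run from the directory that holds the Dockerfile. It is also worth adding a .dockerignore that lists node_modules, so COPY . . does not overwrite the dependencies that npm ci installed inside the image.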

Docker in GitHub Actions

The cleanest way to run inside Docker on Actions is the container: field:

jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test

The whole job runs inside the Playwright Docker image. No npx playwright install --with-deps needed (already in the image). No setup-node (already in the image). Just install your deps and run.

Caveat: container: runs on a Linux host only, so this exact pattern doesn't work for Windows or macOS runners. For pure Linux pipelines (the common case for browser tests), it's the cleanest setup.

Why pin the renderer

A practical example. Without Docker:

  • Local dev: macOS, Apple's font rendering, San Francisco UI font.
  • CI: Ubuntu Jammy, Liberation fonts, no LCD anti-aliasing.

The same <h1>Welcome</h1> renders a few pixels differently on each platform. Visual snapshot tests pass on your laptop, fail on CI. You spend a week chasing a non-bug.

With Docker:

  • Local dev: Playwright Docker image (Ubuntu Jammy, Liberation fonts, identical anti-aliasing).
  • CI: Same image.

Same rendering. Same screenshots. Same tests. The variable that bit you is gone.

This isn't only about visual tests. Text-overflow detection (getBoundingClientRect), CSS feature support, and even some JavaScript timing depend on the OS and its fonts. Docker pins the software side of all of it.

With Docker vs without

Local-vs-CI environment parity, with and without Docker

Without Docker

  • Local: your OS, your fonts, your Node version
  • CI: Ubuntu LTS, default fonts, GitHub-managed Node
  • Different anti-aliasing, different font fallbacks, different timing
  • Visual tests pass locally, fail on CI on a 3-pixel diff

With Playwright Docker image

  • Local: mcr.microsoft.com/playwright:vX.Y.Z
  • CI: same image, same tag
  • Identical fonts, identical browsers, identical Node
  • Visual tests render byte-for-byte the same, so the same baselines work everywhere

CI best-practices checklist

Beyond Docker, a battle-tested checklist for production-grade Playwright in CI:

  • Pin every version. Playwright Docker tag (v1.44.0, not latest). Node version (lts/* is acceptable; an exact major is better). System deps (Docker handles this).
  • Cache aggressively. Playwright browsers between runs. npm ci cache via setup-node. Ideally a custom Docker image with npm ci pre-baked.
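  • Install browsers only where the image doesn't. On a plain runner, npx playwright install --with-deps sets up the browsers and their system libraries. Inside the Playwright image, skip the step; both are already there.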
  • Set reasonable per-job timeouts. timeout-minutes: 30 on most jobs. Long enough for healthy runs; tight enough to catch infinite loops.
  • Always upload reports on failure. if: ${{ !cancelled() }} on the upload step. Failed runs are when you need the report most.
  • Shard suites over 5 minutes. Below that, sharding's overhead exceeds its win. Above, 4 shards is the sweet spot for most teams.
  • Run on PRs, not just merges. Catch regressions before review. Smoke suite on PR (~90s); full suite on merge (~5min).
  • Tag and split smoke vs full. @smoke tagged tests run on every commit. Full suite runs on main, scheduled nightly, or manually.
  • Configure retries explicitly. retries: process.env.CI ? 2 : 0 — 2 retries on CI mask transient infra flake; 0 locally so you see real failures fast.
  • Trace on first retry. trace: 'on-first-retry' — saves the trace.zip when a flaky test retries, doesn't bloat artefacts on healthy runs.
  • Send a notification on main branch failures. Slack, email, GitHub status checks. Failed main builds need someone's attention; PRs the developer is already watching.
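
The retry and trace items are normally set in playwright.config.ts, but the same choices can be passed as command-line flags, which makes the CI-versus-local split easy to see. The sketch below assumes the CI environment variable is set on your CI provider (GitHub Actions and GitLab CI both set it), and pw_ci_flags is a hypothetical helper name.

```shell
# Hypothetical helper: print Playwright CLI flags for the current environment.
# On CI: retry twice and record a trace on the first retry.
# Locally: no retries, so real failures show up immediately.
pw_ci_flags() {
  if [ -n "${CI:-}" ]; then
    printf '%s\n' '--retries=2 --trace=on-first-retry'
  else
    printf '%s\n' '--retries=0'
  fi
}
```

A smoke run would then be npx playwright test --grep @smoke $(pw_ci_flags), with the substitution left unquoted so each flag becomes its own argument. Flags given on the command line take priority over the values in the config file.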

A complete Docker-based workflow

Combining everything from this chapter — Docker, sharding, caching, reporting:

name: Playwright Tests
on: [push, pull_request]
 
jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy
      options: --user 1001
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [1, 2, 3, 4]
        shardTotal: [4]
 
    steps:
      - uses: actions/checkout@v4
 
      - name: Cache npm
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
 
      - run: npm ci
 
      - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
          CI: true
 
      - uses: actions/upload-artifact@v4
        if: ${{ !cancelled() }}
        with:
          name: blob-report-${{ matrix.shardIndex }}
          path: blob-report
          retention-days: 1
 
  merge-reports:
    needs: test
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - run: npx playwright merge-reports --reporter html ./all-blob-reports
      - uses: actions/upload-artifact@v4
        with:
          name: html-report--final
          path: playwright-report
          retention-days: 14

Notice the container: field — every job runs inside Playwright's image. No browser install step; no system-deps step. The setup is invisible because it's pre-baked.

GitLab CI — the same pattern

For GitLab CI users, the same shape:

playwright:
  image: mcr.microsoft.com/playwright:v1.44.0-jammy
  stage: test
  parallel:
    matrix:
      - SHARD: ["1/4", "2/4", "3/4", "4/4"]
  script:
    - npm ci
    - npx playwright test --shard=$SHARD
  artifacts:
    when: always
    paths:
      - blob-report/
    expire_in: 1 day

Almost identical to Actions. The framework-level pattern (Docker image + sharding + report artefacts) carries cleanly across CI providers.

Coming from Cypress?

The mappings:

  • cypress/included Docker image → mcr.microsoft.com/playwright Docker image.
  • Cypress's parallel-record orchestrator (paid) → Playwright's --shard=N/M (free).
  • Cypress's font-rendering inconsistencies (a known long-standing issue) → Docker-pinned renderer eliminates the class.

Teams migrating from Cypress to Playwright often cite "visual tests finally pass on CI" as a top-three benefit. The combination of --update-snapshots, Docker, and GitHub Actions container jobs supplies the pinned renderer that Cypress visual-testing setups usually lack.

⚠️ Common mistakes

  • Pulling :latest. A surprise upgrade breaks every visual test on the day it lands. Pin to a specific tag (v1.44.0-jammy) and bump deliberately when you upgrade Playwright.
  • Using Docker only on CI but not locally. Now your local snapshots come from macOS rendering and CI runs Linux. Visual tests fail with a 3-pixel diff. Always update snapshots inside the same image you run CI in.
  • Adding actions/setup-node to a job that already uses container: mcr.microsoft.com/playwright. The container already has Node; setting up another wastes time and can produce version drift. With container:, drop setup-node entirely.
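
Pulling :latest and letting the tag drift away from package.json can both be caught by a small pre-flight check. The following is an illustrative sketch rather than an official tool: check_image_tag is a hypothetical name, and it assumes image references of the form used in this lesson (mcr.microsoft.com/playwright:vX.Y.Z-codename).

```shell
# Hypothetical guard: succeed only if the image tag pins the expected Playwright version.
#   $1  image reference, e.g. mcr.microsoft.com/playwright:v1.44.0-jammy
#   $2  Playwright version your project installs, e.g. 1.44.0
check_image_tag() {
  image_tag=${1##*:}   # text after the last colon, e.g. v1.44.0-jammy
  case "$image_tag" in
    latest)
      echo "image uses :latest; pin a specific tag" >&2
      return 1 ;;
    "v$2"|"v$2"-*)
      return 0 ;;
    *)
      echo "image tag $image_tag does not match Playwright $2" >&2
      return 1 ;;
  esac
}
```

Run as an early CI step or a pre-commit hook, a non-zero exit stops the pipeline before any visual test has a chance to fail for the wrong reason.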

🎯 Practice task

Run your suite inside Docker, both locally and in CI. 30-40 minutes.

  1. Find your Playwright version in package.json (e.g., "@playwright/test": "1.44.0"). Match it to the Docker tag — mcr.microsoft.com/playwright:v1.44.0-jammy.

  2. Run locally inside Docker. From your project root:

    docker run --rm -it \
      -v $(pwd):/work -w /work \
      mcr.microsoft.com/playwright:v1.44.0-jammy \
      bash -c "npm ci && npx playwright test"

    First run downloads the image (~2 GB; takes a few minutes). Subsequent runs reuse the cached layers (~10 seconds startup).

  3. Update visual snapshots inside Docker. If you have any visual tests:

    docker run --rm -it \
      -v $(pwd):/work -w /work \
      mcr.microsoft.com/playwright:v1.44.0-jammy \
      bash -c "npm ci && npx playwright test --update-snapshots"

    The new baselines come from the Docker renderer. Commit them. CI now produces identical bytes — visual tests pass.

  4. Update your GitHub Actions workflow to use the container:

    jobs:
      test:
        runs-on: ubuntu-latest
        container:
          image: mcr.microsoft.com/playwright:v1.44.0-jammy
        steps:
          - uses: actions/checkout@v4
          - run: npm ci
          - run: npx playwright test

    Push and confirm the run still works. It is usually faster, because the browsers and system dependencies are already baked into the image and no install step runs.

  5. Force a non-Docker visual diff. Temporarily remove container: from the workflow so the job runs directly on ubuntu-latest, and add an npx playwright install --with-deps step so the browsers are available. Run a visual test whose baselines you generated inside Docker. The CI run will most likely fail on a 1-2 pixel font-rendering diff. Restore the container: setting and it passes again. The exercise builds the intuition that visual tests need a pinned renderer to be reliable.

  6. Stretch: build a custom Dockerfile that pre-installs your dependencies with npm ci. Push the resulting image to GitHub Container Registry. Update the workflow to use your image instead of the upstream one. Now CI starts with npm ci already done, which saves roughly another 30 seconds per run.

That closes Chapter 8 — parallel execution and CI/CD. You now have:

  • Local parallelism via workers (8x speedup on most laptops)
  • Across-machine parallelism via sharding (another 4x in typical setups)
  • GitHub Actions with caching, artefacts, and merge reporting
  • Docker for environment parity that eliminates "works on my machine"

Combined, these turn a 10-minute serial suite into a sub-2-minute CI run that's identical to what runs on every developer's machine. The next chapter — reporting and debugging — covers what to do when those tests do fail: the HTML reporter in depth, custom reporters, the trace viewer, and the patterns for handling flaky tests without giving up parallelism.
