Screenshot Comparisons with toHaveScreenshot

A button silently changing colour, a header growing 4 pixels taller, a misaligned form on the registration page: these are the kinds of regressions that pass every functional test (the click still works, the assertion still finds the text) but ship a broken product. Visual regression testing is how you catch them: take a screenshot of the page on the first run (the baseline), and on every subsequent run compare a new screenshot against it. If they differ beyond a configured tolerance, the test fails. Playwright ships this as a first-class assertion, toHaveScreenshot, with no third-party tools needed. This lesson covers the basics, the gotchas around dynamic content and animations, and how to keep visual baselines stable across a team and CI.

The simplest possible visual test

import { test, expect } from "@playwright/test";
 
test("homepage matches the baseline screenshot", async ({ page }) => {
  await page.goto("/");
  await expect(page).toHaveScreenshot();
});

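The first time this runs, Playwright doesn't know what the page should look like, so it captures the current rendering and writes it as the baseline, in a folder named after the spec file (for example tests/homepage.spec.ts-snapshots/). With the default settings that first run is reported as a failure, with a message saying the snapshot didn't exist and the actual image was written; that is expected. The new file is yours to inspect.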

Run again. This time Playwright captures the page, compares pixel-by-pixel against the baseline, and either passes (no diff) or fails (diff exceeds tolerance). The baseline is what you committed; the comparison is what changed.

Naming and organising baselines

By default, baselines are named after the test. For more control:

await expect(page).toHaveScreenshot("homepage.png");
await expect(page).toHaveScreenshot(["dashboard", "logged-in.png"]); // nested folder

By default the files land next to the spec, at tests/<spec-file>-snapshots/<name>-<project>-<platform>.png, for example homepage-chromium-linux.png. Playwright keeps one baseline per project (typically one per browser) per platform: Chromium on Linux differs from WebKit on macOS in font rendering even if the page is identical, so storing them separately is what you want.
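If the default layout doesn't suit your repository, the location can be changed in config. A minimal sketch, assuming you want every screenshot under one folder grouped by project and platform; the folder name __screenshots__ and the grouping are illustrative choices, not something this lesson prescribes:

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Tokens in braces are filled in by Playwright for each assertion:
  //   {testDir}      the configured test directory
  //   {projectName}  the project name, e.g. "chromium"
  //   {platform}     the operating system, e.g. "linux" or "darwin"
  //   {testFilePath} the spec file path relative to testDir
  //   {arg}          the name passed to toHaveScreenshot, without extension
  //   {ext}          the file extension, including the dot
  // "__screenshots__" is a hypothetical folder name chosen for this example.
  snapshotPathTemplate:
    "{testDir}/__screenshots__/{projectName}-{platform}/{testFilePath}/{arg}{ext}",
});
```

Whatever layout you pick, keep the project and platform somewhere in the path so baselines from different renderers never overwrite each other.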

Element-level screenshots

Pages are big and noisy. Most useful visual checks scope to one component:

const card = page.getByTestId("product-card").first();
await expect(card).toHaveScreenshot("product-card.png");
 
const header = page.getByRole("banner");
await expect(header).toHaveScreenshot("header.png");
 
const checkoutSummary = page.getByTestId("checkout-summary");
await expect(checkoutSummary).toHaveScreenshot("checkout-summary.png");

Element snapshots are more stable than full-page ones — a header redesign won't fail your pricing-card test. A landing-page hero change won't fail your footer test. Reach for element-level by default; full-page only when you specifically want to catch layout regressions.

Dealing with diffs — `maxDiffPixels` and `maxDiffPixelRatio`

Pixel-perfect comparison is brittle. Anti-aliasing and sub-pixel font rendering can shift a handful of pixels between otherwise identical renders. Tolerate small differences:

await expect(page).toHaveScreenshot({
  maxDiffPixels: 100        // up to 100 pixels can differ
});
 
await expect(page).toHaveScreenshot({
  maxDiffPixelRatio: 0.01   // up to 1% of pixels can differ
});
 
await expect(page).toHaveScreenshot({
  maxDiffPixels: 100,
  threshold: 0.2            // per-pixel: how different two pixels must be to count as "different" (0-1)
});

maxDiffPixelRatio is the safer default — it scales with element size. A small button has fewer pixels to begin with, so a percentage tolerance behaves consistently across element sizes.
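To see why a ratio scales and a fixed count doesn't, it helps to work the numbers. The sketch below is plain arithmetic, not Playwright's comparison code; the 40x20 button size is a made-up example, and the function names are hypothetical:

```typescript
// Pixel budget implied by a ratio tolerance for a screenshot of the given size.
function ratioBudget(width: number, height: number, maxDiffPixelRatio: number): number {
  return width * height * maxDiffPixelRatio;
}

// Share of the screenshot that a fixed pixel tolerance allows to change.
function fixedBudgetShare(width: number, height: number, maxDiffPixels: number): number {
  return maxDiffPixels / (width * height);
}

// A 40x20 button (hypothetical size) has 800 pixels in total.
const buttonRatioBudget = ratioBudget(40, 20, 0.01);     // 8 pixels
const buttonFixedShare = fixedBudgetShare(40, 20, 100);  // 0.125, i.e. 12.5% of the button

// A 1280x720 full-page screenshot has 921,600 pixels.
const pageRatioBudget = ratioBudget(1280, 720, 0.01);    // 9216 pixels
const pageFixedShare = fixedBudgetShare(1280, 720, 100); // roughly 0.01% of the page

console.log({ buttonRatioBudget, buttonFixedShare, pageRatioBudget, pageFixedShare });
```

The same maxDiffPixels: 100 lets an eighth of the button change while allowing almost nothing on the full page, whereas 1% means the same thing in both cases.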

Globally tune in playwright.config.ts:

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      maxDiffPixelRatio: 0.01,
      threshold: 0.2,
      animations: "disabled"
    }
  }
});

Per-test overrides win when you need stricter or looser tolerance for one specific test.

Animations are the #1 cause of flake

A page mid-fade-in, a button mid-hover-scale, a carousel mid-slide — every one renders differently between runs. Disable animations during snapshot:

await expect(page).toHaveScreenshot({ animations: "disabled" });

Or globally in config (recommended):

expect: {
  toHaveScreenshot: { animations: "disabled" }
}

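animations: 'disabled' stops CSS animations, CSS transitions and Web Animations for the snapshot, then restores them. Finite animations are fast-forwarded to their end state; infinite ones are reset to their initial state. toHaveScreenshot already defaults to 'disabled', so setting it explicitly mostly documents intent and guards against someone switching it to 'allow'. Animation timing is one of the most common sources of visual-regression flake, so this is the first setting to check.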

Masking dynamic content

Even with animations disabled, some content changes on every run — timestamps, ad rotators, random user avatars, server health pings. Mask them:

await expect(page).toHaveScreenshot("dashboard.png", {
  mask: [
    page.getByTestId("timestamp"),
    page.getByTestId("random-ad-slot"),
    page.locator(".live-counter")
  ]
});

Each masked locator is covered with a solid pink box (#FF00FF by default, configurable through the maskColor option) before the snapshot. Playwright still captures the surrounding page, but the dynamic regions are uniform. The mask is part of the baseline; the next run masks the same areas, so those regions contribute nothing to the diff.
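Why masking removes the diff is easiest to see on a toy model. The sketch below is not Playwright's implementation: it treats a screenshot as a small grid of colour strings, and the names Image, Rect, applyMask and countDiffPixels are all hypothetical helpers invented for the illustration.

```typescript
// A toy "screenshot": a grid of colour strings, one per pixel.
type Image = string[][];
type Rect = { x: number; y: number; width: number; height: number };

const MASK_COLOR = "#FF00FF"; // the default pink used for masks

// Return a copy of the image with every pixel inside the rect painted over.
function applyMask(image: Image, rect: Rect): Image {
  return image.map((row, y) =>
    row.map((pixel, x) => {
      const inside =
        x >= rect.x && x < rect.x + rect.width &&
        y >= rect.y && y < rect.y + rect.height;
      return inside ? MASK_COLOR : pixel;
    })
  );
}

// Count positions where the two images disagree (same dimensions assumed).
function countDiffPixels(a: Image, b: Image): number {
  let diff = 0;
  for (let y = 0; y < a.length; y++) {
    for (let x = 0; x < a[y].length; x++) {
      if (a[y][x] !== b[y][x]) diff++;
    }
  }
  return diff;
}

// Two runs of a 4x2 page; the two right-hand pixels of the top row are a "timestamp".
const firstRun: Image = [
  ["#fff", "#fff", "#111", "#222"],
  ["#fff", "#fff", "#fff", "#fff"],
];
const secondRun: Image = [
  ["#fff", "#fff", "#333", "#444"],
  ["#fff", "#fff", "#fff", "#fff"],
];
const timestampArea: Rect = { x: 2, y: 0, width: 2, height: 1 };

const unmaskedDiff = countDiffPixels(firstRun, secondRun);
const maskedDiff = countDiffPixels(
  applyMask(firstRun, timestampArea),
  applyMask(secondRun, timestampArea)
);
console.log({ unmaskedDiff, maskedDiff }); // { unmaskedDiff: 2, maskedDiff: 0 }
```

The masked region is identical in both images by construction, so only changes outside it can fail the comparison.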

Updating baselines — `--update-snapshots`

When the change is intentional (you redesigned the header, you swapped a font), regenerate baselines:

npx playwright test --update-snapshots

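This re-records visual snapshots against the current rendering (recent Playwright versions rewrite only the snapshots that differ unless you ask for all of them). Review the changes: git reports the PNGs only as changed binary files, so open them in your editor or an image diff tool to see the visual change. Then commit the new baselines.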

For one specific test:

npx playwright test homepage.spec.ts --update-snapshots

For one specific snapshot, delete the old PNG and re-run — the next run treats it as the first run and creates a fresh baseline.

Three views of a visual test

Baseline, current, diff — what toHaveScreenshot compares

Baseline (saved)

Captured on the first run, committed to git
Stored in <spec-file>-snapshots/ next to the spec, one per browser per platform
What every subsequent run compares against
Updated only when you run --update-snapshots

Current (this run)

Captured fresh by toHaveScreenshot()
Same viewport, same browser, same platform as baseline
Compared pixel-by-pixel, with masks and animation control applied
Discarded if test passes; saved if test fails

Diff (on failure)

Generated only when current ≠ baseline beyond tolerance
Highlights changed pixels in red
Saved next to the actual + expected as test artefacts
Open in the HTML report to see exactly what changed

A complete visual-regression spec

Putting every concept into a real e-commerce test:

import { test, expect } from "@playwright/test";
 
test.describe("Visual regression — Sauce Demo", () => {
  test.use({ viewport: { width: 1280, height: 720 } });
 
  test.beforeEach(async ({ page }) => {
    await page.goto("/");
    await page.getByPlaceholder("Username").fill("standard_user");
    await page.getByPlaceholder("Password").fill("secret_sauce");
    await page.getByRole("button", { name: "Login" }).click();
    await expect(page).toHaveURL(/inventory/);
  });
 
  test("inventory page matches baseline", async ({ page }) => {
    await expect(page).toHaveScreenshot("inventory.png", {
      animations: "disabled",
      mask: [page.locator(".footer_copy")] // mask year/timestamp if present
    });
  });
 
  test("first product card matches baseline", async ({ page }) => {
    const card = page.locator(".inventory_item").first();
    await expect(card).toHaveScreenshot("product-card.png", {
      animations: "disabled"
    });
  });
 
  test("cart badge updates correctly — visual after-state", async ({ page }) => {
    await page.locator(".inventory_item").first().getByRole("button", { name: "Add to cart" }).click();
    const badge = page.locator(".shopping_cart_badge");
    await expect(badge).toHaveScreenshot("cart-badge-1.png");
  });
});

Three tests, three different scopes. The first is page-level. The second is element-level. The third captures a post-action state. Each one masks or freezes anything that would vary between runs.
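The spec above calls page.goto("/"), which only works when a base URL is configured. A minimal sketch of that setting, assuming the suite targets the public Sauce Demo site used in the practice task and that nothing else in your config already sets it:

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    // With this set, page.goto("/") resolves to the Sauce Demo home page.
    baseURL: "https://www.saucedemo.com",
  },
});
```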

Committing baselines — yes, you commit them

The biggest "wait, really?" question: should the *-snapshots/ folders go in git? Yes. The baselines are the contract: what the page is supposed to look like. They live with the test code, and updating them is a deliberate change that goes through code review.

git add tests/inventory.spec.ts-snapshots/
git commit -m "feat: visual regression for inventory page"

A teammate reviewing your PR sees the new PNGs and can open them. If you're updating an intentional change, the diff in the PR shows the before-and-after — which is exactly the moment to spot an unintended visual change you missed.

The exception: snapshots can balloon a repo's size. For very large suites (hundreds of full-page screenshots), some teams use Git LFS. For most projects, regular git is fine.

CI vs local — pin the renderer

The single most reported visual-regression issue: "snapshots pass locally on my Mac but fail on Linux in CI." The cause is font rendering: macOS, Linux, and Windows render text with subtly different anti-aliasing. The fix is to always run snapshot tests inside the same environment that generated the baselines:

Use Playwright's official Docker image (mcr.microsoft.com/playwright) both locally (when updating baselines) and in CI.
Or generate baselines in CI directly: --update-snapshots runs against the CI environment, baselines get committed, every subsequent CI run matches.
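If you take the second route and let CI own the baselines, local runs on a different operating system will have no matching baseline files. One way to handle that, sketched here under the assumption that your CI provider sets a CI environment variable (many do, but check yours), is to skip screenshot assertions outside CI:

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Skip snapshot assertions such as toHaveScreenshot() unless running in CI.
  // Assumes the CI environment sets the CI variable.
  ignoreSnapshots: !process.env.CI,
});
```

The functional parts of each test still run locally; only the visual comparison is deferred to the environment that produced the baselines.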

We'll wire this up in chapter 8's Docker lesson. For now, know that "snapshots passing on your laptop but failing on CI" isn't a bug — it's an environment mismatch, fixed by pinning the renderer.

Coming from Cypress?

The mappings:

cy.matchImageSnapshot() (via cypress-image-snapshot plugin) → await expect(page).toHaveScreenshot() (built-in).
cy.viewport(1280, 720); cy.matchImageSnapshot('home') → test.use({ viewport: ... }); await expect(page).toHaveScreenshot('home.png').

The big difference: Cypress visual testing is third-party and inconsistent across plugins. Playwright's toHaveScreenshot is built-in, standardised, and integrates with the same project-based browser matrix you've already configured. If you've ever wrestled with cypress-image-snapshot's setup, the migration to Playwright visual testing is usually a big simplification.

⚠️ Common mistakes

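Allowing animations and chasing flakes. A snapshot that fails 1 in 10 times is very often an animation race. Keep animations: 'disabled' (the toHaveScreenshot default) set globally in config, and switch to animations: 'allow' per-test only if you specifically need to capture a mid-animation state. Animations driven by JavaScript timers may not be covered by this setting, so wait for them to settle or mask the affected area.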
Forgetting masks for timestamps and rotating ads. A snapshot taken at 3:45pm UTC will fail next time it's taken at 3:46pm. Inspect every snapshotted area for content that changes per-run; mask it. If you can't mask it (e.g., it's the entire panel), don't snapshot that panel.
Updating snapshots blindly when CI fails. --update-snapshots regenerates against current rendering — if the failure was a real bug (a button shrunk by a refactor), updating the baseline locks in the bug. Always inspect the diff first; only update when you're sure the change is intentional.

🎯 Practice task

Build a visual-regression baseline for Sauce Demo. 25-30 minutes.

Create tests/visual.spec.ts:

import { test, expect } from "@playwright/test";
 
test.describe("Visual regression — Sauce Demo", () => {
  test.use({ viewport: { width: 1280, height: 720 } });
 
  test.beforeEach(async ({ page }) => {
    await page.goto("https://www.saucedemo.com");
  });
 
  test("login page baseline", async ({ page }) => {
    await expect(page).toHaveScreenshot("login.png", {
      animations: "disabled"
    });
  });
 
  test("login form element only", async ({ page }) => {
    const form = page.locator(".login_wrapper");
    await expect(form).toHaveScreenshot("login-form.png", {
      animations: "disabled",
      maxDiffPixelRatio: 0.01
    });
  });
 
  test("inventory page after login", async ({ page }) => {
    await page.getByPlaceholder("Username").fill("standard_user");
    await page.getByPlaceholder("Password").fill("secret_sauce");
    await page.getByRole("button", { name: "Login" }).click();
 
    await expect(page).toHaveScreenshot("inventory.png", {
      animations: "disabled",
      mask: [page.locator(".footer_copy")] // year text changes
    });
  });
});

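Run it the first time: npx playwright test visual.spec.ts --project=chromium. The baselines don't exist yet, so Playwright writes them and reports the missing snapshots as failures; that is expected on a first run. Inspect tests/visual.spec.ts-snapshots/ and confirm three PNG files exist.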
Run again: npx playwright test visual.spec.ts --project=chromium. All three pass with zero diff.
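Force a visual diff. Changing the page in your own browser's devtools won't affect the test, because Playwright drives a fresh browser instance. Instead, temporarily add a line to the login-form test, just before the assertion, that changes the username placeholder: await page.getByPlaceholder("Username").evaluate((el: HTMLInputElement) => { el.placeholder = "Email"; }). Re-run. The login-form test fails because the placeholder text changed, and the HTML report shows expected, actual and diff side by side. (In a real test, this would be a regression worth investigating.)
Update baselines deliberately. With the temporary line still in place, run npx playwright test visual.spec.ts --update-snapshots --project=chromium. The PNGs are regenerated with the new placeholder. Then remove the temporary line, run the update command once more to restore the real baselines, and commit them.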
Stretch: add visual tests for all three browsers. Run npx playwright test visual.spec.ts --update-snapshots. Inspect tests/visual.spec.ts-snapshots/: there are now nine PNGs (three tests × three browsers), assuming your config defines chromium, firefox and webkit projects. Each browser's font rendering differs subtly; the per-browser baselines are what stop those differences from causing flake.
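The stretch step assumes three browser projects exist. If your config doesn't define them yet, a minimal sketch looks like the following; the project names are conventional rather than required, and they end up in the baseline file names:

```typescript
// playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    // Each project produces its own set of baselines.
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```

Running with --project=chromium, as in the earlier steps, then limits a run to one of these.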

You now have a visual regression suite that catches the design changes functional tests miss. The next lesson zooms in on the scope dimension — when full-page screenshots are right, when element-level is sharper, and how to handle responsive design across viewports.