Screenshot Comparisons with to_have_screenshot

Functional tests catch behaviour bugs: broken buttons, wrong validation, redirects to the wrong URL. They miss visual bugs, such as a CSS regression that overlaps two elements, a font-size change that pushes a button off-screen, or a dark-mode rule that turns text invisible. Visual regression testing catches those, and this lesson uses the expect(page).to_have_screenshot() matcher to handle the whole workflow: capture, store, compare, diff. The matcher originated in Playwright's Node.js test runner, so confirm that your installed Playwright Python version (or a pytest snapshot plugin) provides it before relying on the examples. The lesson covers the basic shape, named snapshots, element-level captures, threshold tuning, masking dynamic content, and the cross-browser reality of screenshot diffs.

The smallest visual test

from playwright.sync_api import Page, expect
 
 
def test_homepage_visual(page: Page):
    page.goto("/")
    expect(page).to_have_screenshot()

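The test body is two lines. The first time you run it, the test fails because there's no baseline yet. Run once with --update-snapshots and Playwright captures the current page as the baseline (the -u shorthand comes from the Node.js runner and may not be registered under pytest):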

pytest tests/test_visual.py --update-snapshots

Subsequent runs compare against that baseline pixel by pixel, with a configurable threshold, and fail automatically if the diff exceeds the tolerance. Baselines live in tests/__snapshots__/<browser>/, one folder per browser project, because Chromium, Firefox, and WebKit render fonts and anti-aliasing differently.
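The capture-then-compare cycle described above can be sketched in plain Python. This is not Playwright's implementation: check_snapshot is a hypothetical helper, the byte-for-byte comparison stands in for the real pixel diff, and the folder layout simply follows the tests/__snapshots__/<browser>/ convention mentioned in this lesson.

```python
from pathlib import Path
import tempfile


def check_snapshot(actual: bytes, baseline: Path, update: bool = False) -> str:
    """Hypothetical helper: write the baseline, or compare against it."""
    if update:
        # Equivalent of running with --update-snapshots: accept the capture.
        baseline.parent.mkdir(parents=True, exist_ok=True)
        baseline.write_bytes(actual)
        return "written"
    if not baseline.exists():
        raise AssertionError(f"no baseline at {baseline}; re-run with --update-snapshots")
    if baseline.read_bytes() != actual:
        raise AssertionError(f"screenshot differs from {baseline}")
    return "match"


def demo() -> list[str]:
    """Walk through the first run, the baseline capture, and a normal run."""
    outcomes = []
    with tempfile.TemporaryDirectory() as tmp:
        baseline = Path(tmp) / "__snapshots__" / "chromium" / "home.png"
        try:
            check_snapshot(b"pixels-v1", baseline)
        except AssertionError:
            outcomes.append("first run failed")
        outcomes.append(check_snapshot(b"pixels-v1", baseline, update=True))
        outcomes.append(check_snapshot(b"pixels-v1", baseline))
    return outcomes
```

Calling demo() walks through the same three steps as the commands above: a failing first run, an update run that writes the file, and a passing comparison.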

Named snapshots

The default snapshot filename is derived from the test name. For multiple snapshots in one test, name them explicitly:

def test_states(page: Page):
    page.goto("/products")
    expect(page).to_have_screenshot("products-default.png")
 
    page.get_by_label("Sort").select_option("price-asc")
    expect(page).to_have_screenshot("products-sorted.png")
 
    page.get_by_role("button", name="Filter").click()
    expect(page).to_have_screenshot("products-filtered.png")

Three snapshots, three named files (products-default.png, products-sorted.png, products-filtered.png). Names should be self-explanatory — the file lives in version control, and a reviewer reading a PR with snapshot diffs needs to know what each represents.

Element-level snapshots

Capturing the whole page is brittle — every layout change anywhere ripples into the diff. Capture just the component you care about:

header = page.get_by_role("banner")
expect(header).to_have_screenshot("header.png")
 
product_card = page.get_by_test_id("product-card").first
expect(product_card).to_have_screenshot("product-card.png")
 
footer = page.get_by_role("contentinfo")
expect(footer).to_have_screenshot("footer.png")

Element screenshots only capture the bounding box of the matched element. Layout changes elsewhere on the page don't affect the diff. This is the right pattern for component visual regression — the header changes? Header diff fires. The footer was edited? Different diff fires. The two stay independent.
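To see why a bounding-box capture isolates a component, here is a small sketch that models a page as a grid of RGB tuples. The crop helper and the HEADER_BOX coordinates are made up for illustration; a real element screenshot gets its box from the browser.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

WHITE: Pixel = (255, 255, 255)
RED: Pixel = (255, 0, 0)


def solid(width: int, height: int, colour: Pixel) -> Image:
    """Build a width x height image filled with one colour."""
    return [[colour] * width for _ in range(height)]


def crop(image: Image, x: int, y: int, width: int, height: int) -> Image:
    """Return only the pixels inside the bounding box."""
    return [row[x:x + width] for row in image[y:y + height]]


# An 8x6 "page" whose top two rows are the header (hypothetical layout).
HEADER_BOX = (0, 0, 8, 2)
baseline_page = solid(8, 6, WHITE)
current_page = solid(8, 6, WHITE)
current_page[5][3] = RED  # a change down in the footer area

page_changed = baseline_page != current_page
header_changed = crop(baseline_page, *HEADER_BOX) != crop(current_page, *HEADER_BOX)
print(page_changed, header_changed)
```

The full-page comparison reports a change while the header crop does not, which is the independence the paragraph above describes.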

Threshold — when "exact match" is too strict

Real-world rendering has noise: anti-aliasing differs by font, sub-pixel positioning differs by zoom, and sometimes the same SVG renders one pixel differently on consecutive runs. By default the comparison allows zero differing pixels, so noise alone can fail the test. Loosen it:

# Allow up to 5% of pixels to differ
expect(page).to_have_screenshot(max_diff_pixel_ratio=0.05)
 
# Or: allow up to 100 pixels (absolute count) to differ
expect(page).to_have_screenshot(max_diff_pixels=100)
 
# Combine: both limits are checked, so the stricter one decides
expect(page).to_have_screenshot(max_diff_pixel_ratio=0.05, max_diff_pixels=200)

max_diff_pixel_ratio is the right starting point for full pages; max_diff_pixels is better for small element snapshots, where 5% might be only 50 pixels in absolute terms. Tune by trial: too strict and the suite flakes on noise; too loose and real regressions slip through. A ratio somewhere between 0.01 and 0.05 is a reasonable starting range.
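The arithmetic behind the two limits is easy to check by hand. The sketch below uses a hypothetical within_tolerance helper and assumes that when both limits are given the stricter one applies, which is how Playwright's Node.js comparator appears to behave; treat that as an assumption to verify against your own setup.

```python
from typing import Optional


def within_tolerance(
    diff_pixels: int,
    width: int,
    height: int,
    max_diff_pixels: Optional[int] = None,
    max_diff_pixel_ratio: Optional[float] = None,
) -> bool:
    """Return True if the number of differing pixels is acceptable."""
    limits = []
    if max_diff_pixels is not None:
        limits.append(max_diff_pixels)
    if max_diff_pixel_ratio is not None:
        # The ratio is relative to the total pixel count of the image.
        limits.append(width * height * max_diff_pixel_ratio)
    # Assumption: the stricter limit wins; with no limits, nothing may differ.
    limit = min(limits) if limits else 0
    return diff_pixels <= limit


# Full 1280x720 page: 5% is 46,080 pixels, so 300 differing pixels pass.
print(within_tolerance(300, 1280, 720, max_diff_pixel_ratio=0.05))
# Adding max_diff_pixels=200 makes the absolute limit the stricter one.
print(within_tolerance(300, 1280, 720, max_diff_pixels=200, max_diff_pixel_ratio=0.05))
# Small 40x25 element: 5% is only 50 pixels, so 60 differing pixels fail.
print(within_tolerance(60, 40, 25, max_diff_pixel_ratio=0.05))
```

The last call shows why a ratio that is generous on a full page becomes tight on a small element.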

Masking dynamic content

Pages with timestamps, ad slots, random promotions, or live pricing will diff on every run. Mask the offending regions:

expect(page).to_have_screenshot(mask=[
    page.get_by_test_id("timestamp"),
    page.get_by_test_id("ad-slot"),
    page.locator(".promo-banner"),
])

Masked regions are filled with a solid colour before comparison, so changes inside them don't trigger diffs while the rest of the layout is still validated. You can mask multiple locators at once, and locators inside mask= follow all the same rules as anywhere else (chains, filters, etc.).
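The effect of masking can be illustrated without a browser. In this sketch, apply_mask and count_diff_pixels are hypothetical helpers, images are grids of RGB tuples, and the mask boxes are hard-coded rather than resolved from locators. The pink fill is only an illustrative choice of solid colour.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]
Box = tuple[int, int, int, int]  # x, y, width, height

MASK_COLOUR: Pixel = (255, 0, 255)
GREY: Pixel = (128, 128, 128)
BLACK: Pixel = (0, 0, 0)


def apply_mask(image: Image, boxes: list[Box]) -> Image:
    """Return a copy of the image with every box painted in a solid colour."""
    masked = [row[:] for row in image]
    for x, y, width, height in boxes:
        for row in masked[y:y + height]:
            row[x:x + width] = [MASK_COLOUR] * len(row[x:x + width])
    return masked


def count_diff_pixels(a: Image, b: Image) -> int:
    """Count positions where two same-sized images differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


TIMESTAMP_BOX: Box = (4, 0, 2, 1)  # hypothetical location of a timestamp
baseline = [[GREY] * 6 for _ in range(4)]
current = [row[:] for row in baseline]
current[0][4] = BLACK  # the timestamp text changed between runs
current[0][5] = BLACK

unmasked_diff = count_diff_pixels(baseline, current)
masked_diff = count_diff_pixels(
    apply_mask(baseline, [TIMESTAMP_BOX]), apply_mask(current, [TIMESTAMP_BOX])
)

current[3][0] = BLACK  # a real layout change outside the mask
layout_diff = count_diff_pixels(
    apply_mask(baseline, [TIMESTAMP_BOX]), apply_mask(current, [TIMESTAMP_BOX])
)
print(unmasked_diff, masked_diff, layout_diff)
```

Changes inside the masked box disappear from the diff, while the change outside it is still counted.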

Disabling animations

CSS animations and transitions cause snapshot flake — capture mid-animation and the next run captures a different frame. Freeze them:

expect(page).to_have_screenshot(animations="disabled")

With animations="disabled", Playwright stops CSS animations, CSS transitions, and Web Animations before the screenshot. Finite animations are fast-forwarded to their end state; infinite animations are reset to their initial state and restarted afterwards. Combine with caret="hide" to hide the blinking text cursor in inputs:

expect(page).to_have_screenshot(animations="disabled", caret="hide")

Together, these two flags remove a large share of visual flake.

Full-page vs viewport screenshots

By default to_have_screenshot() captures the visible viewport. For everything below the fold:

expect(page).to_have_screenshot(full_page=True)

full_page=True stitches together the whole document height. Useful for pages that scroll (long product listings, multi-section landing pages). Trade-off: large screenshots = bigger files in version control and slower diffs.

Baseline vs current vs diff

What Playwright generates on each visual run

Baseline (committed to git)

Captured once with --update-snapshots and committed
Lives in tests/__snapshots__/<browser>/
Source of truth for what the page should look like
Reviewed in PRs like any other code change

Current (run-time only)

Captured every test run
Compared pixel-by-pixel against the baseline
Discarded after the run unless the test fails
Saved alongside the diff on failure for inspection

Diff (only on failure)

Highlights pixels that differ between baseline and current
Saved to test-results/ along with both source images
The artefact you review when a visual test fails
If the diff is intentional, run --update-snapshots to accept it

The workflow: write the test, capture the baseline, commit. A PR that changes the UI fails the visual test and generates a diff. The reviewer inspects it, decides whether the change is intentional, and if so runs pytest --update-snapshots and commits the new baseline. The discipline is the same as code review: diffs in version control, reviewed before merge.
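The diff artefact a reviewer looks at can be pictured as the baseline with every differing pixel repainted in a highlight colour. The following is a minimal sketch of that idea with a hypothetical build_diff helper and an arbitrary red highlight; the real diff image is produced by the comparison library, not by code like this.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

WHITE: Pixel = (255, 255, 255)
BLACK: Pixel = (0, 0, 0)
HIGHLIGHT: Pixel = (255, 0, 0)


def build_diff(baseline: Image, current: Image) -> tuple[Image, int]:
    """Return a highlighted diff image and the number of differing pixels."""
    diff: Image = []
    count = 0
    for baseline_row, current_row in zip(baseline, current):
        out_row = []
        for baseline_px, current_px in zip(baseline_row, current_row):
            if baseline_px != current_px:
                out_row.append(HIGHLIGHT)  # mark the regression
                count += 1
            else:
                out_row.append(baseline_px)  # unchanged pixels stay as they were
        diff.append(out_row)
    return diff, count


baseline_img = [[WHITE] * 4 for _ in range(2)]
current_img = [row[:] for row in baseline_img]
current_img[1][2] = BLACK  # one pixel changed in the new build

diff_img, diff_count = build_diff(baseline_img, current_img)
print(diff_count)
```

A reviewer would then decide whether the highlighted region is an intended design change or a regression.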

Cross-browser baselines

Playwright stores one baseline set per browser project, for example tests/__snapshots__/chromium/ and tests/__snapshots__/firefox/. The folder layout is automatic; you just have to be aware that running --update-snapshots against one browser doesn't update the others.

The practical implications:

Capture baselines for every browser the suite runs against.
Re-capture all of them when the design changes.
Don't commit a baseline from your dev machine if the team's CI runs on a different OS — font rendering on macOS differs from Linux.
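One way to act on the first two points is a small pre-commit or CI check that lists baselines missing for any browser. The sketch below assumes the tests/__snapshots__/<browser>/ layout used in this lesson; missing_baselines and the BROWSERS tuple are hypothetical names, not part of Playwright.

```python
from pathlib import Path
import tempfile

BROWSERS = ("chromium", "firefox", "webkit")


def missing_baselines(root: Path, names: list[str], browsers=BROWSERS) -> list[str]:
    """List '<browser>/<name>' entries that have no baseline file under root."""
    return [
        f"{browser}/{name}"
        for browser in browsers
        for name in names
        if not (root / browser / name).exists()
    ]


def demo() -> list[str]:
    """Create a Chromium-only baseline and report what is still missing."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "__snapshots__"
        (root / "chromium").mkdir(parents=True)
        (root / "chromium" / "home.png").write_bytes(b"fake-png")
        return missing_baselines(root, ["home.png"])
```

Here demo() reports the Firefox and WebKit entries, which is the situation you end up in after updating snapshots against a single browser.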

Coming from Playwright TypeScript?

The mappings are exactly the case-conversion pattern from earlier chapters:

TS await expect(page).toHaveScreenshot() → Python expect(page).to_have_screenshot()
TS toHaveScreenshot('home.png') → Python to_have_screenshot("home.png")
TS { maxDiffPixelRatio: 0.05 } → Python max_diff_pixel_ratio=0.05
TS npx playwright test --update-snapshots → Python pytest --update-snapshots
TS mask: [page.locator('.timestamp')] → Python mask=[page.locator(".timestamp")]

Same workflow, snake_case parameters, same baseline-storage layout. Visual tests written for the TS course translate to Python with little more than the locator rename.
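The case-conversion pattern is mechanical enough to express as a one-line regular expression. This is only an illustration of the naming rule; to_snake_case is a hypothetical helper, and it does not handle acronyms (toHaveURL would come out as to_have_u_r_l rather than to_have_url).

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    # Insert an underscore before every capital letter except at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for ts_name in ("toHaveScreenshot", "maxDiffPixelRatio", "maxDiffPixels", "fullPage"):
    print(ts_name, "->", to_snake_case(ts_name))
```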

⚠️ Common mistakes

Committing baselines captured on your dev machine when CI runs on a different OS. macOS and Linux render fonts differently, so a baseline from your laptop fails on the team's Ubuntu CI runner. Capture and commit baselines from CI (run a one-shot job with --update-snapshots, download the artefact, commit it), or pin the OS for visual runs.
Treating every visual diff as a real failure. A 0.5% diff on a 1920×1080 page (about 10,000 pixels) might be sub-pixel anti-aliasing noise. Tune max_diff_pixel_ratio per test based on what's actually meaningful. A failing test that should pass erodes the team's trust in the suite faster than a passing test that should fail.
Capturing full-page snapshots of dynamic dashboards. A dashboard with five live charts, three timestamps, and an animated loading spinner will diff every single run. Either mask everything dynamic (often half the page) or — better — capture element-level snapshots of the static widgets (header, navigation, settings panel) and skip the dynamic ones.
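Pinning the OS for visual runs, as suggested in the first mistake, can be as simple as a guard that the visual tests consult. The sketch below is one possible shape: visual_tests_enabled and the "linux" default are assumptions for illustration, and in a pytest suite the result would typically feed a pytest.mark.skipif condition.

```python
import sys


def visual_tests_enabled(platform: str = sys.platform, baseline_platform: str = "linux") -> bool:
    """Run visual comparisons only on the OS the baselines were captured on."""
    # sys.platform is "linux" on Linux, "darwin" on macOS, "win32" on Windows.
    return platform.startswith(baseline_platform)


print(visual_tests_enabled("linux"))
print(visual_tests_enabled("darwin"))
```

With a guard like this, a developer on macOS skips the comparison locally instead of committing or fighting baselines that only match the CI runner.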

🎯 Practice task

Capture, diff, and update visual baselines on a real page. 25-30 minutes.

Create tests/test_visual.py:

from playwright.sync_api import Page, expect
 
def test_saucedemo_login_page_visual(page: Page):
    page.goto("https://www.saucedemo.com/")
    expect(page).to_have_screenshot("login.png", animations="disabled")
 
def test_inventory_page_visual(page: Page):
    page.goto("https://www.saucedemo.com/")
    page.get_by_placeholder("Username").fill("standard_user")
    page.get_by_placeholder("Password").fill("secret_sauce")
    page.get_by_role("button", name="Login").click()
    page.wait_for_url("https://www.saucedemo.com/inventory.html")
    expect(page).to_have_screenshot("inventory.png", animations="disabled", full_page=True)

First run: pytest tests/test_visual.py -v. Both tests fail with "no baseline" — that's expected.
Generate baselines: pytest tests/test_visual.py --update-snapshots. Two PNG files appear under tests/__snapshots__/.
Re-run normally: pytest tests/test_visual.py -v. Both pass — the page renders identically to the baseline.
Force a real diff. Use the problem_user account instead of standard_user, which famously breaks product images on the inventory page. Run the inventory test — it should fail the visual diff. Open test-results/ to see the baseline, current, and diff PNGs side by side.
Tune the threshold. Add max_diff_pixel_ratio=0.05 to the inventory snapshot and re-run with problem_user. If the broken images change more than 5% of the pixels, the test still fails. Raise the value (0.30, say) until the test passes to see how a loose threshold hides a real regression, then restore a sensible value (0.01).
Mask dynamic content. If the page had a timestamp or ad, you'd mask it. Add a mask= argument with a fake locator and confirm the test still passes — Playwright doesn't error on masks that don't match anything.
Stretch: add per-component snapshots. Capture just the inventory page's header (page.locator(".header_container")) and just the footer (page.locator(".footer")) as separate element-level snapshots. Component-level snapshots stay green even when the middle of the page changes.

You've got the visual-testing primitive. The next lesson covers full-page vs element vs cross-viewport visual testing — when each shape fits and how to organise a visual suite across breakpoints and browsers.