Functional tests catch behaviour bugs — broken buttons, wrong validation, redirects to the wrong URL. They miss visual bugs: a CSS regression that overlaps two elements, a font-size change that pushes a button off-screen, a dark-mode rule that turns text invisible. Visual regression testing is how you catch those, and Playwright Python ships an expect(page).to_have_screenshot() matcher that handles the whole workflow — capture, store, compare, diff. This lesson covers the basic shape, named snapshots, element-level captures, threshold tuning, masking dynamic content, and the cross-browser reality of screenshot diffs.
The smallest visual test

    from playwright.sync_api import Page, expect

    def test_homepage_visual(page: Page):
        page.goto("/")
        expect(page).to_have_screenshot()

Two lines. The first time you run it, the test fails because there's no baseline yet. Run with --update-snapshots (or -u) once and Playwright captures the current page as the baseline:

    pytest tests/test_visual.py --update-snapshots

Subsequent runs compare against that baseline: pixel-by-pixel diff, configurable threshold, automatic fail if the diff exceeds tolerance. Baselines live in tests/__snapshots__/<browser>/, one folder per browser project, because Chromium, Firefox, and WebKit render fonts and anti-aliasing differently.
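
The comparison step can be pictured as counting the pixels that differ and checking that count against a tolerance. Here is a minimal sketch, assuming both images are already decoded into equal-sized grids of RGB tuples. `count_diff_pixels` and `diff_ratio` are illustrative names, not Playwright APIs, and a real comparator is more careful about colour distance and anti-aliasing than plain equality.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (R, G, B) tuples


def count_diff_pixels(baseline: Image, current: Image) -> int:
    """Count positions where two equal-sized images differ."""
    if len(baseline) != len(current) or any(
        len(b_row) != len(c_row) for b_row, c_row in zip(baseline, current)
    ):
        raise ValueError("images must have identical dimensions")
    return sum(
        1
        for b_row, c_row in zip(baseline, current)
        for b_px, c_px in zip(b_row, c_row)
        if b_px != c_px
    )


def diff_ratio(baseline: Image, current: Image) -> float:
    """Fraction of pixels that differ, between 0.0 and 1.0."""
    total = sum(len(row) for row in baseline)
    return count_diff_pixels(baseline, current) / total if total else 0.0


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
baseline = [[WHITE] * 4 for _ in range(4)]   # 4x4 all-white "page"
current = [row[:] for row in baseline]
current[1][2] = BLACK                        # one pixel regressed

print(count_diff_pixels(baseline, current))  # 1
print(diff_ratio(baseline, current))         # 0.0625
```

One changed pixel out of sixteen is a 6.25% diff on this toy image; on a real page the same single pixel would be a vanishingly small ratio, which is why the tolerance settings later in this lesson matter.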
Named snapshots
The default snapshot filename is derived from the test name. For multiple snapshots in one test, name them explicitly:

    def test_states(page: Page):
        page.goto("/products")
        expect(page).to_have_screenshot("products-default.png")

        page.get_by_label("Sort").select_option("price-asc")
        expect(page).to_have_screenshot("products-sorted.png")

        page.get_by_role("button", name="Filter").click()
        expect(page).to_have_screenshot("products-filtered.png")

Three snapshots, three named files (products-default.png, products-sorted.png, products-filtered.png). Names should be self-explanatory: the file lives in version control, and a reviewer reading a PR with snapshot diffs needs to know what each represents.
Element-level snapshots
Capturing the whole page is brittle — every layout change anywhere ripples into the diff. Capture just the component you care about:

    header = page.get_by_role("banner")
    expect(header).to_have_screenshot("header.png")

    product_card = page.get_by_test_id("product-card").first
    expect(product_card).to_have_screenshot("product-card.png")

    footer = page.get_by_role("contentinfo")
    expect(footer).to_have_screenshot("footer.png")

Element screenshots only capture the bounding box of the matched element. Layout changes elsewhere on the page don't affect the diff. This is the right pattern for component visual regression: if the header changes, the header diff fires; if the footer is edited, a different diff fires. The two stay independent.
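
To see why element snapshots stay independent, here is a toy sketch that crops a pixel grid to a bounding box before comparing. The `crop` helper and the grayscale grid format are assumptions made for illustration; in a real test Playwright does the cropping when you screenshot a locator.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale values, 0-255


def crop(image: Grid, x: int, y: int, width: int, height: int) -> Grid:
    """Return only the pixels inside the bounding box."""
    return [row[x:x + width] for row in image[y:y + height]]


# A 6-row "page": rows 0-1 are the header, rows 2-5 are the body.
baseline_page = [[200] * 8 for _ in range(6)]
current_page = [row[:] for row in baseline_page]
current_page[4][3] = 0  # a change in the body, outside the header

header_box = dict(x=0, y=0, width=8, height=2)
header_same = crop(baseline_page, **header_box) == crop(current_page, **header_box)
page_same = baseline_page == current_page

print(header_same)  # True: the header snapshot is unaffected
print(page_same)    # False: a whole-page snapshot would diff
```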
Threshold — when "exact match" is too strict
Real-world rendering has noise: anti-aliasing differs by font, sub-pixel positioning differs by zoom, and sometimes the same SVG renders one pixel differently on consecutive runs. The default tolerance allows zero differing pixels, which fails on noise. Loosen it:

    # Allow up to 5% of pixels to differ
    expect(page).to_have_screenshot(max_diff_pixel_ratio=0.05)

    # Or: allow up to 100 pixels (absolute count) to differ
    expect(page).to_have_screenshot(max_diff_pixels=100)

    # Combine: in Playwright's comparator the stricter of the two limits applies
    expect(page).to_have_screenshot(max_diff_pixel_ratio=0.05, max_diff_pixels=200)

max_diff_pixel_ratio is the right starting point for full pages; max_diff_pixels is better for small element snapshots where 5% might be only 50 pixels in absolute terms. Tune by trial: too strict, and the suite flakes on noise; too loose, and real regressions slip through. Most teams settle on a ratio around 0.01-0.05.
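
A little arithmetic helps when choosing between the two options, because the same ratio means very different absolute budgets at different image sizes. `ratio_to_pixels` below is a hypothetical helper for that conversion, and the image sizes are example values rather than anything Playwright prescribes.

```python
def ratio_to_pixels(width: int, height: int, ratio: float) -> int:
    """Absolute number of differing pixels that a diff ratio permits."""
    return round(width * height * ratio)


# A 1280x720 viewport at 5% tolerates a very large absolute change...
print(ratio_to_pixels(1280, 720, 0.05))    # 46080
# ...while a 40x25 badge at the same ratio tolerates only a handful.
print(ratio_to_pixels(40, 25, 0.05))       # 50
# Even 0.5% of a 1920x1080 page is over ten thousand pixels.
print(ratio_to_pixels(1920, 1080, 0.005))  # 10368
```

A budget of 46,080 pixels is enough to hide a missing icon, so a ratio that feels conservative on a component can be far too loose on a full page.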
Masking dynamic content
Pages with timestamps, ad slots, random promotions, or live pricing will diff on every run. Mask the offending regions:

    expect(page).to_have_screenshot(mask=[
        page.get_by_test_id("timestamp"),
        page.get_by_test_id("ad-slot"),
        page.locator(".promo-banner"),
    ])

Masked regions are filled with a solid colour before comparison, so changes inside them don't trigger diffs while the rest of the layout is still validated. You can mask multiple locators at once, and locators inside mask= follow all the same rules as anywhere else (chains, filters, etc.).
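
The effect of masking can be sketched without a browser: overwrite the masked boxes in both images with the same value, then compare. `apply_masks`, `MASK_VALUE`, and the box coordinates below are assumptions for illustration only; Playwright computes the boxes from your locators.

```python
from typing import List, Tuple

Grid = List[List[int]]            # rows of grayscale values
Box = Tuple[int, int, int, int]   # x, y, width, height

MASK_VALUE = -1  # sentinel standing in for the solid mask colour


def apply_masks(image: Grid, boxes: List[Box]) -> Grid:
    """Return a copy with every masked box overwritten by MASK_VALUE."""
    masked = [row[:] for row in image]
    for x, y, width, height in boxes:
        for row in masked[y:y + height]:
            # Slice assignment keeps the row length even if the box overflows.
            row[x:x + width] = [MASK_VALUE] * len(row[x:x + width])
    return masked


baseline = [[10] * 6 for _ in range(4)]
current = [row[:] for row in baseline]
current[0][1] = 99               # the "timestamp" region changed
timestamp_box = (0, 0, 3, 1)

unmasked_equal = baseline == current
masked_equal = apply_masks(baseline, [timestamp_box]) == apply_masks(current, [timestamp_box])
print(unmasked_equal)  # False: the timestamp alone breaks the comparison
print(masked_equal)    # True: the masked region no longer counts

current[3][5] = 99               # a change outside the mask
still_equal = apply_masks(baseline, [timestamp_box]) == apply_masks(current, [timestamp_box])
print(still_equal)     # False: the rest of the layout is still validated
```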
Disabling animations
CSS animations and transitions cause snapshot flake — capture mid-animation and the next run captures a different frame. Freeze them:

    expect(page).to_have_screenshot(animations="disabled")

Playwright stops CSS animations and transitions before the screenshot, then resumes them afterwards. Finite animations are fast-forwarded to their end state, and infinite ones are reset to their initial state, so every run captures the same frame. Combine with caret="hide" to hide the blinking text cursor in inputs:

    expect(page).to_have_screenshot(animations="disabled", caret="hide")

These two flags together remove the most common sources of visual flake.
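
If you repeat these options across many tests, a small wrapper keeps them in one place. This is a minimal sketch built on the plain `screenshot()` method, which accepts `animations` and `caret` on both `Page` and `Locator`. The names `STABLE_OPTIONS` and `stable_screenshot` are hypothetical.

```python
from typing import Any, Dict

# Defaults that reduce flake; callers can override or extend them.
STABLE_OPTIONS: Dict[str, Any] = {"animations": "disabled", "caret": "hide"}


def stable_screenshot(target: Any, **overrides: Any) -> bytes:
    """Screenshot a Page or Locator with flake-reducing defaults applied."""
    options = {**STABLE_OPTIONS, **overrides}
    return target.screenshot(**options)
```

In a test this would be called as `stable_screenshot(page, full_page=True)` or `stable_screenshot(page.get_by_role("banner"))`, returning the PNG bytes.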
Full-page vs viewport screenshots
By default to_have_screenshot() captures the visible viewport. For everything below the fold:

    expect(page).to_have_screenshot(full_page=True)

full_page=True stitches together the whole document height. Useful for pages that scroll (long product listings, multi-section landing pages). Trade-off: large screenshots mean bigger files in version control and slower diffs.
Baseline vs current vs diff
What Playwright generates on each visual run:
Baseline (committed to git)
- Captured once with --update-snapshots and committed
- Lives in tests/__snapshots__/<browser>/
- Source of truth for what the page should look like
- Reviewed in PRs like any other code change
Current (run-time only)
- Captured every test run
- Compared pixel-by-pixel against the baseline
- Discarded after the run unless the test fails
- Saved alongside the diff on failure for inspection
Diff (only on failure)
- Highlights pixels that differ between baseline and current
- Saved to test-results/ along with both source images
- The artefact you review when a visual test fails
- If the diff is intentional, run --update-snapshots to accept it
The workflow: write the test, capture the baseline, commit. When a PR changes the UI, the visual test fails and generates a diff. The reviewer inspects it, decides whether the change is intentional, and if so runs pytest --update-snapshots to accept it and commits the new baseline. The discipline is the same as code review: diffs in version control, reviewed before merge.
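
The lifecycle can be sketched as a small state machine. This is a simplified model, not Playwright's implementation: `check_snapshot` is a hypothetical function, it compares raw bytes instead of pixels, and it saves only the current capture on failure rather than a rendered diff image.

```python
import tempfile
from pathlib import Path


def check_snapshot(current: bytes, baseline: Path, results_dir: Path,
                   update: bool = False) -> str:
    """Mimic the baseline / current / diff lifecycle for one snapshot."""
    if update:
        # Accepting the change: overwrite (or create) the baseline.
        baseline.parent.mkdir(parents=True, exist_ok=True)
        baseline.write_bytes(current)
        return "updated"
    if not baseline.exists():
        return "missing-baseline"
    if baseline.read_bytes() == current:
        return "match"  # the current capture is simply discarded
    # On failure, keep the current capture so it can be inspected.
    results_dir.mkdir(parents=True, exist_ok=True)
    actual = results_dir / f"{baseline.stem}-actual{baseline.suffix}"
    actual.write_bytes(current)
    return "mismatch"


with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "__snapshots__" / "home.png"
    results = Path(tmp) / "test-results"
    print(check_snapshot(b"v1", baseline, results))               # missing-baseline
    print(check_snapshot(b"v1", baseline, results, update=True))  # updated
    print(check_snapshot(b"v1", baseline, results))               # match
    print(check_snapshot(b"v2", baseline, results))               # mismatch
    print(check_snapshot(b"v2", baseline, results, update=True))  # updated
```

The last two calls are the PR scenario: the UI changed, the check failed, and the reviewer accepted the new look by updating the baseline.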
Cross-browser baselines
Playwright stores one baseline per browser project: __snapshots__/test_visual_chromium/, __snapshots__/test_visual_firefox/, etc. The folder layout is automatic; you just have to be aware that running --update-snapshots against one browser doesn't update the others.
The practical implications:
- Capture baselines for every browser the suite runs against.
- Re-capture all of them when the design changes.
- Don't commit a baseline from your dev machine if the team's CI runs on a different OS — font rendering on macOS differs from Linux.
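
One way to keep laptop and CI baselines from colliding is to put the OS into the file name as well as the browser. The helper below is a hypothetical naming scheme for illustration, not the layout Playwright itself uses; `baseline_path` and the directory names are assumptions.

```python
import sys
from pathlib import Path


def baseline_path(root: Path, name: str, browser: str,
                  platform: str = sys.platform) -> Path:
    """Build a per-browser, per-OS baseline path (hypothetical layout)."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No extension given: treat the whole name as the stem.
        stem, ext = name, "png"
    return root / browser / f"{stem}-{platform}.{ext}"


root = Path("tests/__snapshots__")
print(baseline_path(root, "header.png", "chromium", "linux").as_posix())
# tests/__snapshots__/chromium/header-linux.png
print(baseline_path(root, "header.png", "webkit", "darwin").as_posix())
# tests/__snapshots__/webkit/header-darwin.png
```

With a scheme like this, a baseline captured on macOS can never be compared against a Linux CI render by accident; the CI run reports a missing baseline instead of a misleading diff.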
Coming from Playwright TypeScript?
The mappings are exactly the case-conversion pattern from earlier chapters:
- TS await expect(page).toHaveScreenshot() → Python expect(page).to_have_screenshot()
- TS toHaveScreenshot('home.png') → Python to_have_screenshot("home.png")
- TS { maxDiffPixelRatio: 0.05 } → Python max_diff_pixel_ratio=0.05
- TS npx playwright test --update-snapshots → Python pytest --update-snapshots
- TS mask: [page.locator('.timestamp')] → Python mask=[page.locator(".timestamp")]
Same workflow, snake_case parameters, same baseline-storage layout. Visual tests written for the TS course translate to Python with little more than the locator rename.
⚠️ Common mistakes
- Committing baselines captured on your dev machine to a CI-running team. macOS and Linux render fonts differently, so a baseline from your laptop fails on the team's Ubuntu CI runner. Capture and commit baselines from CI (run a one-shot job with --update-snapshots, download the artefact, commit it), or pin the OS for visual runs.
- Treating every visual diff as a real failure. A 0.5% diff on a 1920×1080 page might be sub-pixel anti-aliasing noise. Tune max_diff_pixel_ratio per-test based on what's actually meaningful. A failing test that should pass erodes the team's trust in the suite faster than a passing test that should fail.
- Capturing full-page snapshots of dynamic dashboards. A dashboard with five live charts, three timestamps, and an animated loading spinner will diff every single run. Either mask everything dynamic (often half the page) or, better, capture element-level snapshots of the static widgets (header, navigation, settings panel) and skip the dynamic ones.
🎯 Practice task
Capture, diff, and update visual baselines on a real page. 25-30 minutes.
- Create tests/test_visual.py:

    from playwright.sync_api import Page, expect

    def test_saucedemo_login_page_visual(page: Page):
        page.goto("https://www.saucedemo.com/")
        expect(page).to_have_screenshot("login.png", animations="disabled")

    def test_inventory_page_visual(page: Page):
        page.goto("https://www.saucedemo.com/")
        page.get_by_placeholder("Username").fill("standard_user")
        page.get_by_placeholder("Password").fill("secret_sauce")
        page.get_by_role("button", name="Login").click()
        page.wait_for_url("https://www.saucedemo.com/inventory.html")
        expect(page).to_have_screenshot("inventory.png", animations="disabled", full_page=True)

- First run: pytest tests/test_visual.py -v. Both tests fail with "no baseline", which is expected.
- Generate baselines: pytest tests/test_visual.py --update-snapshots. Two PNG files appear under tests/__snapshots__/.
- Re-run normally: pytest tests/test_visual.py -v. Both pass, because the page renders identically to the baseline.
- Force a real diff. Use the problem_user account instead of standard_user, which famously breaks product images on the inventory page. Run the inventory test; it should fail the visual diff. Open test-results/ to see the baseline, current, and diff PNGs side by side.
- Tune the threshold. Add max_diff_pixel_ratio=0.05 to the inventory snapshot and re-run with problem_user. The 5% tolerance still fails because broken images differ by far more than 5%. Bump to 0.30 and re-run: it passes (loose threshold). Restore to a sensible value (0.01).
- Mask dynamic content. If the page had a timestamp or ad, you'd mask it. Add a mask= argument with a fake locator and confirm the test still passes; Playwright doesn't error on masks that don't match anything.
- Stretch: add per-component snapshots. Capture just the inventory page's header (page.locator(".header_container")) and just the footer (page.locator(".footer")) as separate element-level snapshots. Component-level snapshots stay green even when the middle of the page changes.
You've got the visual-testing primitive. The next lesson covers full-page vs element vs cross-viewport visual testing — when each shape fits and how to organise a visual suite across breakpoints and browsers.