Visual AI Testing — Applitools, Percy

Functional tests verify that the behaviour of your app is correct: clicking a button submits a form, the API returns 200, the order appears in the database. They are spectacularly bad at noticing that the form button is now invisible because of a z-index regression, that the company logo has been replaced by a broken-image icon, or that the footer is overlapping the checkout button on an iPhone SE. Visual testing fills that gap. AI is what turns visual testing from a flaky-noise generator into something you can actually rely on.

Pixel-diff vs Visual AI

A pixel-diff tool takes a screenshot, compares it to a baseline pixel-by-pixel, and fails the test if anything is different. It catches everything — including invisible-to-humans changes like font anti-aliasing, sub-pixel rendering, scrollbar variations, dynamic timestamps, and ad iframes. The result is a flood of false positives. Teams either ignore the failures or stop running the tests.
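
To make the false-positive problem concrete, here is a minimal sketch of a strict pixel-diff in TypeScript. It assumes both screenshots are already decoded into same-sized RGBA byte arrays; the `Screenshot` type and the `strictPixelDiff` and `whiteScreenshot` helpers are hypothetical and not any tool's API. A single edge pixel rendered one shade darker, the kind of change anti-aliasing produces, is enough to fail it.

```typescript
// Hypothetical decoded screenshot: width x height pixels, 4 bytes (RGBA) each.
interface Screenshot {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Strict pixel-diff: count every pixel whose RGBA value differs at all.
function strictPixelDiff(baseline: Screenshot, actual: Screenshot): number {
  if (baseline.width !== actual.width || baseline.height !== actual.height) {
    throw new Error("Screenshots have different dimensions");
  }
  let differing = 0;
  for (let i = 0; i < baseline.data.length; i += 4) {
    if (
      baseline.data[i] !== actual.data[i] ||
      baseline.data[i + 1] !== actual.data[i + 1] ||
      baseline.data[i + 2] !== actual.data[i + 2] ||
      baseline.data[i + 3] !== actual.data[i + 3]
    ) {
      differing++;
    }
  }
  return differing;
}

// Build a solid white screenshot of the given size.
function whiteScreenshot(width: number, height: number): Screenshot {
  return { width, height, data: new Uint8ClampedArray(width * height * 4).fill(255) };
}

const baseline = whiteScreenshot(100, 100);
const actual = whiteScreenshot(100, 100);
// Simulate one anti-aliased edge pixel rendered one shade darker.
actual.data[0] = 254;

const diff = strictPixelDiff(baseline, actual);
console.log(`differing pixels: ${diff}, test ${diff === 0 ? "passes" : "fails"}`);
// prints "differing pixels: 1, test fails"
```

One invisible-to-humans pixel out of 10,000 turns the build red, which is exactly the noise that pushes teams to disable these tests.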

Visual AI uses a model trained to distinguish meaningful UI changes from rendering noise. Anti-aliasing differences are ignored; a missing element is flagged. The signal-to-noise ratio is the headline benefit, and it's the difference between a tool that gets used and one that gets disabled in CI.

Pixel-diff

Every pixel difference fails the test
False positives: anti-aliasing, fonts, dynamic timestamps
Cross-browser tests almost unusable
Free, simple to set up
Result: teams disable visual tests

Visual AI

Ignores rendering noise the model deems irrelevant
Catches layout shifts, missing elements, broken images
Cross-browser/device with appropriate tolerance
Costs money but the signal is trustworthy
Result: visual tests stay in CI

The main tools

Applitools Eyes. The category leader. Its "Visual AI" engine integrates with Playwright, Cypress, Selenium, Storybook, and most other frameworks. Powerful dashboard for reviewing diffs.
Percy (BrowserStack). Visual review with smart diffing. Tight integration with the rest of BrowserStack's grid.
Chromatic (by the Storybook team). Excellent for component-level visual testing — pairs naturally with a Storybook-driven design system.
Playwright built-in. toHaveScreenshot() does pixel-diff with a configurable tolerance (threshold, maxDiffPixels, maxDiffPixelRatio). No AI, but free, and good enough for a lot of teams.
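
A tolerance-based comparison, roughly in the spirit of Playwright's threshold and maxDiffPixelRatio options, can be sketched as below. This is a simplified model under stated assumptions, not Playwright's implementation: Playwright uses its own colour-distance metric, while this sketch uses the largest per-channel difference, and `tolerantPixelDiff` and `solid` are hypothetical names. It shows why tolerance absorbs faint anti-aliasing noise, and also why it is not Visual AI: it only counts pixels, so it cannot tell a large harmless shift from a small meaningful change.

```typescript
interface Screenshot {
  width: number;
  height: number;
  data: Uint8ClampedArray; // RGBA, 4 bytes per pixel
}

interface Tolerance {
  // Per-pixel colour tolerance in [0, 1]; 0 means any change counts.
  threshold: number;
  // Fraction of pixels allowed to differ before the comparison fails.
  maxDiffPixelRatio: number;
}

// Simplified tolerant diff: a pixel "differs" only if its largest channel
// difference exceeds threshold * 255; the check passes if the share of
// differing pixels stays within maxDiffPixelRatio.
function tolerantPixelDiff(baseline: Screenshot, actual: Screenshot, tol: Tolerance): boolean {
  if (baseline.width !== actual.width || baseline.height !== actual.height) {
    return false; // a size change always fails
  }
  const totalPixels = baseline.width * baseline.height;
  let differing = 0;
  for (let i = 0; i < baseline.data.length; i += 4) {
    let maxChannelDelta = 0;
    for (let c = 0; c < 4; c++) {
      maxChannelDelta = Math.max(maxChannelDelta, Math.abs(baseline.data[i + c] - actual.data[i + c]));
    }
    if (maxChannelDelta > tol.threshold * 255) differing++;
  }
  return differing / totalPixels <= tol.maxDiffPixelRatio;
}

// Build a solid-colour screenshot (every byte set to `value`).
function solid(width: number, height: number, value: number): Screenshot {
  return { width, height, data: new Uint8ClampedArray(width * height * 4).fill(value) };
}

const tol: Tolerance = { threshold: 0.1, maxDiffPixelRatio: 0.01 };
const baseline = solid(100, 100, 255);

// Faint noise: every pixel one shade off, well within the per-pixel threshold.
const noisy = solid(100, 100, 254);
console.log(tolerantPixelDiff(baseline, noisy, tol)); // true

// A 20x20 black block: 400 of 10,000 pixels (4%) exceeds the 1% ratio.
const broken = solid(100, 100, 255);
for (let y = 0; y < 20; y++) {
  for (let x = 0; x < 20; x++) {
    const i = (y * 100 + x) * 4;
    broken.data[i] = broken.data[i + 1] = broken.data[i + 2] = 0;
  }
}
console.log(tolerantPixelDiff(baseline, broken, tol)); // false
```

In Playwright itself you pass these knobs as options rather than writing the loop, for example `await expect(page).toHaveScreenshot({ threshold: 0.1, maxDiffPixelRatio: 0.01 })`.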

A Playwright + Applitools example

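import { test } from '@playwright/test';
import { Eyes, BatchInfo, Configuration, Target } from '@applitools/eyes-playwright';

test('homepage visual', async ({ page }) => {
  // Reads the API key from the APPLITOOLS_API_KEY environment variable.
  const eyes = new Eyes();
  const config = new Configuration();
  config.setBatch(new BatchInfo('MyApp visual suite')); // groups results in the dashboard
  eyes.setConfiguration(config);
  try {
    await eyes.open(page, 'MyApp', 'Homepage Test');
    await page.goto('https://myapp.com');
    // Capture the full scrollable page, not just the viewport.
    await eyes.check('Homepage', Target.window().fully());
    // close() throws, failing the test, if unresolved differences are found.
    await eyes.close();
  } finally {
    await eyes.abort(); // ends the Eyes test cleanly if close() was never reached
  }
});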

The check sends the screenshot to the Applitools cloud, which runs the diff and stores the result. Failures appear in a dashboard where you review them side-by-side with the baseline and decide: "expected change — accept as new baseline" or "real bug — fix the code."
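
The review loop described above boils down to a small decision rule. The sketch below models it in TypeScript under loud assumptions: `CheckpointResult`, `review`, and the rule that an unreviewed diff keeps the build red are hypothetical simplifications for illustration, not the data model or API of Applitools or Percy.

```typescript
// Hypothetical model of the dashboard review step; not Applitools' or Percy's API.
type ReviewDecision = "accept" | "reject";

interface CheckpointResult {
  name: string;
  baselineId: string;  // stored baseline image
  candidateId: string; // image captured in this run
  hasDiff: boolean;    // did the engine report a meaningful difference?
}

interface ReviewOutcome {
  baselineId: string; // baseline the next run will compare against
  buildPasses: boolean;
}

function review(result: CheckpointResult, decision?: ReviewDecision): ReviewOutcome {
  if (!result.hasDiff) {
    // No meaningful change: nothing to review, baseline unchanged.
    return { baselineId: result.baselineId, buildPasses: true };
  }
  if (decision === "accept") {
    // Expected change: the new capture becomes the baseline.
    return { baselineId: result.candidateId, buildPasses: true };
  }
  // Real bug ("reject") or not yet reviewed: keep the old baseline, keep the build red.
  return { baselineId: result.baselineId, buildPasses: false };
}

const homepage: CheckpointResult = {
  name: "Homepage",
  baselineId: "home-v1",
  candidateId: "home-v2",
  hasDiff: true,
};

console.log(JSON.stringify(review(homepage, "accept"))); // {"baselineId":"home-v2","buildPasses":true}
console.log(JSON.stringify(review(homepage, "reject"))); // {"baselineId":"home-v1","buildPasses":false}
console.log(JSON.stringify(review(homepage)));           // {"baselineId":"home-v1","buildPasses":false}
```

The design point is that accepting is the only path that moves the baseline, which is why the accept decision deserves a human reviewer.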

What Visual AI catches that functional tests miss

Two elements unintentionally overlapping each other.
A button that's still functional but rendered behind a modal overlay.
The company logo replaced with a broken-image icon (image source changed).
A colour regression — a "danger" red turning into a "success" green.
Missing icons or fonts on specific browsers.
Layout shifts on a specific viewport size that no functional test would notice.

These are bugs that hurt customer trust, but no expect(button).toBeVisible() assertion notices them.

What it ignores by design

Anti-aliasing and sub-pixel rendering across browsers/OSes.
Dynamic content explicitly marked as ignored (timestamps, ad slots, video frames).
Scrollbar variations.
Cursor position.

You can also tell the engine to ignore specific regions per page — useful for chat widgets, A/B-tested banners, and live dashboards.
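
Conceptually, an ignore region just removes a rectangle from the comparison. The sketch below shows that idea on top of a strict pixel-diff; the `Region` and `Screenshot` types and the `diffIgnoringRegions` helper are hypothetical, and real engines apply regions inside their own (AI or tolerance-based) comparison rather than a loop like this.

```typescript
// Hypothetical types for illustration; not any tool's API.
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Screenshot {
  width: number;
  height: number;
  data: Uint8ClampedArray; // RGBA, 4 bytes per pixel
}

function inIgnoredRegion(x: number, y: number, regions: Region[]): boolean {
  return regions.some((r) => x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height);
}

// Count differing pixels, skipping every pixel that falls inside an ignore region.
function diffIgnoringRegions(baseline: Screenshot, actual: Screenshot, ignore: Region[]): number {
  if (baseline.width !== actual.width || baseline.height !== actual.height) {
    throw new Error("Screenshots have different dimensions");
  }
  let differing = 0;
  for (let y = 0; y < baseline.height; y++) {
    for (let x = 0; x < baseline.width; x++) {
      if (inIgnoredRegion(x, y, ignore)) continue;
      const i = (y * baseline.width + x) * 4;
      for (let c = 0; c < 4; c++) {
        if (baseline.data[i + c] !== actual.data[i + c]) {
          differing++;
          break; // count each pixel once
        }
      }
    }
  }
  return differing;
}

function white(width: number, height: number): Screenshot {
  return { width, height, data: new Uint8ClampedArray(width * height * 4).fill(255) };
}

const baseline = white(100, 100);
const actual = white(100, 100);
// Simulate a timestamp that changed in the top-right 20x10 corner.
for (let y = 0; y < 10; y++) {
  for (let x = 80; x < 100; x++) {
    actual.data[(y * 100 + x) * 4] = 0;
  }
}

const timestampRegion: Region = { x: 80, y: 0, width: 20, height: 10 };
console.log(diffIgnoringRegions(baseline, actual, []));                // 200
console.log(diffIgnoringRegions(baseline, actual, [timestampRegion])); // 0
```

In the real tools you declare regions instead of writing this loop: Applitools accepts them on the check settings (for example `Target.window().ignoreRegions('#chat-widget')`), and Playwright's `toHaveScreenshot()` takes a `mask` option listing locators to cover before comparing. The `#chat-widget` selector is a placeholder.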

Cross-browser and cross-device

This is where Visual AI shines. A single run can render the same page in Chrome, Firefox, Safari, and a range of mobile viewports. The engine tolerates the small rendering differences between browser engines, and it understands that the same page on an iPhone SE legitimately lays out differently from a 1440px desktop, so each environment is judged against an appropriate baseline rather than expected to match pixel for pixel. Pixel-diff tools choke on this; Visual AI is built for it.

Costs

Applitools. Free tier with limited screenshots. Paid tiers scale by snapshot volume, into the mid hundreds to low thousands of dollars per month for serious usage.
Percy. Free tier; paid tiers in the low hundreds of dollars per month.
Chromatic. Free for open-source, paid for commercial.
Playwright built-in. Free.

When Visual AI is essential

Consumer-facing sites with significant UI surface area.
Cross-browser/device matrices that pixel-diff would drown.
Teams with strong brand or design-system discipline, where visual regressions are real bugs.
Marketing sites and landing pages where layout breakage is reputational.

When you might skip

API-heavy backend apps with little UI.
Internal tools where minor visual changes are acceptable.
Greenfield projects where you can revisit later — functional coverage first, visual second.

For framework-specific patterns, see the visual regression sections in the Cypress with TypeScript and Playwright with TypeScript courses.

⚠️ Common Mistakes

Treating every visual diff as a bug. Many diffs are intentional design changes. The dashboard's "accept as new baseline" workflow is part of the design, so use it.
Snapping the whole page on every test. A focused snapshot of the changed component is faster, cheaper, and produces clearer diffs than a full-page screenshot on every flow.
Ignoring dynamic regions. Failing to mark out timestamps, ads, or chat widgets makes diffs noisy and erodes trust in the tool.
Skipping reviewer sign-off. Visual diffs need human eyes. Auto-accepting is the fastest way to silently rebaseline a real bug.

🎯 Practice Task

60 minutes.

Sign up for Applitools, Percy, or Chromatic free tier.
Add visual checks to one existing Playwright or Cypress spec — capture three key screens.
Make a small UI change in your app (intentionally) and a small CSS regression (deliberately ugly).
Run the suite and review the diffs in the dashboard. Accept the intentional change; reject the regression.
Compare to running pixel-diff with toHaveScreenshot() on the same change. Note which tool gives clearer signal.

Next lesson: AI-augmented recorders that let non-engineers author end-to-end tests.