Accessibility Testing with axe-playwright-python

Accessibility (a11y) bugs are the bugs the rest of your team can't see: buttons without labels, text whose colour contrast is too low to read, form fields the tab order skips. Manual auditing catches some of them; automated scanning catches many of the mechanical ones at PR time, before they reach production. The standard scanning engine is axe-core, the same library that powers axe DevTools. The Python wrapper is axe-playwright-python, which injects axe-core into the pages your pytest-playwright tests already open. By the end of this lesson you'll have a fixture that scans any page for WCAG violations, custom rule selection, and assertions that gate CI on critical issues. The Manual Software Testing course has the conceptual background for accessibility; this lesson is the automation layer on top.

Installing the library

pip install axe-playwright-python

The package wraps axe-core (a JavaScript engine), injects it into the page under test, runs the scan, and returns the results to Python. No browser-extension setup, no separate configuration — just install and import.

The smallest accessibility test

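from axe_playwright_python.sync_playwright import Axe
from playwright.sync_api import Page


def test_homepage_accessibility(page: Page):
    # A relative path only resolves if a base URL is configured (e.g. via the --base-url option).
    page.goto("/")
    axe = Axe()
    # Inject axe-core and scan the DOM as it is right now.
    results = axe.run(page)
    # On failure, pytest prints the formatted report as the assertion message.
    assert results.violations_count == 0, results.generate_report()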

Three steps:

Construct an Axe() scanner.
Call axe.run(page) to execute the scan against the current page state.
Assert no violations; on failure, print a human-readable report.

results.generate_report() produces a formatted multi-line summary — rule id, impact level, affected node count, fix suggestions. Drop it straight into the assert message and pytest prints it on failure.

What axe actually checks

axe-core ships roughly 100 rules. Most map to WCAG success criteria and the rest are best-practice checks. Each rule carries one of four impact levels:

Critical: blocks users from completing core flows (no alt on a content image, button with no accessible name, form field with no label).
Serious: significantly impacts users (insufficient colour contrast, link with no accessible name).
Moderate: workaround-able but degrades the experience (heading-level skip, page missing a main landmark).
Minor: low-risk nudges (for example an empty heading or an empty table header).

In a real suite, you usually fail CI on critical and serious only — the others go in a report you review periodically.

Filtering by impact level

Out of the box, assert results.violations_count == 0 fails on any violation, including minor ones. Real teams gate on the high-impact ones:

def test_homepage_critical_a11y(page: Page):
    page.goto("/")
    results = Axe().run(page)
    # The raw axe-core result dict lives on results.response.
    violations = results.response["violations"]
    high_impact = [v for v in violations if v["impact"] in ["critical", "serious"]]
    assert len(high_impact) == 0, (
        f"{len(high_impact)} critical/serious a11y violations:\n"
        + "\n".join(f"  [{v['impact']}] {v['id']}: {v['description']}" for v in high_impact)
    )

The list comprehension filters by impact; the assert message lists each violation with its severity, rule id, and description. Reviewers see exactly which rules failed without opening the JSON.
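Because the same filter tends to recur in most a11y tests, it may be worth pulling it into a small helper. The sketch below is one possible shape, not part of axe-playwright-python: the names `high_impact` and `format_violations` are hypothetical, and it assumes each violation is a plain dict with `impact`, `id`, and `description` keys, as in the axe-core results format. The sample data is hand-written for illustration.

```python
from typing import Iterable

# Impact levels that should fail the build; adjust to your team's policy.
BLOCKING_IMPACTS = ("critical", "serious")


def high_impact(violations: Iterable[dict], levels=BLOCKING_IMPACTS) -> list:
    """Return only the violations whose impact is in `levels`."""
    return [v for v in violations if v.get("impact") in levels]


def format_violations(violations: Iterable[dict]) -> str:
    """Render one line per violation: severity, rule id, description."""
    return "\n".join(
        f"  [{v.get('impact')}] {v.get('id')}: {v.get('description')}"
        for v in violations
    )


# Hand-written sample in the axe-core violation shape (illustrative only).
sample = [
    {"id": "image-alt", "impact": "critical",
     "description": "Images must have alternate text"},
    {"id": "heading-order", "impact": "moderate",
     "description": "Heading levels should increase by one"},
]
blocking = high_impact(sample)
print(format_violations(blocking))
```

In a test, the list of violations from the scan would take the place of `sample`, and `format_violations(blocking)` could serve as the assert message.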

Targeting specific elements — `include` and `exclude`

Sometimes you want to scan only one part of the page (the new feature you shipped) or skip a known-broken third-party widget:

# Only scan inside the main content area
results = Axe().run(page, context={"include": [["#main-content"]]})
 
# Skip the third-party chat widget (we can't fix vendor code)
results = Axe().run(page, context={"exclude": [["#third-party-chat"]]})
 
# Combine — scan main content but skip its embedded ads
results = Axe().run(page, context={
    "include": [["#main-content"]],
    "exclude": [["#main-content .ad-slot"]],
})

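The include/exclude values are arrays of selector arrays, which is the axe-core context shape. The outer array lets you pass multiple roots. Each inner array is a selector path: for an element in the top-level document it holds a single CSS selector, and for an element inside iframes it lists the frame selectors first and the target selector last.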
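The nested-array shape is easy to mistype by hand. A minimal sketch of a helper that builds it from flat selector lists follows; `build_context` is a hypothetical name, and it assumes every selector targets the top-level document (no iframe paths).

```python
from typing import Optional, Sequence


def build_context(
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> dict:
    """Wrap flat selector lists in the nested-array context shape."""
    context: dict = {}
    if include:
        # Each top-level selector becomes its own single-element inner array.
        context["include"] = [[selector] for selector in include]
    if exclude:
        context["exclude"] = [[selector] for selector in exclude]
    return context


ctx = build_context(
    include=["#main-content"],
    exclude=["#main-content .ad-slot"],
)
print(ctx)
```

The resulting dict has the same shape as the hand-written `context` values above, so it could be passed to the scan in their place.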

Selecting rule sets

axe organises rules into tags — wcag2a, wcag2aa, wcag21aa, best-practice, etc. Run only what your team commits to:

# WCAG A only (lowest level)
results = Axe().run(page, options={"runOnly": ["wcag2a"]})
 
# WCAG A and AA (most teams' target)
results = Axe().run(page, options={"runOnly": ["wcag2a", "wcag2aa"]})
 
# WCAG 2.1 A and AA (2.1 adds its own A-level and AA-level tags on top of 2.0)
results = Axe().run(page, options={"runOnly": ["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"]})

You can also exclude specific rules that don't apply to your app:

results = Axe().run(page, options={
    "runOnly": ["wcag2a", "wcag2aa"],
    "rules": {"color-contrast": {"enabled": False}},  # we use a custom contrast checker
})
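Disabled rules tend to accumulate without anyone remembering why. One way to keep them honest, sketched here under assumptions, is to store each disabled rule next to its reason and derive the options dict from that. `build_options` and `DISABLED_RULES` are hypothetical names, and the reason text is only an example.

```python
from typing import Iterable, Mapping, Optional

# Rule id -> why it is switched off. The reasons are for reviewers, not for axe.
DISABLED_RULES = {
    "color-contrast": "covered by our custom contrast checker",
}


def build_options(
    tags: Iterable[str],
    disabled_rules: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build an options dict from the tags to run and the rules to disable."""
    options: dict = {"runOnly": list(tags)}
    if disabled_rules:
        # Only the rule ids are sent on; each one is marked enabled=False.
        options["rules"] = {rule_id: {"enabled": False} for rule_id in disabled_rules}
    return options


opts = build_options(["wcag2a", "wcag2aa"], DISABLED_RULES)
print(opts)
```

This produces the same dict as the hand-written example above, with the justification for each exclusion kept in one reviewable place.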

A scanner fixture for reuse

The pattern that scales across a suite — one fixture, used everywhere:

import pytest
from axe_playwright_python.sync_playwright import Axe
from playwright.sync_api import Page
 
 
@pytest.fixture
def axe_scanner():
    return Axe()
 
 
def test_login_page_a11y(page: Page, axe_scanner):
    page.goto("/login")
    results = axe_scanner.run(page)
    assert results.violations_count == 0
 
 
def test_dashboard_a11y(authed_page, axe_scanner):
    authed_page.goto("/dashboard")
    results = axe_scanner.run(authed_page)
    assert results.violations_count == 0

Now every test that wants to scan declares axe_scanner and calls axe_scanner.run(page). Move scanner configuration into the fixture if you want a project-wide rule set:

@pytest.fixture
def axe_scanner():
    axe = Axe()
    # Set defaults here — every consumer gets them automatically
    return axe
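One way to make "defaults in the fixture" concrete is to have the fixture return a small scan function with the project's options already applied. The sketch below is an assumption-laden illustration: `make_scanner` and `DEFAULT_OPTIONS` are hypothetical names, and it relies only on `run` accepting `context` and `options` keyword arguments, as used earlier in this lesson. The scanner object is passed in, so the helper itself has no dependency on the library.

```python
from typing import Any, Callable, Optional

# Hypothetical project-wide defaults; pick the tags your team has committed to.
DEFAULT_OPTIONS = {"runOnly": ["wcag2a", "wcag2aa"]}


def make_scanner(axe: Any, default_options: Optional[dict] = None) -> Callable[..., Any]:
    """Return scan(page, context=None, options=None) with defaults pre-applied."""
    defaults = dict(default_options or {})

    def scan(page: Any, context: Optional[dict] = None, options: Optional[dict] = None) -> Any:
        # Per-call options override the defaults key by key (shallow merge).
        merged = {**defaults, **(options or {})}
        # Pass None rather than an empty dict when there is nothing to send.
        return axe.run(page, context=context, options=merged or None)

    return scan


# In conftest.py this might be wired up as follows (not executed here):
#
# @pytest.fixture
# def axe_scan():
#     return make_scanner(Axe(), DEFAULT_OPTIONS)
```

Tests would then call `axe_scan(page)` for the default rule set and pass `options=` only when a page needs something different. The merge is shallow, so a per-call `rules` dict replaces the default one rather than combining with it.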

Custom reporting — surfacing what matters

The default results.generate_report() is dense. For PR comments and CI summaries, format the output yourself:

def test_homepage_a11y_with_summary(page: Page, axe_scanner):
    page.goto("/")
    results = axe_scanner.run(page)
    # The raw axe-core result dict lives on results.response.
    violations = results.response["violations"]

    if violations:
        print("\n=== Accessibility Violations ===")
        for v in violations:
            print(f"[{v['impact'].upper()}] {v['id']}: {v['description']}")
            print(f"  Affected: {len(v['nodes'])} element(s)")
            print(f"  Help: {v['helpUrl']}")
            print()

    high_impact = [v for v in violations if v["impact"] in ["critical", "serious"]]
    assert len(high_impact) == 0, f"{len(high_impact)} high-impact violations"

Run with pytest -s (pass-through stdout) and you get a per-test summary on the console. Pair with the next lesson's reporting integration to attach the same data to Allure or pytest-html.
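For a PR comment, a string is more useful than console prints. A minimal sketch of a formatter that returns Markdown, sorted most severe first, is shown below. `markdown_summary` and `IMPACT_ORDER` are hypothetical names, the sample violations are hand-written, and the example.com help URLs are placeholders rather than real axe links.

```python
from typing import Iterable

# Sort order for the summary: most severe first.
IMPACT_ORDER = {"critical": 0, "serious": 1, "moderate": 2, "minor": 3}


def markdown_summary(violations: Iterable[dict]) -> str:
    """Render violations as a Markdown bullet list, most severe first."""
    ordered = sorted(
        violations,
        # Unknown or missing impact levels sort after the known ones.
        key=lambda v: IMPACT_ORDER.get(v.get("impact"), len(IMPACT_ORDER)),
    )
    if not ordered:
        return "No accessibility violations found."
    lines = [f"### Accessibility violations ({len(ordered)})"]
    for v in ordered:
        node_count = len(v.get("nodes", []))
        lines.append(
            f"- **{v.get('impact')}** `{v.get('id')}`: {v.get('description')} "
            f"({node_count} element(s), [help]({v.get('helpUrl')}))"
        )
    return "\n".join(lines)


# Hand-written sample in the axe-core violation shape (illustrative only).
sample = [
    {"id": "heading-order", "impact": "moderate",
     "description": "Heading levels should increase by one",
     "helpUrl": "https://example.com/heading-order", "nodes": [{}]},
    {"id": "image-alt", "impact": "critical",
     "description": "Images must have alternate text",
     "helpUrl": "https://example.com/image-alt", "nodes": [{}, {}]},
]
summary = markdown_summary(sample)
print(summary)
```

Returning the text instead of printing it keeps the formatter usable from a test, a CI step, or a reporting hook alike.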

Common axe rules and their impacts

Common axe rule violations and their typical impact level (illustrative scores):

image-alt: images missing alt text (95%)
button-name: buttons with no accessible name (95%)
label: form fields without labels (90%)
link-name: links with no accessible text (85%)
color-contrast: insufficient text contrast (80%)
duplicate-id: duplicate element IDs (70%)
heading-order: heading levels skipped (50%)
landmark-one-main: page missing main landmark (40%)

The numbers are illustrative — they reflect typical severity, not a strict ranking. The takeaway: image/button/label/contrast violations are the ones that block users; landmark and heading-order issues are more about discoverability and best practice.

Combining with the rest of the suite

A11y tests fit into your pytest organisation like any other test. A reasonable layout:

Per-page a11y test in the same file as the page's functional tests.
Marker @pytest.mark.a11y so you can run the a11y subset (pytest -m a11y) in a dedicated CI job.
Multi-page parametrize for a global scan: @pytest.mark.parametrize("path", ["/", "/products", "/login", "/checkout"]).

We'll wire the multi-page sweep into a per-route audit in the next lesson.

Coming from Playwright TypeScript?

The TypeScript course uses @axe-core/playwright. The Python wrapper is functionally identical:

TS import { AxeBuilder } from '@axe-core/playwright' → Python from axe_playwright_python.sync_playwright import Axe
TS await new AxeBuilder({ page }).analyze() → Python Axe().run(page)
TS .include('#main') chained methods → Python context={"include": [["#main"]]} dict
TS .disableRules(['color-contrast']) → Python options={"rules": {"color-contrast": {"enabled": False}}}

Same axe-core engine under the hood, same WCAG rule set, same impact-level taxonomy. The Python API is dict-driven instead of fluent-builder; that's the only structural difference.

⚠️ Common mistakes

Asserting on every violation, including minor ones. Failing CI on every minor heading-order tweak makes the team disable the test. Filter by impact (critical and serious) and trend the others as a metric. Fail the suite on regressions you can act on; track the rest.
Scanning the page before it's fully rendered. axe inspects the DOM at the moment of the call. If you call axe.run(page) immediately after page.goto, the SPA may still be hydrating. Add an explicit wait for a known landmark — expect(page.get_by_role("main")).to_be_visible() — before the scan.
Treating axe as the whole accessibility story. Automated tools catch ~30-40% of WCAG issues. They miss anything that requires judgement: are link texts meaningful? Does the keyboard tab order match the visual order? Is the colour-contrast technically passing but visually awful? Combine axe with manual screen-reader testing — the Manual Software Testing course covers the manual side.
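The first point above suggests trending lower-impact findings instead of failing on them. One possible shape for that, sketched under assumptions, is to count violations per impact level and compare the counts against a stored baseline. `impact_counts` and `regressions` are hypothetical names, the sample data is hand-written, and how the baseline is stored between runs is left open.

```python
from collections import Counter
from typing import Iterable, Mapping


def impact_counts(violations: Iterable[dict]) -> dict:
    """Count violations per impact level; a missing impact counts as 'unknown'."""
    return dict(Counter(v.get("impact") or "unknown" for v in violations))


def regressions(current: Mapping[str, int], baseline: Mapping[str, int]) -> dict:
    """Return the impact levels whose count grew, with the size of the increase."""
    return {
        level: count - baseline.get(level, 0)
        for level, count in current.items()
        if count > baseline.get(level, 0)
    }


# Hand-written sample data (illustrative only).
violations = [
    {"id": "heading-order", "impact": "moderate"},
    {"id": "landmark-one-main", "impact": "moderate"},
    {"id": "empty-heading", "impact": "minor"},
]
baseline = {"moderate": 1, "minor": 1}

counts = impact_counts(violations)
print(counts)
print(regressions(counts, baseline))
```

The counts can be logged as a metric on every run, while `regressions` gives an optional, softer gate: warn or fail only when a level gets worse than the baseline.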

🎯 Practice task

Add a11y testing to your suite. 30-40 minutes.

Install: pip install axe-playwright-python.

Add axe_scanner fixture to tests/conftest.py:

import pytest
from axe_playwright_python.sync_playwright import Axe
 
@pytest.fixture
def axe_scanner():
    return Axe()

Register the a11y marker under the [pytest] section of pytest.ini:

markers =
    a11y: accessibility tests using axe-playwright-python

Create tests/test_a11y.py:

import pytest
from playwright.sync_api import Page
 
@pytest.mark.a11y
def test_login_page_a11y(page: Page, axe_scanner):
    page.goto("https://www.saucedemo.com/")
    results = axe_scanner.run(page)
    high_impact = [v for v in results.response["violations"] if v["impact"] in ["critical", "serious"]]
    assert len(high_impact) == 0, "\n".join(
        f"[{v['impact']}] {v['id']}: {v['description']}" for v in high_impact
    )
 
@pytest.mark.a11y
def test_inventory_page_a11y(page: Page, axe_scanner):
    page.goto("https://www.saucedemo.com/")
    page.get_by_placeholder("Username").fill("standard_user")
    page.get_by_placeholder("Password").fill("secret_sauce")
    page.get_by_role("button", name="Login").click()
    page.wait_for_url("https://www.saucedemo.com/inventory.html")
    results = axe_scanner.run(page)
    high_impact = [v for v in results.response["violations"] if v["impact"] in ["critical", "serious"]]
    assert len(high_impact) == 0

Run with pytest -m a11y -v -s. If Sauce Demo has any high-impact violations on these pages, the assertion fails and prints them. (Sauce Demo is a deliberately imperfect demo app — expect at least a few findings.)
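Filter by rule set. Adjust the inventory test to run only the rules tagged wcag2aa: axe_scanner.run(page, options={"runOnly": ["wcag2aa"]}). Re-run and compare the violation count; it can only stay the same or drop, because fewer rules run. Note that this tag selects the AA-specific rules only, so level A rules are skipped unless you also pass wcag2a.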
Scope the scan. Adjust the login test to scan only the form: axe_scanner.run(page, context={"include": [[".login_wrapper-inner"]]}). Now violations outside the form aren't reported. Useful for incremental adoption — gate one component first, expand outward.

Stretch: add a parametrized a11y sweep across four paths:

@pytest.mark.a11y
@pytest.mark.parametrize("path", ["/", "/inventory.html", "/cart.html", "/checkout-step-one.html"])
def test_a11y_audit_paths(page: Page, axe_scanner, path: str):
    page.goto(f"https://www.saucedemo.com{path}")
    results = axe_scanner.run(page)
    critical = [v for v in results.response["violations"] if v["impact"] == "critical"]
    assert len(critical) == 0, f"{len(critical)} critical issues on {path}"

Some paths require login first (cart, checkout) — adapt with the auth fixture from the previous chapter.

You've got automated a11y scanning. The last lesson of this chapter is the reporting layer — saving JSON reports, pytest-html dashboards, and Allure attachments that turn raw axe output into something the team can act on.