Guided Walkthrough Part 2 — API Tests, Visual Tests, and CI/CD

Part 1 ended with a working framework and three passing tests. Part 2 fills in the rest: API tests using page.request, network mocks for empty/error/slow UI states, visual regressions via to_have_screenshot, accessibility scans with axe-playwright-python, and the GitHub Actions workflow that runs all of it on every PR with parallel execution and Allure reporting. By the end of this lesson you will have the full 25-test capstone, plus three bonus mocked-state tests, running in CI.

Step 6 — API tests using `page.request`

The tests/api/ folder gets five tests that exercise /api/tasks directly. Tag them @pytest.mark.api so the API tier can be selected and run on its own.

utils/api_client.py:

from playwright.sync_api import APIRequestContext
 
 
class ApiClient:
    def __init__(self, request: APIRequestContext):
        self.request = request
 
    def create_task(self, data: dict) -> dict:
        # In the Python API the payload goes in data=; a dict is serialized to JSON
        response = self.request.post("/api/tasks", data=data)
        assert response.ok, f"create_task failed: status {response.status} {response.text()}"
        return response.json()

    def get_task(self, task_id: int) -> dict:
        response = self.request.get(f"/api/tasks/{task_id}")
        assert response.ok, f"get_task failed: status {response.status}"
        return response.json()

    def update_task(self, task_id: int, data: dict) -> dict:
        response = self.request.patch(f"/api/tasks/{task_id}", data=data)
        assert response.ok, f"update_task failed: status {response.status} {response.text()}"
        return response.json()

    def delete_task(self, task_id: int):
        response = self.request.delete(f"/api/tasks/{task_id}")
        assert response.ok, f"delete_task failed: status {response.status}"

    def list_tasks(self) -> list[dict]:
        response = self.request.get("/api/tasks")
        assert response.ok, f"list_tasks failed: status {response.status}"
        return response.json()

tests/conftest.py (add to existing):

from utils.api_client import ApiClient
 
@pytest.fixture
def api(member_page) -> ApiClient:
    """API client authenticated as the member role via shared storage state."""
    return ApiClient(member_page.request)

tests/api/test_tasks_api.py:

import pytest
from utils.api_client import ApiClient
from utils.data_factory import create_task
 
 
@pytest.mark.api
class TestTasksApi:
    def test_create_task(self, api: ApiClient):
        task = create_task(title="API Created Task", priority="high")
        created = api.create_task(task.__dict__)
        assert created["title"] == task.title
        assert "id" in created
 
    def test_read_task(self, api: ApiClient):
        task = create_task()
        created = api.create_task(task.__dict__)
        fetched = api.get_task(created["id"])
        assert fetched["id"] == created["id"]
        assert fetched["title"] == task.title
 
    def test_update_task(self, api: ApiClient):
        created = api.create_task(create_task().__dict__)
        updated = api.update_task(created["id"], {"status": "done"})
        assert updated["status"] == "done"
 
    def test_delete_task(self, api: ApiClient):
        created = api.create_task(create_task().__dict__)
        api.delete_task(created["id"])
        # GETting after delete returns 404
        response = api.request.get(f"/api/tasks/{created['id']}")
        assert response.status == 404
 
    def test_unauthenticated_returns_401(self, playwright, base_url):
        # Fresh request context with no stored auth: the API should reject it
        anon = playwright.request.new_context(base_url=base_url)
        try:
            response = anon.get("/api/tasks")
            assert response.status == 401
        finally:
            anon.dispose()  # release the context even if the assertion fails

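Run with pytest -m api -v. Five tests verify the API contract. The first four go through the api fixture and therefore reuse the member's authenticated browser context; only the 401 test runs without a browser. The tests are independent because each creates its own task, though only test_delete_task removes what it created.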
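Because the tests above create tasks and only the delete test removes one, staging data can accumulate between runs. The following is a minimal sketch of a teardown helper, assuming a hypothetical TrackedTasks wrapper (not part of Part 1) around any client that exposes create_task and delete_task the way ApiClient does:

```python
from typing import Protocol


class TaskClient(Protocol):
    """Anything with the two ApiClient methods this helper relies on."""

    def create_task(self, data: dict) -> dict: ...
    def delete_task(self, task_id: int) -> None: ...


class TrackedTasks:
    """Hypothetical helper: remembers created task ids so teardown can delete them."""

    def __init__(self, client: TaskClient):
        self.client = client
        self.created_ids: list[int] = []

    def create(self, data: dict) -> dict:
        created = self.client.create_task(data)
        self.created_ids.append(created["id"])
        return created

    def cleanup(self) -> None:
        # Delete newest first; skip tasks the test already removed itself.
        while self.created_ids:
            task_id = self.created_ids.pop()
            try:
                self.client.delete_task(task_id)
            except AssertionError:
                pass  # ApiClient asserts response.ok, so a 404 lands here
```

In conftest.py this could back a fixture that yields TrackedTasks(api) and calls cleanup() after the yield, so every test cleans up even when an assertion fails.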

Step 7 — Network mocking for hard-to-reproduce states

Three UI states a real backend doesn't easily produce: empty list, error, slow loading. Each becomes a route-mock test.

tests/tasks/test_mocked_states.py:

import time
import pytest
from playwright.sync_api import expect
from pages.task_list_page import TaskListPage
 
 
@pytest.mark.regression
class TestMockedStates:
    def test_empty_state_when_no_tasks(self, member_page):
        # Mock the GET /api/tasks endpoint to return an empty list
        member_page.route(
            "**/api/tasks",
            lambda route: route.fulfill(json=[]),
        )
        page = TaskListPage(member_page)
        page.goto()
        expect(member_page.get_by_text("No tasks yet")).to_be_visible()
        expect(page.task_cards).to_have_count(0)
 
    def test_error_state_on_500(self, member_page):
        member_page.route(
            "**/api/tasks",
            lambda route: route.fulfill(status=500, body="Server error"),
        )
        page = TaskListPage(member_page)
        page.goto()
        expect(member_page.get_by_text("Couldn't load tasks")).to_be_visible()
        expect(member_page.get_by_role("button", name="Retry")).to_be_visible()
 
    def test_loading_spinner_during_slow_response(self, member_page):
        def slow(route):
            time.sleep(3)
            route.fulfill(json=[])
        member_page.route("**/api/tasks", slow)
 
        page = TaskListPage(member_page)
        page.goto()
        # The spinner is visible while the route is sleeping
        expect(member_page.get_by_test_id("loading-spinner")).to_be_visible()
        # Eventually the empty state appears
        expect(member_page.get_by_text("No tasks yet")).to_be_visible(timeout=10_000)

Three tests, three rare UI states, no backend changes required. The 500 test in particular is coverage that almost no team exercises by hand, because reproducing a server error against staging is awkward.
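One detail worth noting: the "**/api/tasks" pattern matches every HTTP method, so a test that also creates a task through the UI would have its POST answered by the mock. A minimal sketch of a handler factory that mocks only GET and lets everything else reach the real backend; mock_get_only is a hypothetical helper name, while route.request.method, route.fulfill and route.continue_ are the Playwright calls it relies on:

```python
from typing import Any, Callable


def mock_get_only(payload: Any, status: int = 200) -> Callable:
    """Build a route handler that fulfills GET requests and passes others through."""

    def handler(route) -> None:
        if route.request.method == "GET":
            # Answer list requests with the canned payload
            route.fulfill(status=status, json=payload)
        else:
            # POST, PATCH, DELETE and so on go to the real server
            route.continue_()

    return handler
```

It would be registered the same way as the lambdas above, for example member_page.route("**/api/tasks", mock_get_only([])).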

Step 8 — Visual regression tests

tests/test_visual.py:

import pytest
from playwright.sync_api import expect
from pages.task_list_page import TaskListPage
 
 
@pytest.mark.visual
class TestVisualRegression:
    def test_task_list_layout(self, task_list_page: TaskListPage):
        task_list_page.goto()
        expect(task_list_page.page).to_have_screenshot(
            "task-list.png",
            animations="disabled",
            mask=[task_list_page.page.get_by_test_id("timestamp")],
            max_diff_pixel_ratio=0.01,
        )
 
    def test_task_list_responsive_mobile(self, task_list_page: TaskListPage):
        task_list_page.page.set_viewport_size({"width": 375, "height": 667})
        task_list_page.goto()
        expect(task_list_page.page).to_have_screenshot(
            "task-list-mobile.png",
            animations="disabled",
            full_page=True,
        )

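Generate baselines with pytest -m visual --update-snapshots; subsequent runs compare against them. The mask hides timestamps so dynamic content doesn't trigger false diffs. One caveat: to_have_screenshot and --update-snapshots originate in the Node.js Playwright Test runner, and the Python bindings have not historically exposed them, so confirm that your installed version or a snapshot plugin provides them before relying on this step.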
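If the screenshot assertion is not available in your Python setup, the same idea can be approximated by hand, since page.screenshot() returns PNG bytes and accepts animations, mask and full_page. A minimal sketch, assuming a hypothetical matches_baseline helper and exact byte equality (stricter than max_diff_pixel_ratio, which needs an image library or plugin):

```python
from pathlib import Path


def matches_baseline(actual: bytes, baseline: Path, update: bool = False) -> bool:
    """Compare screenshot bytes with a stored baseline file.

    Writes the baseline when it is missing or when update=True,
    and reports a match in that case.
    """
    if update or not baseline.exists():
        baseline.parent.mkdir(parents=True, exist_ok=True)
        baseline.write_bytes(actual)
        return True
    # Byte-for-byte comparison: any rendering difference counts as a diff
    return baseline.read_bytes() == actual
```

A test would pass it the bytes from page.screenshot(animations="disabled", mask=[...]) and assert on the returned boolean. Exact comparison is sensitive to font rendering and GPU differences, so baselines should be generated on the same OS and browser build that CI uses.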

Step 9 — Accessibility audits

tests/test_a11y.py:

import pytest
from axe_playwright_python.sync_playwright import Axe
from pages.task_list_page import TaskListPage
 
 
@pytest.fixture
def axe_scanner():
    return Axe()
 
 
@pytest.mark.a11y
class TestAccessibility:
    def test_login_page_a11y(self, page, axe_scanner):
        page.goto("/login")
        results = axe_scanner.run(page, options={"runOnly": ["wcag2a", "wcag2aa"]})
        # AxeResults exposes the raw axe-core report as the .response dict
        violations = results.response["violations"]
        critical = [v for v in violations if v["impact"] in ["critical", "serious"]]
        assert len(critical) == 0, "\n".join(
            f"[{v['impact']}] {v['id']}: {v['description']}" for v in critical
        )

    def test_task_list_a11y(self, task_list_page: TaskListPage, axe_scanner):
        task_list_page.goto()
        results = axe_scanner.run(task_list_page.page)
        violations = results.response["violations"]
        critical = [v for v in violations if v["impact"] in ["critical", "serious"]]
        assert len(critical) == 0

    def test_new_task_dialog_a11y(self, task_list_page: TaskListPage, axe_scanner):
        task_list_page.goto()
        task_list_page.open_new_task_dialog()
        # Scan only the dialog
        results = axe_scanner.run(
            task_list_page.page,
            context={"include": [["[role='dialog']"]]}
        )
        violations = results.response["violations"]
        critical = [v for v in violations if v["impact"] in ["critical", "serious"]]
        assert len(critical) == 0

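All three tests fail on any critical or serious violation, and the login test additionally limits the scan to the WCAG 2 A and AA rule sets the team commits to. The third test scopes the scan to just the dialog, which is useful when the dialog has its own accessibility concerns that shouldn't be drowned out by violations elsewhere on the page.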
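The three tests repeat the same impact filter, and only the first produces a readable failure message. One way to factor that out is sketched below; blocking_violations and describe are hypothetical helper names, and the input is the list of violation dicts from the axe-core report:

```python
BLOCKING_IMPACTS = ("critical", "serious")


def blocking_violations(violations: list[dict]) -> list[dict]:
    """Keep only the violations severe enough to fail the build."""
    # axe-core may report impact as None, so use .get rather than indexing
    return [v for v in violations if v.get("impact") in BLOCKING_IMPACTS]


def describe(violations: list[dict]) -> str:
    """Format violations as one line each for the assertion message."""
    lines = [f"[{v['impact']}] {v['id']}: {v['description']}" for v in violations]
    return "Accessibility violations:\n" + "\n".join(lines)
```

Each test could then end with a single line along the lines of: blocking = blocking_violations(...); assert not blocking, describe(blocking). Starting the message with the word "violations" also gives report tooling a stable string to match on.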

Step 10 — GitHub Actions CI with parallelism and Allure

.github/workflows/playwright.yml:

name: TaskMaster Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 2 * * *"
 
env:
  BASE_URL: ${{ secrets.STAGING_URL }}
 
jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        browser: [chromium, firefox]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip
 
      - name: Cache Playwright browsers
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ matrix.browser }}-${{ hashFiles('requirements.txt') }}
 
      - run: |
          pip install -r requirements.txt
          playwright install --with-deps ${{ matrix.browser }}
 
      - name: Run tests in parallel
        run: |
          pytest tests/ \
            --browser ${{ matrix.browser }} \
            -n 4 \
            --reruns 1 --reruns-delay 1 \
            --alluredir=allure-results \
            --html=reports/report.html --self-contained-html \
            --junitxml=reports/junit.xml
 
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: results-${{ matrix.browser }}
          path: |
            allure-results/
            reports/
            test-results/
          retention-days: 30
 
  publish-allure:
    needs: test
    runs-on: ubuntu-latest
    if: always()
    steps:
      - uses: actions/download-artifact@v4
        with:
          path: artifacts/
 
      - name: Merge Allure results
        run: |
          mkdir -p allure-results
          cp -r artifacts/results-*/allure-results/* allure-results/ || true
 
      - uses: simple-elf/allure-report-action@v1.7
        with:
          allure_results: allure-results
          allure_report: allure-report
 
      - uses: actions/upload-artifact@v4
        with:
          name: allure-report
          path: allure-report/

Two jobs: test uses a matrix strategy to run the suite on Chromium and Firefox in parallel (with -n 4, so four xdist workers on each runner), and publish-allure waits for both to finish, merges the results, renders the dashboard, and uploads it as an artifact. Visual and a11y tests are tagged with markers, so adjust the pytest invocation to include or exclude them per tier.
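To make the per-tier adjustment concrete, here is a small sketch that builds the pytest arguments for a tier. The tier names "pr" and "nightly" and the choice of which markers each tier skips are assumptions for illustration, not something the workflow above defines:

```python
# Hypothetical tiers: marker expression passed to pytest -m ("" means run everything)
TIER_MARKERS = {
    "pr": "not visual and not a11y",
    "nightly": "",
}


def pytest_args(tier: str, browser: str, workers: int = 4) -> list[str]:
    """Return the argument list for one tier and browser."""
    args = ["tests/", "--browser", browser, "-n", str(workers)]
    expression = TIER_MARKERS[tier]
    if expression:
        args += ["-m", expression]
    return args
```

In the workflow itself the same effect comes from adding -m "not visual and not a11y" to the pytest command for pull requests and omitting it for the scheduled run.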

The advanced layers in shape

UI tests (auth, CRUD, filtering): 15 tests using page objects and storage-…

API tests via page.request: 5 contract tests for /api/tasks. Tagged…

Network-mocked states: 3 tests covering empty/500/slow states t…

Visual + a11y: 5 tests using to_have_screenshot and axe…

GitHub Actions matrix CI: Two browsers × four xdist workers, Allur…

Combining storage state with `pytest-xdist`

Storage state and parallelism interact cleanly because each xdist worker is a separate process with its own pytest session, so the session-scoped login fixture runs once per worker, not once per test. With -n 4 you get four workers, each of which logs in once per role and saves its own storage-state files (admin.json and member.json); every test in that worker's slice then opens a context from the saved file in milliseconds. If the files live at a shared path, give each worker its own subdirectory so parallel logins don't overwrite one another.
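Keeping the files separate per worker can be done with the worker_id fixture that pytest-xdist provides ("gw0", "gw1", and so on, or "master" when the run is not distributed). A minimal sketch, assuming a hypothetical storage_state_path helper and the .auth/ directory used for saved logins:

```python
from pathlib import Path


def storage_state_path(auth_dir: Path, role: str, worker_id: str = "master") -> Path:
    """Return a per-worker storage-state file such as .auth/gw0/member.json."""
    return Path(auth_dir) / worker_id / f"{role}.json"
```

The session-scoped login fixture would request worker_id, create the parent directory, and pass the resulting path to context.storage_state(path=...) after logging in, so no two workers ever write the same file.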

The one gotcha: tests that modify storage state mid-test (deleting cookies, logging out) shouldn't share state with other tests. Either give those tests their own context (built fresh from the storage-state file, then mutated freely without affecting the file on disk) or tag them with a custom marker such as @pytest.mark.serial and run them in a separate pytest invocation without -n.
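Custom markers such as serial are not built into pytest, so they need registering or pytest will warn about them (and fail under --strict-markers). A minimal sketch for conftest.py, in case Part 1 has not already registered them; the descriptions are illustrative and the marker names are the ones used in this lesson:

```python
# Marker names used across the suite, with illustrative descriptions
CUSTOM_MARKERS = {
    "api": "direct API contract tests",
    "regression": "broader regression coverage",
    "visual": "screenshot comparison tests",
    "a11y": "accessibility scans",
    "serial": "tests that mutate auth state and must run without xdist",
}


def pytest_configure(config) -> None:
    """pytest hook: register each custom marker so it is recognised."""
    for name, description in CUSTOM_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

The serial tests can then run in their own step with pytest -m serial, while the parallel step uses pytest -m "not serial" -n 4.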

Allure categories — surfacing the right failures

Drop categories.json in allure-results/ to teach Allure how to bucket failures:

[
  {
    "name": "UI failures",
    "matchedStatuses": ["failed"],
    "messageRegex": ".*Locator.*|.*Timeout.*|.*element.*not visible.*"
  },
  {
    "name": "API failures",
    "matchedStatuses": ["failed"],
    "messageRegex": ".*status.*5\\d\\d.*|.*APIRequestContext.*"
  },
  {
    "name": "Visual regressions",
    "matchedStatuses": ["failed"],
    "messageRegex": ".*Screenshot comparison failed.*|.*pixel ratio.*"
  },
  {
    "name": "Accessibility violations",
    "matchedStatuses": ["failed"],
    "messageRegex": ".*violations.*|.*WCAG.*"
  }
]

The Allure dashboard now shows failures grouped by category (UI, API, Visual, A11y) instead of a flat list. Drill into "Visual regressions" and you see only the snapshot diffs; drill into "API failures" and you see only the contract failures. Product managers can scan the dashboard without having to read Python tracebacks.
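Since the grouping depends entirely on the failure message, it helps to check locally which bucket a given message would land in. The sketch below mirrors the four patterns from categories.json in Python; it assumes the pattern must match the whole message (which is why each alternative is wrapped in .*), and categorize is a hypothetical helper, not part of Allure:

```python
import re
from typing import Optional

# Same names and patterns as categories.json, in the same order
CATEGORIES = [
    ("UI failures", r".*Locator.*|.*Timeout.*|.*element.*not visible.*"),
    ("API failures", r".*status.*5\d\d.*|.*APIRequestContext.*"),
    ("Visual regressions", r".*Screenshot comparison failed.*|.*pixel ratio.*"),
    ("Accessibility violations", r".*violations.*|.*WCAG.*"),
]


def categorize(message: str) -> Optional[str]:
    """Return the first category whose pattern matches the full message."""
    for name, pattern in CATEGORIES:
        # DOTALL lets .* span the newlines in multi-line assertion messages
        if re.fullmatch(pattern, message, flags=re.DOTALL):
            return name
    return None
```

For instance, categorize("update_task failed: status 502 Bad Gateway") lands in "API failures", whereas a message without the word "status", or a bare assert with no message at all, matches none of the four. Assertion messages are therefore worth writing with these patterns in mind.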

The 25-test count, broken down

The finished capstone contains:

Auth (5): login, register, logout, invalid credentials, session persistence across reload.
Tasks CRUD (5): create via UI, edit, mark complete, delete with confirmation, empty-title validation.
Filtering (5): by status, priority, assignee, due-date range, search query.
API (5): create, read, update, delete, unauthenticated 401.
Visual + a11y (5): task list visual, mobile viewport visual, login a11y, task list a11y, dialog a11y.
Mocked states (3, bonus): empty, error, slow. These bring the total to 28.

Run them all together with pytest tests/ -v -n auto. In the CI matrix, with two browsers and four workers per browser, the whole suite should finish in under 10 minutes of wall time on GitHub-hosted Ubuntu runners.

What "done" looks like

Your repo at the end of Part 2:

25-30 tests, all green.
A .github/workflows/playwright.yml that runs them on every push to main, on every pull request targeting main, and nightly at 02:00 UTC.
Allure dashboard available from the workflow artifacts.
Visual baselines committed in tests/__snapshots__/.
A README that describes the project, the test pyramid, and how to run the suite locally and in CI.
The .auth/ and reports/ directories listed in .gitignore.

That's a production-quality test framework. The next lesson is the self-assessment — review what you built, identify gaps, and pick from a list of stretch goals to push the capstone further.