Parallel Execution with pytest-xdist

A 200-test suite that runs serially takes 20 minutes; the same suite spread across 4 workers takes ~5. pytest-xdist is pytest's official parallel-execution plugin, and it composes cleanly with pytest-playwright — every worker gets its own browser instance, tests run on whichever worker is free, and the wall time drops linearly with worker count up to the point where shared resources (database, rate-limited API) become the bottleneck. This lesson covers installation, the four distribution modes, the one rule parallel testing has (test isolation), how to combine xdist with multi-browser runs, and when fewer workers actually beats more.

Installing and running

pip install pytest-xdist

Three flags you'll use:

pytest tests/ -n auto                    # auto-detect CPU count
pytest tests/ -n 4                       # exactly 4 workers
pytest tests/ -n auto --dist loadfile    # group tests by file

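-n auto asks xdist to detect the CPU count: it uses the number of physical cores when psutil is installed, and otherwise falls back to the logical CPUs available to the process. On a 4-core dev laptop, that's typically 4 workers; on a 16-core CI runner, 16. -n 4 pins the count manually, which is useful when you know the suite saturates a shared resource at higher counts.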

That's the entire setup. Add -n auto to your pytest.ini's addopts and every run is parallel by default.
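
To see what -n auto is likely to pick on a given machine, the detection can be approximated in a few lines. This is a rough sketch of the behaviour, not xdist's own code; the function name approximate_auto_workers is made up for illustration, and psutil is treated as optional.

```python
import os


def approximate_auto_workers() -> int:
    """Approximate the worker count that -n auto would choose (illustrative only)."""
    try:
        import psutil  # optional dependency; enables physical-core detection
    except ImportError:
        psutil = None

    if psutil is not None:
        # Prefer physical cores; fall back to logical cores if that is unavailable.
        count = psutil.cpu_count(logical=False) or psutil.cpu_count()
        if count:
            return count

    # Without psutil, count the CPUs this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


print(approximate_auto_workers())
```

If the printed number is higher than your shared resources can handle, pin the count with -n instead of relying on auto.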

How xdist actually works

xdist spawns N worker subprocesses at the start of the run. Each worker:

  1. Loads the same conftest.py and fixtures as the main process.
  2. Receives test ids from the controller.
  3. Executes them and reports results back.

For pytest-playwright, the practical implication is that each worker has its own browser instance. Three workers = three Chromium processes running in parallel. The browser launches are session-scoped per worker, so the cost is paid once per worker, not once per test.

Distribution modes — --dist

xdist ships several distribution modes; these four cover most suites:

Mode            Behaviour                                               When to use
load (default)  Sends each pending test to any available worker         Independent tests; maximum parallelism
loadscope       Groups tests by module (functions) or class (methods)   Module- or class-scoped fixtures are expensive
loadfile        Groups tests by file; a whole file goes to one worker   Tests in a file rely on the same expensive setup
no              Runs tests sequentially                                 Equivalent to not using xdist

The default, load, works well when tests are fully independent, but it can scatter tests from one module across workers, so module-scoped fixtures (logged-in pages, seeded data) are rebuilt on every worker that receives one of those tests. loadscope keeps a module's test functions, or a class's test methods, on the same worker so those fixtures are reused efficiently within that worker. loadfile is similar but cruder: it groups by file and ignores classes.

pytest tests/ -n auto --dist loadfile

If you're seeing fixtures rebuild more often than expected, try loadfile — it forces an entire test file onto one worker.
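
The grouping difference between the modes is easiest to see on a handful of test ids. The sketch below is a toy model, not xdist's scheduler: the node ids are invented, and scope_key and group_tests are hypothetical helpers that only mimic how each mode decides which tests must stay together.

```python
from collections import defaultdict


def scope_key(nodeid: str, mode: str) -> str:
    """Return the unit that must stay on one worker for the given mode."""
    if mode == "loadfile":
        # Everything before the first "::" is the file path.
        return nodeid.split("::", 1)[0]
    if mode == "loadscope":
        # Drop only the test name: leaves the module, or module::Class.
        return nodeid.rsplit("::", 1)[0]
    # "load": every test is its own unit and can go to any worker.
    return nodeid


def group_tests(nodeids, mode):
    """Group test ids into units that are scheduled together."""
    groups = defaultdict(list)
    for nodeid in nodeids:
        groups[scope_key(nodeid, mode)].append(nodeid)
    return dict(groups)


# Invented node ids, in the format pytest prints with -v.
NODEIDS = [
    "tests/test_cart.py::test_add_item",
    "tests/test_cart.py::TestCheckout::test_pay",
    "tests/test_cart.py::TestCheckout::test_refund",
    "tests/test_login.py::test_valid_login",
]

for mode in ("load", "loadscope", "loadfile"):
    print(mode, len(group_tests(NODEIDS, mode)))
```

Fewer, larger groups mean better fixture reuse but less room to balance work across workers, which is the trade-off to weigh when picking a mode.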

Test isolation — the one rule

Parallel testing is unforgiving of shared state. If two tests on different workers both POST /api/users with email alice@test.com, they collide. The unique-id problem is the single biggest source of bugs that only appear in parallel runs.

The fix is uniqueness in the test data:

import time
import uuid
import pytest
 
@pytest.fixture
def unique_email():
    return f"test-{int(time.time() * 1000)}-{uuid.uuid4().hex[:8]}@test.com"
 
@pytest.fixture
def unique_username():
    return f"user-{uuid.uuid4().hex[:12]}"

time.time() alone is not unique enough: two workers can run the same fixture in the same millisecond. uuid.uuid4() is effectively unique, even across workers and machines; the 8-character slice used above keeps 32 bits of randomness, which is ample for test data created within the same millisecond. Combine them: timestamp for human-readable sorting in logs, UUID suffix for collision avoidance.

Anywhere you have a hardcoded "test@test.com" in a fixture, replace it with unique_email. Anywhere you have a fixed product name, append a UUID. The discipline is one-way — you can always run unique data serially, but you can't run hardcoded data in parallel.
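
A quick experiment shows why the timestamp on its own is not enough. This is a minimal sketch outside pytest; make_unique_email and timestamp_only_email are hypothetical helpers that reuse the same format as the fixture above.

```python
import time
import uuid


def make_unique_email(domain: str = "test.com") -> str:
    """Millisecond timestamp for log sorting plus a UUID suffix for uniqueness."""
    return f"test-{int(time.time() * 1000)}-{uuid.uuid4().hex[:8]}@{domain}"


def timestamp_only_email(domain: str = "test.com") -> str:
    """The naive version: collides whenever two calls land in the same millisecond."""
    return f"test-{int(time.time() * 1000)}@{domain}"


# Generate a burst of values and count how many distinct ones survive.
combined = {make_unique_email() for _ in range(200)}
timestamp_only = {timestamp_only_email() for _ in range(200)}

print(len(combined), len(timestamp_only))
```

The first count should match the number of values generated, while the second collapses to a handful, which is the collision two parallel workers would hit.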

Worker-scoped resources

Some resources need to be created per worker, not per test. xdist exposes the worker id through the worker_id fixture:

import pytest
 
@pytest.fixture(scope="session")
def worker_db_name(worker_id):
    """Each worker gets its own database to avoid collisions."""
    if worker_id == "master":  # not running under xdist
        return "test_db"
    return f"test_db_{worker_id}"

worker_id is "master" when xdist isn't active, "gw0", "gw1", etc. when it is. Use it for any external resource (DB, message queue, file system path) where parallelism would cause collisions.
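
Outside of fixtures (for example in a helper module or a settings file), the same id can be read from the PYTEST_XDIST_WORKER environment variable, which xdist sets in each worker process. The sketch below assumes that variable; current_worker_id and per_worker_name are hypothetical helper names.

```python
import os


def current_worker_id() -> str:
    """Read the xdist worker id, defaulting to "master" when xdist is not active."""
    return os.environ.get("PYTEST_XDIST_WORKER", "master")


def per_worker_name(base: str, worker_id: str) -> str:
    """Suffix a resource name with the worker id so workers never share it."""
    if worker_id == "master":
        return base
    return f"{base}_{worker_id}"


print(per_worker_name("test_db", current_worker_id()))
print(per_worker_name("test_db", "gw2"))
```

The same helper works for queue names, bucket prefixes, or temp directories, so all per-worker naming stays in one place.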

Combining xdist with multi-browser runs

Multi-browser × parallelism is the big speedup:

pytest -n 4 --browser chromium --browser firefox

pytest-playwright parametrizes every test once per --browser flag, so a 50-test suite × 2 browsers = 100 test runs, and xdist spreads those runs across the workers. With 4 workers, that is ~25 per worker. Wall time drops to roughly (total_tests × browsers) / workers × avg_test_duration. For 50 tests at 5 seconds each on 4 workers across 2 browsers: ~125 seconds vs ~500 seconds serial, about a 4× speedup.
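
Plugging the example's numbers into the formula is a useful sanity check. The function below is a back-of-the-envelope estimate only: it assumes every test takes the same time and ignores browser startup and scheduling overhead, and estimate_wall_time is a made-up name.

```python
import math


def estimate_wall_time(tests: int, browsers: int, workers: int, avg_seconds: float) -> float:
    """Idealised wall time: the busiest worker's share of runs times the average duration."""
    runs = tests * browsers
    runs_per_worker = math.ceil(runs / workers)
    return runs_per_worker * avg_seconds


serial = estimate_wall_time(tests=50, browsers=2, workers=1, avg_seconds=5)
parallel = estimate_wall_time(tests=50, browsers=2, workers=4, avg_seconds=5)

print(serial, parallel, serial / parallel)
```

With these inputs the estimate is 500 seconds serially and 125 seconds on 4 workers, a 4× speedup; real runs land somewhat above the estimate because of per-worker browser launches.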

In CI, the matrix-strategy approach from the previous lesson and xdist parallelism are complementary. Matrix splits work across runners (each runner gets one browser); xdist parallelises within a runner. Both stack:

strategy:
  matrix:
    browser: [chromium, firefox, webkit]
steps:
  - run: pytest tests/ -n 4 --browser ${{ matrix.browser }}

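Three runners (matrix), four workers each (xdist) = 12-way parallelism for the same 50-test suite. Each runner handles the 50 runs for its browser, about 13 per worker, so at 5 seconds per test the wall time is roughly 65 seconds plus startup overhead, compared with ~750 seconds for all three browsers run serially.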

Speed impact, visualised

Speedup is roughly linear up to the CPU count; past that, you fight for context switches without gaining wall-time. For 50 tests on a 4-core laptop, 4 workers is the sweet spot. CI runners with 16 cores can push to 8-16 workers if your tests are truly independent.

When to use fewer workers

More workers isn't always better. Reduce parallelism when:

  • Shared database with limited connections. 16 workers all opening transactions on a 10-connection pool causes mid-test failures. Cap workers below the connection limit.
  • Rate-limited external APIs. A third-party API that allows 10 requests/second saturates fast — set -n 4 so you stay under the limit.
  • Flaky tests where the flake is timing-related. Sometimes adding workers exposes races that didn't exist serially. Don't paper over with -n 1; fix the race, but use fewer workers temporarily while you investigate.
  • Memory-constrained CI runners. Each browser process uses 200-500 MB. 16 workers on a 4 GB runner OOMs. Pick a worker count that fits the runner's RAM.

CI configuration

For GitHub Actions, parallelism is one flag:

- name: Run tests
  run: pytest tests/ -n 2 --browser chromium --browser firefox

GitHub-hosted runners have 2-4 vCPUs depending on tier; -n 2 is a safe default. For larger self-hosted runners, raise to match the CPU count. Watch the run timing — if doubling workers doesn't halve wall time, you've hit a non-CPU bottleneck.

A typical CI pyramid with parallelism

The full mental model for a production project:

  • Smoke tier (every PR): pytest -m smoke -n 4 --browser chromium — under 2 minutes, gates merges.
  • Regression tier (on push to main): pytest -m "not slow" -n 8 --browser chromium --browser firefox — under 10 minutes, catches deeper regressions.
  • Full tier (nightly): pytest -n 16 --browser chromium --browser firefox --browser webkit — under 30 minutes, comprehensive.

Three slices of the same suite, three parallelism levels, three timing budgets. Markers from chapter 3 plus xdist from this lesson is what makes the pyramid practical.

Coming from Playwright TypeScript?

The TS Playwright runner has built-in parallelism via playwright.config.ts's workers option:

export default defineConfig({
  workers: process.env.CI ? 2 : 4,
});

The Python equivalent is pytest -n auto (or -n 4). Both reach the same outcome — N workers, parallel test execution — through different surface areas. The TS version is config-driven; the Python version is CLI-flag driven. For teams already using pytest, xdist is mechanically more familiar; for teams coming from the Playwright TS world, the mental model carries over directly.

⚠️ Common mistakes

  • Hardcoded test data colliding under parallelism. email="alice@test.com" works serially, breaks the moment two workers hit the same endpoint. Always use unique-per-test data — UUIDs, timestamps, or both. The sign of this bug is "tests pass when I run them one at a time but fail under -n auto."
  • Sharing fixture state across workers via files. A session-scoped fixture runs once per worker, not once per run, so a fixture that writes tests/.auth/admin.json executes on every worker, and all of them write to the same path. A single shared file (e.g., state.json at the project root) has the same race condition because multiple workers might overwrite it simultaneously. Per-worker filenames, per-test temp directories, or a file lock around the write solve it.
  • Ignoring the bottleneck and just increasing -n. If 4 workers run a suite in 60s and 8 workers run it in 58s, you've hit a non-CPU bottleneck — usually a shared database or external API. Increasing workers further makes it worse, not better. Profile, find the bottleneck, then tune parallelism.
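
For the shared-file mistake, one possible fix is to give each worker its own file and write it atomically. The sketch below uses only the standard library; worker_state_path and write_state_atomically are hypothetical helpers, and the worker ids are hardcoded here purely for demonstration.

```python
import json
import os
import tempfile
from pathlib import Path


def worker_state_path(root: Path, worker_id: str) -> Path:
    """One state file per worker, so no two workers write the same path."""
    return root / f"state-{worker_id}.json"


def write_state_atomically(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # os.replace is atomic on the same filesystem: readers never see a partial file.
    os.replace(tmp_name, path)


# Demonstration with two pretend workers writing into a scratch directory.
root = Path(tempfile.mkdtemp())
for wid in ("gw0", "gw1"):
    write_state_atomically(worker_state_path(root, wid), {"worker": wid})

print(sorted(p.name for p in root.iterdir()))
```

In a real suite the worker id would come from the worker_id fixture rather than a hardcoded tuple.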

🎯 Practice task

Time the difference and tune your suite for parallelism. 25-30 minutes.

  1. Install: pip install pytest-xdist.

  2. Time the suite serially:

    pytest tests/ --durations=10

    Note the total wall time at the bottom of the output.

  3. Run with -n auto:

    pytest tests/ -n auto --durations=10

    Compare wall time. On a 4-core laptop, expect roughly 4× speedup if tests are truly independent.

  4. Find a parallelism bug. If you have any test that POSTs a hardcoded email, or any other hardcoded value covered by a unique constraint, run with -n 4 repeatedly. Eventually a run fails because two workers tried to create the same record. The error in the test logs points at the conflicting field.

  5. Fix it. Replace the hardcoded value with a UUID-suffixed one:

    import uuid
    email = f"test-{uuid.uuid4().hex[:8]}@test.com"

    Re-run with -n 4 ten times — no flakes.

  6. Demonstrate loadfile vs loadscope. Run the suite with --dist loadscope (default) and time it. Then run with --dist loadfile and time again. For most suites the times are similar; if your tests share class-scoped fixtures, loadfile is sometimes faster.

  7. Stretch: add worker_id-based DB naming if your project has a database. Connect to test_db_{worker_id} so each worker has its own database. Run with -n 4 and confirm there are four databases (test_db_gw0 through test_db_gw3) at the end of the run.

You've got the parallel-execution toolkit. The next lesson covers Docker — the container that makes "works on my machine" obsolete.
