Organising Tests — Folders, Markers, and Tagging

8 min read

A 20-test suite is fine in one file. A 200-test suite is not. Real Playwright Python projects organise tests by feature folder, tag them with markers for selective running, and rely on pytest's keyword and folder filters to slice the suite for any given run — smoke on every PR, full regression nightly, slow tests only on release candidates. This lesson covers the conventions that scale: folder layout, custom markers, marker registration, the four ways to filter (-m, -k, folder paths, file patterns), and skip / xfail for tests that legitimately can't run yet.

Folder structure for a real suite

The convention that scales:

tests/
├── conftest.py
├── auth/
│   ├── conftest.py
│   ├── test_login.py
│   ├── test_registration.py
│   └── test_password_reset.py
├── products/
│   ├── conftest.py
│   ├── test_listing.py
│   ├── test_search.py
│   └── test_detail.py
├── checkout/
│   ├── test_cart.py
│   ├── test_shipping.py
│   └── test_payment.py
└── api/
    ├── test_users_api.py
    └── test_products_api.py

The rules of thumb:

  • Top-level folders by feature, not by test type. auth/ and checkout/ map to the product surfaces real users care about. unit/, integration/, e2e/ is a developer-mental-model split that doesn't help anyone navigate the suite. Use markers for test type instead.
  • One file per scenario or feature surface. test_login.py covers everything login-related; the file groups happy paths and validation cases together, so a developer changing the login flow knows where to look.
  • conftest.py per folder when the folder has fixtures only it needs. tests/auth/conftest.py for auth-only fixtures; root conftest.py for cross-cutting ones.
  • Don't mirror your src/ tree exactly. Tests are organised around what they verify, not who implements it. A test that crosses two services lives in the folder for the user-facing flow, not in the folder for either service.
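Here is a minimal sketch of the conftest.py scoping rule. It is plain Python that builds a throwaway tests/ tree and runs pytest on it in-process with pytest.main. The fixture names app_name and auth_user and all file names are made up for the demo. The tests are browser-free stand-ins, so Playwright isn't needed, and the Outcomes class is an ad-hoc recording plugin, not something pytest ships. Tests under tests/auth/ should see both fixtures. A test in tests/checkout/ that asks for auth_user should error at setup with "fixture 'auth_user' not found".

```python
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

# Keep installed third-party plugins (pytest-playwright included) out of this demo run.
os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"

FILES = {
    "pytest.ini": "[pytest]\n",
    # Root conftest: cross-cutting fixtures every folder can use.
    "tests/conftest.py": """\
        import pytest

        @pytest.fixture
        def app_name():
            return "shop"
    """,
    # Folder conftest: only tests under tests/auth/ can see this fixture.
    "tests/auth/conftest.py": """\
        import pytest

        @pytest.fixture
        def auth_user():
            return "standard_user"
    """,
    "tests/auth/test_login_scope.py": """\
        def test_sees_both(app_name, auth_user):
            assert (app_name, auth_user) == ("shop", "standard_user")
    """,
    "tests/checkout/test_cart_scope.py": """\
        def test_sees_root_fixture(app_name):
            assert app_name == "shop"

        def test_cannot_see_auth_fixture(auth_user):  # errors: fixture not found
            pass
    """,
}


class Outcomes:
    """pytest plugin: one outcome per test (first non-passing phase, else the call result)."""

    def __init__(self):
        self.by_test = {}

    def pytest_runtest_logreport(self, report):
        if report.when == "call" or report.outcome != "passed":
            self.by_test.setdefault(report.nodeid.split("::")[-1], report.outcome)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, body in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(textwrap.dedent(body))
    rec = Outcomes()
    pytest.main([str(root / "tests"), "-q", "-p", "no:cacheprovider"], plugins=[rec])

print(rec.by_test)
```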

Markers — tagging tests for selective runs

Markers are pytest's tagging system. Apply one to a test with @pytest.mark.<name>:

import pytest
from playwright.sync_api import Page, expect
 
 
@pytest.mark.smoke
def test_homepage_loads(page: Page):
    page.goto("/")
    expect(page).to_have_title("My App")
 
 
@pytest.mark.regression
def test_product_filtering(page: Page):
    page.goto("/products")
    page.get_by_label("Category").select_option("Electronics")
    expect(page.get_by_test_id("product-card")).to_have_count(12)
 
 
@pytest.mark.slow
def test_full_checkout_flow(page: Page):
    # ... 25 lines of end-to-end checkout ...
    pass
 
 
@pytest.mark.api
def test_create_user_via_api(page: Page):
    response = page.request.post("/api/users", data={"name": "Test"})
    assert response.ok

Multiple markers stack:

@pytest.mark.smoke
@pytest.mark.api
def test_health_endpoint(page: Page):
    response = page.request.get("/api/health")
    assert response.ok

A single test can be both smoke and api — useful when the same test belongs to multiple selection slices.

Registering markers — silence the warnings

Out of the box, every custom marker triggers a PytestUnknownMarkWarning. Register them in pytest.ini:

[pytest]
markers =
    smoke: critical-path smoke tests run on every PR
    regression: full regression suite run nightly
    slow: tests that take more than 30 seconds
    api: API-level tests that call endpoints directly rather than through the UI
    flaky: known flaky tests, kept out of the gating run
    wip: work-in-progress tests, not yet ready for CI

Two reasons to do this:

  1. Silence warnings. A 200-test run with 50 unknown-marker warnings is unreadable.
  2. Document the markers. The text after each colon (critical-path smoke tests..., and so on) is what pytest --markers prints, giving new contributors a single source of truth for what each tag means.

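Unregistered markers fail the run outright if CI promotes that warning to an error with filterwarnings = error::pytest.PytestUnknownMarkWarning, or runs pytest with --strict-markers, which rejects any marker not in the list. Strict CI setups typically do one or the other. Register every marker your team uses.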
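You can also keep registration next to the code that uses it. pytest lets a conftest.py register markers from the pytest_configure hook with config.addinivalue_line("markers", ...), which has the same effect as the pytest.ini entry. The sketch below does that, then runs with --strict-markers. The registered smoke marker collects fine, while a mistyped smok stops the run with a collection error, so pytest exits with ExitCode.INTERRUPTED. File and test names are illustrative.

```python
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"  # isolate the demo from installed plugins

FILES = {
    "pytest.ini": "[pytest]\n",
    "conftest.py": """\
        def pytest_configure(config):
            # Equivalent to a `markers =` line in pytest.ini.
            config.addinivalue_line("markers", "smoke: critical-path smoke tests run on every PR")
    """,
    "test_registered.py": """\
        import pytest

        @pytest.mark.smoke
        def test_homepage_loads():
            pass
    """,
    "test_typo.py": """\
        import pytest

        @pytest.mark.smok  # typo: not a registered marker
        def test_checkout():
            pass
    """,
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, body in FILES.items():
        (root / rel).write_text(textwrap.dedent(body))
    strict = ["--strict-markers", "-q", "-p", "no:cacheprovider"]
    registered_exit = pytest.main([str(root / "test_registered.py"), *strict])
    typo_exit = pytest.main([str(root / "test_typo.py"), *strict])

print(registered_exit, typo_exit)
```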

Running by marker — -m

The four most useful invocations:

pytest -m smoke                       # run only smoke tests
pytest -m "not slow"                  # skip slow tests
pytest -m "smoke or regression"       # run smoke OR regression
pytest -m "smoke and not api"         # smoke tests that don't hit the API

The -m argument is a boolean expression over markers. The CI patterns most teams settle on:

  • PR check: pytest -m smoke — under 5 minutes, gates merges.
  • Nightly: pytest -m "not slow" — full coverage minus the multi-minute end-to-end flows.
  • Release candidate: pytest — everything, slow tests included.
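You can check the four -m expressions above without launching a browser. This sketch writes browser-free stand-ins for the lesson's tests, with the same names and the same markers. It then asks pytest, through pytest.main with --collect-only, which tests each expression keeps. The Selected class is an ad-hoc plugin hooking pytest_collection_finish, not part of pytest.

```python
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

# Keep installed third-party plugins (pytest-playwright included) out of this demo run.
os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"

PYTEST_INI = """\
    [pytest]
    markers =
        smoke: critical-path smoke tests
        regression: full regression suite
        slow: tests that take more than 30 seconds
        api: API-only tests
"""

# Browser-free stand-ins for the lesson's tests: same names, same markers.
TESTS = """\
    import pytest

    @pytest.mark.smoke
    def test_homepage_loads():
        pass

    @pytest.mark.regression
    def test_product_filtering():
        pass

    @pytest.mark.slow
    def test_full_checkout_flow():
        pass

    @pytest.mark.smoke
    @pytest.mark.api
    def test_health_endpoint():
        pass
"""


class Selected:
    """pytest plugin: remembers which tests are left after -m filtering."""

    def __init__(self):
        self.names = set()

    def pytest_collection_finish(self, session):
        self.names = {item.name for item in session.items}


def select(root: Path, expr: str) -> set:
    rec = Selected()
    # --collect-only: we only want the selection, not a test run.
    pytest.main([str(root), "--collect-only", "-q", "-p", "no:cacheprovider", "-m", expr],
                plugins=[rec])
    return rec.names


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pytest.ini").write_text(textwrap.dedent(PYTEST_INI))
    (root / "test_marker_slices.py").write_text(textwrap.dedent(TESTS))
    results = {expr: select(root, expr)
               for expr in ["smoke", "not slow", "smoke or regression", "smoke and not api"]}

for expr, names in results.items():
    print(f"-m {expr!r}: {sorted(names)}")
```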

Running by keyword — -k

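-k filters by a case-insensitive substring match against the test's name and the names of its parents, such as its class and module file (so -k "login" also selects every test in test_login.py):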

pytest -k "login"                     # tests with "login" in the name
pytest -k "login and not admin"       # login tests that aren't admin
pytest -k "test_login or test_register"

The expression matches against the test's full id, including any parametrize id. So pytest -k "admin-login" matches a parametrized case with id="admin-login" regardless of the function name. Useful when chasing a single failing case from a large parametrized test.
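A quick way to see what -k actually matches is to collect a small file and compare. This sketch uses made-up file and test names, with browser-free stand-in tests. It shows three things: admin-login picks out one parametrized case by its id; matching ignores case; and login also picks up test_renders, only because its module is named test_login_page.py.

```python
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"  # isolate the demo from installed plugins

FILES = {
    "pytest.ini": "[pytest]\n",
    "test_accounts.py": """\
        import pytest

        @pytest.mark.parametrize("role", ["admin", "viewer"], ids=["admin-login", "viewer-login"])
        def test_login(role):
            assert role

        def test_register():
            pass

        def test_admin_dashboard():
            pass
    """,
    # Nothing in this test's own name says "login", but its module name does.
    "test_login_page.py": """\
        def test_renders():
            pass
    """,
}


class Selected:
    """pytest plugin: remembers which tests are left after -k filtering."""

    def __init__(self):
        self.names = set()

    def pytest_collection_finish(self, session):
        self.names = {item.name for item in session.items}


def select(root: Path, expr: str) -> set:
    rec = Selected()
    pytest.main([str(root), "--collect-only", "-q", "-p", "no:cacheprovider", "-k", expr],
                plugins=[rec])
    return rec.names


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, body in FILES.items():
        (root / rel).write_text(textwrap.dedent(body))
    results = {expr: select(root, expr)
               for expr in ["admin-login", "ADMIN", "login and not admin"]}

for expr, names in results.items():
    print(f"-k {expr!r}: {sorted(names)}")
```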

Running by folder or file path

The simplest filter: a path argument.

pytest tests/auth/                    # only auth tests
pytest tests/auth/ tests/products/    # auth + products
pytest tests/auth/test_login.py       # one file
pytest tests/auth/test_login.py::TestLoginValidation::test_empty_email_fails  # one method

Path filters compose with markers and keywords:

pytest tests/auth/ -m smoke -k "not admin"

Run smoke tests in the auth folder that don't have "admin" in their name. Three filters layered.
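The same layering works from Python, which makes it easy to check before wiring it into CI. This sketch uses a hypothetical two-folder layout with browser-free stand-in tests. The path narrows the run to tests/auth/. -m smoke drops the regression test. -k "not admin" drops the admin test. A full node id, as a separate check, selects exactly one test.

```python
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"  # isolate the demo from installed plugins

FILES = {
    "pytest.ini": """\
        [pytest]
        markers =
            smoke: critical-path smoke tests
            regression: full regression suite
    """,
    "tests/auth/test_auth_flows.py": """\
        import pytest

        @pytest.mark.smoke
        def test_user_login():
            pass

        @pytest.mark.smoke
        def test_admin_login():
            pass

        @pytest.mark.regression
        def test_password_reset():
            pass
    """,
    "tests/products/test_catalog.py": """\
        import pytest

        @pytest.mark.smoke
        def test_listing():
            pass
    """,
}


class Selected:
    """pytest plugin: remembers which tests are left after all filters apply."""

    def __init__(self):
        self.names = set()

    def pytest_collection_finish(self, session):
        self.names = {item.name for item in session.items}


def select(*args: str) -> set:
    rec = Selected()
    pytest.main(["--collect-only", "-q", "-p", "no:cacheprovider", *args], plugins=[rec])
    return rec.names


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, body in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(textwrap.dedent(body))
    auth = root / "tests" / "auth"
    # Path + marker + keyword: three filters layered.
    layered = select(str(auth), "-m", "smoke", "-k", "not admin")
    all_smoke = select(str(root / "tests"), "-m", "smoke")
    # A full node id selects exactly one test.
    one_test = select(f"{auth / 'test_auth_flows.py'}::test_password_reset")

print(layered, all_smoke, one_test)
```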

[Figure: Test selection, the full picture]

Skip and xfail — for tests that can't pass yet

Sometimes a test can't run — environment isn't ready, OS doesn't match, or the feature isn't implemented:

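import sys

import pytest
from playwright.sync_api import Page


@pytest.mark.skip(reason="Feature not implemented yet: JIRA-456")
def test_future_feature(page: Page):
    pass


@pytest.mark.skipif(sys.platform == "win32", reason="Linux/macOS only")
def test_uses_unix_socket(page: Page):
    pass


# The string condition is evaluated with `config` in scope. --run-slow is a custom
# option: register it in conftest.py via pytest_addoption, otherwise evaluating the
# condition errors instead of skipping.
@pytest.mark.skipif("not config.getoption('--run-slow')", reason="--run-slow not set")
def test_long_running(page: Page):
    pass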

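skip always skips. skipif skips only when its condition is true, so the test still runs wherever it can: on Linux and macOS in the second example, or whenever --run-slow is passed in the third. In every case the reason string is what the report shows (pytest -rs lists skip reasons in the summary), so make it say why.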
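The --run-slow condition only works if something defines that option, and pytest doesn't ship one. Here is a minimal sketch, assuming you register the flag yourself in a root conftest.py with pytest_addoption. The test is skipped by default and runs once the flag is passed. The test body is a browser-free stand-in, and the Outcomes class is an ad-hoc recording plugin.

```python
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"  # isolate the demo from installed plugins

FILES = {
    "pytest.ini": "[pytest]\n",
    "conftest.py": """\
        def pytest_addoption(parser):
            # Registers the custom flag that the skipif condition reads.
            parser.addoption("--run-slow", action="store_true", default=False,
                             help="also run long-running tests")
    """,
    "test_long.py": """\
        import pytest

        @pytest.mark.skipif("not config.getoption('--run-slow')", reason="--run-slow not set")
        def test_long_running():
            pass
    """,
}


class Outcomes:
    """pytest plugin: one outcome per test (first non-passing phase, else the call result)."""

    def __init__(self):
        self.by_test = {}

    def pytest_runtest_logreport(self, report):
        if report.when == "call" or report.outcome != "passed":
            self.by_test.setdefault(report.nodeid.split("::")[-1], report.outcome)


def run(root: Path, *extra: str) -> dict:
    rec = Outcomes()
    pytest.main([str(root), "-q", "-rs", "-p", "no:cacheprovider", *extra], plugins=[rec])
    return rec.by_test


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, body in FILES.items():
        (root / rel).write_text(textwrap.dedent(body))
    default_run = run(root)
    slow_run = run(root, "--run-slow")

print(default_run, slow_run)
```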

xfail is for tests that are expected to fail because of a known bug:

import pytest
from playwright.sync_api import Page, expect


@pytest.mark.xfail(reason="Known bug: JIRA-789, fix queued for sprint 23")
def test_known_broken(page: Page):
    page.goto("/broken-feature")
    expect(page.get_by_text("Should appear")).to_be_visible()  # currently fails

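The test runs, but a failure is reported as XFAIL (expected fail) instead of FAIL, so the CI run stays green. If it ever passes, pytest reports XPASS (unexpected pass). By default an XPASS does not fail the run, so it is easy to miss. Add strict=True to the marker (or set xfail_strict = true in pytest.ini) to turn it into a failure that forces you to remove the marker. This is the right tag for a test that documents a regression you've decided not to fix yet: it stops gating CI without being silently deleted.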
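Here is a small, browser-free sketch of the three xfail outcomes, run in-process with pytest.main. The JIRA ids are placeholders, and XfailOutcomes is an ad-hoc recording plugin. A test that still fails is recorded as xfailed. One that now passes is an XPASS that does not fail the run. The same passing test with strict=True is reported as a failure, which makes the whole run exit with ExitCode.TESTS_FAILED.

```python
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"  # isolate the demo from installed plugins

TESTS = """\
    import pytest

    @pytest.mark.xfail(reason="Known bug: JIRA-789")
    def test_still_broken():
        assert 1 + 1 == 3  # the bug is still there

    @pytest.mark.xfail(reason="Known bug: JIRA-790")
    def test_quietly_fixed():
        assert 1 + 1 == 2  # fixed, but non-strict xfail only reports XPASS

    @pytest.mark.xfail(reason="Known bug: JIRA-791", strict=True)
    def test_fixed_strict():
        assert 1 + 1 == 2  # strict=True turns the XPASS into a failure
"""


class XfailOutcomes:
    """pytest plugin: maps each test to xfailed / xpassed / passed / failed."""

    def __init__(self):
        self.by_test = {}

    def pytest_runtest_logreport(self, report):
        if report.when != "call":
            return
        name = report.nodeid.split("::")[-1]
        if hasattr(report, "wasxfail"):  # set for XFAIL and non-strict XPASS
            self.by_test[name] = "xfailed" if report.skipped else "xpassed"
        else:
            self.by_test[name] = report.outcome


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pytest.ini").write_text("[pytest]\n")
    (root / "test_xfail_demo.py").write_text(textwrap.dedent(TESTS))
    rec = XfailOutcomes()
    exit_code = pytest.main([str(root), "-q", "-rxX", "-p", "no:cacheprovider"], plugins=[rec])

print(rec.by_test, exit_code)
```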

Coming from Playwright TypeScript?

The mappings:

  • TS test.describe.skip("Feature", ...) → Python @pytest.mark.skip(reason="...")
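  • TS test.fail(...) → Python @pytest.mark.xfail(reason="...") (both run the test and expect it to fail). TS test.fixme(...) is different: Playwright doesn't run the test at all, so its closer match is @pytest.mark.xfail(reason="...", run=False) or a plain @pytest.mark.skip
  • TS test.slow() → Python @pytest.mark.slow (a custom marker, registered in pytest.ini). Unlike test.slow(), which triples the test timeout, the marker only tags the test for selection and changes nothing about how it runs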
  • TS test.describe.parallel(...) → not directly equivalent — pytest parallelism comes from pytest-xdist, covered in chapter 7
  • TS --grep "login" → Python pytest -k "login"
  • TS tag inside test name (test('@smoke logs in', ...)) → Python @pytest.mark.smoke decorator

The Python markers system is genuinely cleaner — markers are first-class, registered, documented, and combinable with boolean expressions. The TS test runner's grep-on-test-name approach works, but it's a less ergonomic substitute.
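TS test.describe.skip skips a whole group at once. The nearest pytest equivalent is a marker on a test class. To tag every test in a file, assign pytestmark at module level. Both are standard pytest features, sketched below with made-up names and browser-free bodies. Running with -m api shows the module-level tag reaching the class's tests as well; those tests are then skipped by the class-level marker.

```python
import os
import tempfile
import textwrap
from pathlib import Path

import pytest

os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"  # isolate the demo from installed plugins

PYTEST_INI = """\
    [pytest]
    markers =
        api: API-level tests
"""

TESTS = """\
    import pytest

    # Module-level marker: every test in this file is tagged `api`.
    pytestmark = pytest.mark.api

    def test_health():
        assert True

    # Class-level skip: the whole group is skipped, much like test.describe.skip.
    @pytest.mark.skip(reason="Checkout redesign not shipped yet")
    class TestCheckoutGroup:
        def test_cart(self):
            assert False  # never runs

        def test_payment(self):
            assert False  # never runs
"""


class Outcomes:
    """pytest plugin: one outcome per test (first non-passing phase, else the call result)."""

    def __init__(self):
        self.by_test = {}

    def pytest_runtest_logreport(self, report):
        if report.when == "call" or report.outcome != "passed":
            self.by_test.setdefault(report.nodeid.split("::")[-1], report.outcome)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pytest.ini").write_text(textwrap.dedent(PYTEST_INI))
    (root / "test_group_marks.py").write_text(textwrap.dedent(TESTS))
    rec = Outcomes()
    # -m api selects all three tests: the module-level mark reaches the class too.
    pytest.main([str(root), "-q", "-m", "api", "-p", "no:cacheprovider"], plugins=[rec])

print(rec.by_test)
```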

A typical CI matrix

The shape most teams converge on for their .github/workflows/test.yml (chapter 7 covers the full GitHub Actions setup):

- name: Smoke tests (every PR)
  run: pytest -m smoke --browser chromium
 
- name: Cross-browser smoke (every PR)
  run: pytest -m smoke --browser chromium --browser firefox --browser webkit
 
- name: Full regression (nightly)
  run: pytest -m "not slow"
 
- name: Slow tests (release candidate only)
  run: pytest -m slow

- name: Flaky tests (release candidate only, non-gating)
  run: pytest -m flaky
  continue-on-error: true   # surface flaky failures without failing the release

One suite, several slices, each with its own trigger and gating policy. Markers make this trivial; without them you'd be maintaining a separate test directory per slice, duplicating every test that belongs to more than one.

⚠️ Common mistakes

  • Forgetting to register a marker in pytest.ini. Every test using @pytest.mark.foo triggers a PytestUnknownMarkWarning until foo appears in the markers list. On a strict CI (filterwarnings = error), the run fails entirely. Register the marker the moment you introduce it.
  • Using folder structure as the only organisation, no markers. tests/smoke/, tests/regression/ seems clean but it's the wrong split — the same test often belongs to both (a smoke test that's also part of regression). Markers are tags; folders are categories. Use both: folders by feature, markers by run-policy.
  • Letting xfail markers rot. An xfail is a debt — every one is a bug you've decided not to fix yet. Without a periodic audit (review them every sprint, demand a JIRA link in the reason), xfail grows into a graveyard of tests nobody touches. Treat the count of xfail markers as a metric and trend it down.
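One way to make the xfail count an actual metric is a small plugin. The sketch below is an assumption, not a standard tool. It walks the collected tests with --collect-only, counts xfail markers, and flags any whose reason lacks a ticket id. The ticket pattern (PROJECT-123 style) and test names are hypothetical, so adjust the regex to your tracker. It only sees xfail markers, not pytest.xfail() calls made inside a test body.

```python
import os
import re
import tempfile
import textwrap
from pathlib import Path

import pytest

os.environ["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"  # isolate the demo from installed plugins

# Assumption: tickets look like JIRA-789 (uppercase project key, dash, number).
TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

TESTS = """\
    import pytest

    @pytest.mark.xfail(reason="Known bug: JIRA-789")
    def test_tracked():
        assert False

    @pytest.mark.xfail(reason="flaky, look into it")
    def test_untracked():
        assert False

    def test_normal():
        pass
"""


class XfailAudit:
    """pytest plugin: counts xfail markers and lists those without a ticket id."""

    def __init__(self):
        self.total = 0
        self.untracked = []

    def pytest_collection_finish(self, session):
        for item in session.items:
            mark = item.get_closest_marker("xfail")
            if mark is None:
                continue
            self.total += 1
            if not TICKET.search(mark.kwargs.get("reason") or ""):
                self.untracked.append(item.name)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pytest.ini").write_text("[pytest]\n")
    (root / "test_xfail_audit.py").write_text(textwrap.dedent(TESTS))
    audit = XfailAudit()
    # --collect-only: the audit needs markers, not test results.
    pytest.main([str(root), "--collect-only", "-q", "-p", "no:cacheprovider"], plugins=[audit])

print(f"{audit.total} xfail markers, missing a ticket: {audit.untracked}")
```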

🎯 Practice task

Organise your project's tests by feature and tag them properly. 30 minutes.

  1. Restructure your tests/ folder into feature areas:

    tests/
    ├── conftest.py
    ├── auth/
    │   └── test_login.py
    ├── inventory/
    │   └── test_products.py
    └── checkout/
        └── test_checkout.py
    
  2. Add markers to existing tests:

    # tests/auth/test_login.py
    import pytest
    from playwright.sync_api import Page, expect
     
    @pytest.mark.smoke
    def test_login_with_valid_credentials(page: Page):
        page.goto("/")
        page.get_by_placeholder("Username").fill("standard_user")
        page.get_by_placeholder("Password").fill("secret_sauce")
        page.get_by_role("button", name="Login").click()
        expect(page).to_have_url("/inventory.html")
     
    @pytest.mark.regression
    def test_login_with_locked_out_user(page: Page):
        # ...
        pass
  3. Tag inventory tests as regression and checkout tests as regression and slow:

    @pytest.mark.regression
    @pytest.mark.slow
    def test_full_checkout_flow(page: Page):
        # ...
        pass
  4. Register the markers in pytest.ini:

    [pytest]
    addopts = --browser chromium
    base_url = https://www.saucedemo.com
    markers =
        smoke: critical-path smoke tests run on every PR
        regression: full regression suite run nightly
        slow: tests that take more than 30 seconds
  5. Run each slice and observe what gets selected:

    pytest -m smoke -v                       # only the login smoke test
    pytest -m "regression and not slow" -v   # regression, but skip slow
    pytest tests/auth/ -v                    # everything in auth folder
    pytest -k "login" -v                     # any test with "login" in the name
    pytest -m "smoke or slow" -v             # smoke OR slow tests
  6. Add a skipped test. Add a placeholder for a feature you haven't built yet:

    @pytest.mark.skip(reason="Two-factor auth not implemented yet — TODO-123")
    def test_login_with_2fa(page: Page):
        pass

    Run with -rs (pytest -rs) to see the skip reason in the summary. The test appears as SKIPPED with the reason, not as a failure.

  7. Stretch: add an xfail for a "known broken" feature on Sauce Demo (if you can find one — the problem_user account is famous for showing wrong product images). Write the test with the assertion you'd want to pass; mark it @pytest.mark.xfail(reason="problem_user shows wrong images"). Run the suite — XFAIL appears in the summary, not FAIL.

You've completed the test-organisation chapter. The next chapter shifts gears to network and API testing: page.route for mocking, the request fixture for direct API calls, and the API-setup-then-UI-test pattern that's one of Playwright's biggest superpowers.
