Data Factories and Dynamic Test Data in Python


A test that uses email="alice@test.com" works once. A test that uses email="alice@test.com" in parallel with three other tests using the same email is a flake-in-waiting. A test that does this in 50 different files is a refactor nightmare. Data factories are how you escape: a factory function builds a fresh object on every call with sensible defaults, lets you override specific fields, and generates unique values automatically. Combine factories with dataclass (covered in the Python for QA course) and pytest fixtures, and your test data becomes predictable, parallel-safe, and self-documenting. This lesson covers the factory pattern, the __post_init__ trick for unique fields, the API-seeded factory variant, and the factory-as-fixture pattern that handles cleanup.

The shape of a data factory

Start with a dataclass that holds the data, and a function that builds one with overrides:

# utils/data_factory.py
import time
from dataclasses import dataclass
 
 
@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    password: str = "TestPass123"
    role: str = "tester"
 
    def __post_init__(self):
        if not self.email:
            self.email = f"test-{int(time.time() * 1000)}@test.com"
 
 
def create_user(**overrides) -> TestUser:
    return TestUser(**overrides)

Three things to notice:

  1. Sensible defaults. Every field has a default that works for "I just need a user, I don't care which one." Most calls become create_user() — zero arguments, full object.
  2. __post_init__ for derived fields. email defaults to empty string, then __post_init__ fills it with a unique-per-call value. Tests don't have to remember to set it.
  3. **overrides for selective customisation. create_user(role="admin") overrides only role; the rest stay defaults.

Usage:

default_user = create_user()
admin = create_user(name="Admin User", role="admin")
specific = create_user(email="alice@specific.com")  # explicit email overrides the default

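Each call returns a fresh object, so tests never share state through it. The email, though, is only as unique as the millisecond timestamp, which the next section deals with.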

Uniqueness — timestamps vs UUIDs

A millisecond timestamp isn't unique. Two calls in the same millisecond, whether back-to-back in one process or on two parallel workers, produce the same email. The fix is to add a UUID suffix:

import time
import uuid
from dataclasses import dataclass
 
 
@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    password: str = "TestPass123"
    role: str = "tester"
 
    def __post_init__(self):
        if not self.email:
            ts = int(time.time() * 1000)
            suffix = uuid.uuid4().hex[:8]
            self.email = f"test-{ts}-{suffix}@test.com"

The timestamp keeps emails sortable in logs (test-1700000001234-... is older than test-1700000005678-...). The UUID suffix makes them unique even on the same millisecond. Together they're parallel-safe and human-readable.
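The same-millisecond collision is easy to reproduce deterministically by pinning the clock with unittest.mock. A minimal sketch; the class names TimestampOnlyUser and TimestampUuidUser are illustrative stand-ins for the two TestUser versions above, trimmed to the email field:

```python
import time
import uuid
from dataclasses import dataclass
from unittest import mock


@dataclass
class TimestampOnlyUser:
    email: str = ""

    def __post_init__(self):
        if not self.email:
            self.email = f"test-{int(time.time() * 1000)}@test.com"


@dataclass
class TimestampUuidUser:
    email: str = ""

    def __post_init__(self):
        if not self.email:
            ts = int(time.time() * 1000)
            suffix = uuid.uuid4().hex[:8]
            self.email = f"test-{ts}-{suffix}@test.com"


# Pin time.time() so every call lands on the same millisecond, which is
# exactly what two fast back-to-back calls (or two workers) can hit.
with mock.patch("time.time", return_value=1_700_000_001.234):
    a, b = TimestampOnlyUser(), TimestampOnlyUser()
    c, d = TimestampUuidUser(), TimestampUuidUser()

print(a.email == b.email)  # True: timestamp-only emails collide
print(c.email == d.email)  # False: the UUID suffix differs
```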

For non-email fields the same pattern applies — usernames, product names, project IDs:

@dataclass
class TestProduct:
    name: str = ""
    sku: str = ""
    price: float = 9.99
 
    def __post_init__(self):
        suffix = uuid.uuid4().hex[:8]
        if not self.name:
            self.name = f"Test Product {suffix}"
        if not self.sku:
            self.sku = f"TST-{suffix.upper()}"

Anywhere a unique constraint exists in the app's database, the factory needs a UUID-suffixed default.
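To see what that buys you, here is a sketch that uses an in-memory SQLite table as a stand-in for the app's database (the products table and its schema are assumptions for illustration). Factory-built rows pass a UNIQUE constraint; a hardcoded SKU fails on its second insert:

```python
import sqlite3
import uuid
from dataclasses import dataclass


@dataclass
class TestProduct:
    name: str = ""
    sku: str = ""
    price: float = 9.99

    def __post_init__(self):
        suffix = uuid.uuid4().hex[:8]
        if not self.name:
            self.name = f"Test Product {suffix}"
        if not self.sku:
            self.sku = f"TST-{suffix.upper()}"


# Hypothetical products table with the kind of UNIQUE constraint a real schema has.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, sku TEXT UNIQUE, price REAL)")

# Factory defaults: 200 inserts, no constraint violations.
for _ in range(200):
    p = TestProduct()
    conn.execute("INSERT INTO products VALUES (?, ?, ?)", (p.name, p.sku, p.price))

# A hardcoded SKU trips the constraint on the second insert.
hardcoded_failed = False
try:
    for _ in range(2):
        p = TestProduct(sku="TST-FIXED")
        conn.execute("INSERT INTO products VALUES (?, ?, ?)", (p.name, p.sku, p.price))
except sqlite3.IntegrityError:
    hardcoded_failed = True

print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # 201
print(hardcoded_failed)  # True
```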

Creating multiple objects at once

A test that needs ten users gets one helper:

def create_users(count: int, **overrides) -> list[TestUser]:
    return [create_user(**overrides) for _ in range(count)]
 
 
users = create_users(10, role="tester")
admins = create_users(3, role="admin")

Each iteration builds a new object, and __post_init__ runs per object, so every user gets its own UUID-suffixed email. The list comprehension keeps the helper to one line.
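One caveat shows up quickly in code: the overrides go to every object in the batch, so overriding a unique field such as email brings the duplicates back. A sketch, with TestUser trimmed to three fields and a UUID-only email for brevity:

```python
import uuid
from dataclasses import dataclass


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    role: str = "tester"

    def __post_init__(self):
        if not self.email:
            self.email = f"test-{uuid.uuid4().hex[:8]}@test.com"


def create_user(**overrides) -> TestUser:
    return TestUser(**overrides)


def create_users(count: int, **overrides) -> list[TestUser]:
    return [create_user(**overrides) for _ in range(count)]


admins = create_users(3, role="admin")
print(len({u.email for u in admins}))  # 3: each object generated its own email
print({u.role for u in admins})        # {'admin'}

# Overriding a unique field applies the same value to every object.
clones = create_users(3, email="same@test.com")
print(len({u.email for u in clones}))  # 1
```

Override only shared fields (role, password) in batch calls and leave unique ones to the defaults.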

API-seeded factories — the production pattern

Factories that just build dataclasses are useful; factories that also persist them via the API are how production tests get their data:

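from dataclasses import asdict
 
import pytest
from utils.data_factory import create_user
 
 
@pytest.fixture
def seeded_user(page):
    user = create_user()
    # data= sends a dict as a JSON body; the relative URL resolves against base_url
    response = page.request.post("/api/users", data=asdict(user))
    assert response.ok, f"seeding failed with HTTP {response.status}"
    created = response.json()
    yield created
    page.request.delete(f"/api/users/{created['id']}")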

The fixture builds a user via the factory, POSTs it, yields the created record, and deletes it on teardown. Tests that need a user just take the fixture:

from playwright.sync_api import expect
 
 
def test_user_profile(page, seeded_user):
    page.goto(f"/users/{seeded_user['id']}")
    expect(page.get_by_role("heading")).to_contain_text(seeded_user["name"])

No setup boilerplate, no cleanup boilerplate, no collision risk. The factory does the unique-data work; the fixture handles the lifecycle.
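The yield-based lifecycle can be checked without a browser by driving the fixture generator by hand, roughly the way pytest does: one next() for setup, a second for teardown. A sketch under stated assumptions: FakeUsersAPI is a hypothetical in-memory stand-in for POST/DELETE /api/users, and dataclasses.asdict builds the payload (for a flat dataclass it matches __dict__; it also recurses into nested dataclasses):

```python
import itertools
import uuid
from dataclasses import asdict, dataclass
from typing import Iterator


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    role: str = "tester"

    def __post_init__(self):
        if not self.email:
            self.email = f"test-{uuid.uuid4().hex[:8]}@test.com"


class FakeUsersAPI:
    """Hypothetical in-memory stand-in for the users endpoints."""

    def __init__(self) -> None:
        self.records: dict[int, dict] = {}
        self._ids = itertools.count(1)

    def post(self, payload: dict) -> dict:
        record = {"id": next(self._ids), **payload}
        self.records[record["id"]] = record
        return record

    def delete(self, user_id: int) -> None:
        self.records.pop(user_id, None)


def seeded_user(api: FakeUsersAPI) -> Iterator[dict]:
    # Same shape as the pytest fixture: build, persist, yield, clean up.
    created = api.post(asdict(TestUser()))
    yield created
    api.delete(created["id"])


api = FakeUsersAPI()
fixture = seeded_user(api)
user = next(fixture)               # setup: runs up to the yield
assert user["id"] in api.records   # the "test" sees the persisted record
next(fixture, None)                # teardown: runs the code after the yield
print(api.records)  # {}
```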

The factory-as-fixture pattern — multi-create with shared cleanup

Some tests need several users with different roles. A simple fixture that hardcodes one user doesn't fit. Return the factory function itself:

from dataclasses import asdict
 
import pytest
from utils.data_factory import create_user
 
 
@pytest.fixture
def user_factory(page):
    created = []
 
    def _create(**overrides) -> dict:
        user = create_user(**overrides)
        response = page.request.post("/api/users", data=asdict(user))
        assert response.ok, f"seeding failed with HTTP {response.status}"
        created_user = response.json()
        created.append(created_user)
        return created_user
 
    yield _create
 
    # Teardown: delete every user the test created
    for user in created:
        page.request.delete(f"/api/users/{user['id']}")

The fixture yields the _create function. Tests call it as many times as they need:

from playwright.sync_api import expect
 
 
def test_admin_demotes_user(page, user_factory):
    admin = user_factory(role="admin")
    user = user_factory(role="tester")
 
    page.goto("/login")
    # ... log in as admin, navigate to user-management UI ...
    page.get_by_test_id(f"user-{user['id']}-demote").click()
 
    # Assertions on the demote flow
    expect(page.get_by_test_id("toast")).to_contain_text("Demoted")

Two users created, both deleted on teardown. Adding a third user is one line. The pattern composes well — you can mix user_factory with product_factory and order_factory in the same test and trust each to clean up its own creations.
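The same hand-driven approach can confirm that a factory fixture cleans up everything it created. A sketch with a hypothetical FakeUsersAPI in place of page.request; it shows one variation on the lesson's teardown loop, deleting in reverse creation order, which starts to matter once later records (orders, say) reference earlier ones (users):

```python
import itertools
from dataclasses import asdict, dataclass
from typing import Callable, Iterator


@dataclass
class TestUser:
    name: str = "Test User"
    role: str = "tester"


class FakeUsersAPI:
    """Hypothetical in-memory stand-in for the users endpoints."""

    def __init__(self) -> None:
        self.records: dict[int, dict] = {}
        self.deleted: list[int] = []
        self._ids = itertools.count(1)

    def post(self, payload: dict) -> dict:
        record = {"id": next(self._ids), **payload}
        self.records[record["id"]] = record
        return record

    def delete(self, user_id: int) -> None:
        self.records.pop(user_id, None)
        self.deleted.append(user_id)


def user_factory(api: FakeUsersAPI) -> Iterator[Callable[..., dict]]:
    created: list[dict] = []

    def _create(**overrides) -> dict:
        record = api.post(asdict(TestUser(**overrides)))
        created.append(record)
        return record

    yield _create

    # Newest first, so dependents are removed before what they depend on.
    for record in reversed(created):
        api.delete(record["id"])


api = FakeUsersAPI()
fixture = user_factory(api)
make_user = next(fixture)
for role in ("admin", "tester", "viewer"):
    make_user(role=role)
print(len(api.records))          # 3
next(fixture, None)              # teardown
print(api.records, api.deleted)  # {} [3, 2, 1]
```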

How a factory call flows


1. Test calls create_user()

Either with no overrides, or with kwargs like role='admin'. The dataclass receives the overrides; defaults fill the rest.
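This first step, and what happens right after it, can be traced by logging from __post_init__. A minimal sketch (the module-level events list exists only for illustration): the dataclass-generated __init__ assigns overrides and defaults first, then __post_init__ runs and fills only the fields still blank:

```python
import uuid
from dataclasses import dataclass

events: list[str] = []


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    role: str = "tester"

    def __post_init__(self):
        # Runs after the generated __init__, so overrides are already applied.
        events.append(f"post_init sees role={self.role!r} email={self.email!r}")
        if not self.email:
            self.email = f"test-{uuid.uuid4().hex[:8]}@test.com"


def create_user(**overrides) -> TestUser:
    return TestUser(**overrides)


admin = create_user(role="admin")
explicit = create_user(email="alice@specific.com")

print(events[0])       # post_init sees role='admin' email=''
print(events[1])       # post_init sees role='tester' email='alice@specific.com'
print(explicit.email)  # alice@specific.com
```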

A complete factory module

A typical utils/data_factory.py for a real project covers the entities tests touch most: users, products, orders. The version below covers users and products; further entities follow the same shape:

# utils/data_factory.py
import time
import uuid
from dataclasses import dataclass
 
 
def _unique_id() -> str:
    return f"{int(time.time() * 1000)}-{uuid.uuid4().hex[:8]}"
 
 
@dataclass
class TestUser:
    __test__ = False  # stop pytest collecting this Test* class when a test module imports it
    name: str = "Test User"
    email: str = ""
    password: str = "TestPass123"
    role: str = "tester"
 
    def __post_init__(self):
        if not self.email:
            self.email = f"test-{_unique_id()}@test.com"
 
 
@dataclass
class TestProduct:
    __test__ = False
    name: str = ""
    sku: str = ""
    price: float = 9.99
    category: str = "general"
 
    def __post_init__(self):
        suffix = _unique_id()
        if not self.name:
            self.name = f"Test Product {suffix}"
        if not self.sku:
            # Use the UUID tail; the first 12 characters would be timestamp digits only
            self.sku = f"TST-{suffix[-8:].upper()}"
 
 
def create_user(**kwargs) -> TestUser:
    return TestUser(**kwargs)
 
 
def create_users(count: int, **kwargs) -> list[TestUser]:
    return [create_user(**kwargs) for _ in range(count)]
 
 
def create_product(**kwargs) -> TestProduct:
    return TestProduct(**kwargs)

One module, two dataclasses, three factory functions, one shared _unique_id helper. Every test in the suite imports what it needs.

Combining factories with parametrize

Factories compose with the pytest.mark.parametrize pattern from chapter 3. Generate test data per case:

from dataclasses import asdict
 
import pytest
from playwright.sync_api import expect
from utils.data_factory import TestUser
 
 
@pytest.mark.parametrize("role,expected_url", [
    ("admin", "/admin"),
    ("tester", "/dashboard"),
    ("viewer", "/readonly"),
])
def test_login_redirects_by_role(page, api, role: str, expected_url: str):
    # api: an API-client fixture assumed to wrap POST /api/users
    new_user = TestUser(role=role)
    user = api.create_user(asdict(new_user))
 
    page.goto("/login")
    page.get_by_label("Email").fill(user["email"])
    page.get_by_label("Password").fill(new_user.password)
    page.get_by_role("button", name="Login").click()
 
    expect(page).to_have_url(expected_url)

Three parametrize cases, three unique users (each with its own UUID-suffixed email), three role-specific assertions. No copy-paste, no hardcoded test data.

Coming from Playwright TypeScript?

The TypeScript course's data factories use TS interfaces and the Builder pattern:

  • TS interface User { email: string; ... } → Python @dataclass class User
  • TS class UserBuilder { withRole(r: string) { ... } } → Python create_user(role=...)
  • TS Object.assign({}, defaults, overrides) → Python **overrides kwargs
  • TS Faker.js for realistic data → Python faker library (pip install faker)

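The Python dataclass + __post_init__ pattern is more concise than the TS Builder pattern many JS-world libraries push: one class declaration carries the fields, their defaults, and the derived values, where a builder typically adds a with-method per field. If you're cross-training, the Python version is the cleaner read.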
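For readers mapping the TS idioms over, here is a short sketch of the Python side of the Object.assign row. Dict unpacking merges with the same later-wins rule; passing **overrides to a dataclass does that merge against the field defaults; and dataclasses.replace covers the "copy an existing object with a few changes" case that object spread handles in TS:

```python
from dataclasses import dataclass, replace


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = "tester@test.com"
    role: str = "tester"


# TS: Object.assign({}, defaults, overrides). Later keys win here too.
defaults = {"name": "Test User", "role": "tester"}
overrides = {"role": "admin"}
merged = {**defaults, **overrides}
print(merged)  # {'name': 'Test User', 'role': 'admin'}

# The dataclass constructor performs the same merge against its field defaults.
user = TestUser(**overrides)
print(user.role, user.name)  # admin Test User

# TS: { ...user, role: "viewer" }. replace() returns a modified copy.
viewer = replace(user, role="viewer")
print(viewer.role, user.role)  # viewer admin
```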

For realistic test data (real-looking names, plausible addresses, valid postcodes), the faker library is the standard:

from dataclasses import dataclass
 
from faker import Faker
 
fake = Faker("en_GB")
 
 
@dataclass
class TestUser:
    name: str = ""
    email: str = ""
    address: str = ""
 
    def __post_init__(self):
        if not self.name:
            self.name = fake.name()
        if not self.email:
            self.email = fake.unique.email()
        if not self.address:
            self.address = fake.address()

fake.unique.email() guarantees no duplicates within a process (Faker tracks the seen values). For cross-process uniqueness across xdist workers, combine with the UUID suffix from earlier.

⚠️ Common mistakes

  • Hardcoding email or username in test fixtures. Even if you never run in parallel, hardcoded data leaks across runs: if one run dies before its teardown, the next run hits "user already exists". Always go through the factory; never type a hardcoded test email in a fixture.
  • Forgetting to delete created data on teardown. A factory fixture that creates but doesn't delete leaves your test database polluted. Use yield followed by deletes; for batches, track the IDs in a list and delete each on the way out.
  • Using **kwargs without typing. def create_user(**overrides) accepts any keyword at the call site, so a typo (role → rolle) isn't caught until the line runs, when the dataclass's generated __init__ raises TypeError for the unexpected keyword argument 'rolle'. Neither mypy nor your IDE can flag it beforehand. For a strict factory, def create_user(*, name: str = "...", email: str = "", role: str = "tester") -> TestUser: makes mypy catch typos.
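A sketch of both behaviours. The name create_user_strict is hypothetical, chosen only to sit next to the loose version in one file:

```python
from dataclasses import dataclass


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    role: str = "tester"


def create_user(**overrides) -> TestUser:
    return TestUser(**overrides)


def create_user_strict(
    *, name: str = "Test User", email: str = "", role: str = "tester"
) -> TestUser:
    # Explicit keyword-only parameters give mypy and IDEs names to check.
    return TestUser(name=name, email=email, role=role)


# The loose factory does fail on a typo, but only when this line executes.
try:
    create_user(rolle="admin")
except TypeError as exc:
    print(exc)

# create_user_strict(rolle="admin") would be reported by mypy before any test runs.
print(create_user_strict(role="admin").role)  # admin
```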

🎯 Practice task

Build a factory module and use it across the suite. 30-40 minutes.

  1. Create utils/data_factory.py with the TestUser dataclass and create_user(**kwargs) factory from the lesson. Add the _unique_id() helper.

  2. Write tests/test_factory.py:

    from utils.data_factory import create_user, create_users
     
    def test_factory_creates_unique_users():
        users = create_users(5)
        emails = [u.email for u in users]
        assert len(set(emails)) == 5  # all unique
     
    def test_factory_overrides():
        admin = create_user(name="Alice Admin", role="admin")
        assert admin.name == "Alice Admin"
        assert admin.role == "admin"
        assert admin.email.startswith("test-")  # unique email default
  3. Run pytest tests/test_factory.py -v. Both should pass.

  4. Test parallel safety. Run pytest tests/test_factory.py -n 4 --count 10 (-n needs pytest-xdist, --count needs pytest-repeat). That is 20 test runs spread across four workers, and every one should pass because each generated email carries a UUID suffix.

  5. Add a seeded_user fixture that POSTs to JSONPlaceholder (or your dev API). Write a test that uses it; confirm the fixture creates the user, the test sees the data, and the teardown call deletes (or attempts to delete; JSONPlaceholder doesn't actually persist, but the call shape is right).

  6. Add the factory-as-fixture variant. Build user_factory that yields a _create function. Write a test that creates 3 users with different roles inside a single test body and confirms all 3 are deleted on teardown.

  7. Stretch: add faker for realistic names. pip install faker, plug fake.name() and fake.unique.email() into the dataclass __post_init__. Print a few generated objects via pytest -s — confirm names look like real people, emails are valid format. Combine fake.unique.email() with the UUID-timestamp suffix for both realism and cross-process uniqueness.

You've got the data layer. The last lesson of this chapter (and the course's framework material) covers what to do after the suite is built — flake management, retry policies, the metrics that matter, and the scaling sequence as your suite grows from 30 to 300 tests.
