A test that uses email="alice@test.com" works once. A test that uses email="alice@test.com" in parallel with three other tests using the same email is a flake-in-waiting. A test that does this in 50 different files is a refactor nightmare. Data factories are how you escape: a factory function builds a fresh object every call with sensible defaults, lets you override specific fields, and guarantees uniqueness automatically. Combine factories with dataclass (covered in the Python for QA course) and pytest fixtures, and your test data becomes deterministic, parallel-safe, and self-documenting. This lesson covers the factory pattern, the __post_init__ trick for unique fields, the API-seeded factory variant, and the cross-test factory fixture that handles cleanup.
The shape of a data factory
Start with a dataclass that holds the data, and a function that builds one with overrides:
# utils/data_factory.py
import time
from dataclasses import dataclass


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    password: str = "TestPass123"
    role: str = "tester"

    def __post_init__(self):
        if not self.email:
            self.email = f"test-{int(time.time() * 1000)}@test.com"


def create_user(**overrides) -> TestUser:
    return TestUser(**overrides)

Three things to notice:
- Sensible defaults. Every field has a default that works for "I just need a user, I don't care which one." Most calls become create_user(): zero arguments, full object.
- __post_init__ for derived fields. email defaults to an empty string, then __post_init__ fills it with a unique-per-call value. Tests don't have to remember to set it.
- **overrides for selective customisation. create_user(role="admin") overrides only role; the rest keep their defaults.
Usage:
default_user = create_user()
admin = create_user(name="Admin User", role="admin")
specific = create_user(email="alice@specific.com")  # explicit email overrides the default

Each call returns a fresh object, so tests share no state. The timestamp alone can still collide under load; the next section deals with that.
Uniqueness — timestamps vs UUIDs
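time.time() isn't unique across parallel tests. Two workers that hit the same millisecond produce the same email, and so do two calls in one process that land in the same millisecond (a loop that calls create_user() several times, for example). The fix is to add a UUID suffix: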
import time
import uuid
from dataclasses import dataclass


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    password: str = "TestPass123"
    role: str = "tester"

    def __post_init__(self):
        if not self.email:
            ts = int(time.time() * 1000)   # millisecond timestamp, keeps log order
            suffix = uuid.uuid4().hex[:8]  # random part, distinguishes same-millisecond calls
            self.email = f"test-{ts}-{suffix}@test.com"

The timestamp keeps emails sortable in logs (test-1700000001234-... is older than test-1700000005678-...). The UUID suffix keeps them distinct even within the same millisecond. Together they're parallel-safe and human-readable.
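The difference between the two defaults is easy to see in a tight loop. The sketch below is illustrative only: timestamp_email and suffixed_email are hypothetical helper names that mirror the two __post_init__ variants, and the count of 1,000 is arbitrary.

```python
import time
import uuid


def timestamp_email() -> str:
    """Timestamp-only default, as in the first version of the factory."""
    return f"test-{int(time.time() * 1000)}@test.com"


def suffixed_email() -> str:
    """Timestamp plus 8 random hex characters."""
    ts = int(time.time() * 1000)
    return f"test-{ts}-{uuid.uuid4().hex[:8]}@test.com"


# Generate 1,000 emails back to back, the way a bulk-create loop would.
timestamp_only = {timestamp_email() for _ in range(1000)}
with_suffix = {suffixed_email() for _ in range(1000)}

print(len(timestamp_only))  # far fewer than 1000: many calls share a millisecond
print(len(with_suffix))     # 1000 in practice
```

Eight hex characters give 32 bits of randomness, so a collision within one millisecond is improbable rather than impossible. If that margin is too thin for a very large suite, a longer slice of the UUID is a one-character change.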
For non-email fields the same pattern applies — usernames, product names, project IDs:
@dataclass
class TestProduct:
    name: str = ""
    sku: str = ""
    price: float = 9.99

    def __post_init__(self):
        suffix = uuid.uuid4().hex[:8]
        if not self.name:
            self.name = f"Test Product {suffix}"
        if not self.sku:
            self.sku = f"TST-{suffix.upper()}"

Anywhere a unique constraint exists in the app's database, the factory needs a UUID-suffixed default.
Creating multiple objects at once
A test that needs ten users gets one helper:
def create_users(count: int, **overrides) -> list[TestUser]:
    return [create_user(**overrides) for _ in range(count)]


users = create_users(10, role="tester")
admins = create_users(3, role="admin")

Each iteration returns a unique user (because __post_init__ runs per object). The list comprehension keeps it concise and Pythonic.
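One caveat with the bulk helper: overrides are applied to every object, so passing a field that must be unique (such as email) gives every user the same value. A minimal sketch of a guard follows. UNIQUE_FIELDS and the ValueError are assumptions made for illustration, not part of the lesson's module, and the dataclass is trimmed to keep the example short.

```python
import uuid
from dataclasses import dataclass


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    role: str = "tester"

    def __post_init__(self):
        if not self.email:
            self.email = f"test-{uuid.uuid4().hex[:8]}@test.com"


# Hypothetical: fields that must differ between objects in this app.
UNIQUE_FIELDS = {"email"}


def create_users(count: int, **overrides) -> list[TestUser]:
    # Refuse to stamp the same unique value onto several objects.
    clashing = UNIQUE_FIELDS.intersection(overrides)
    if count > 1 and clashing:
        raise ValueError(f"cannot share {sorted(clashing)} across {count} users")
    return [TestUser(**overrides) for _ in range(count)]


admins = create_users(3, role="admin")        # three users, three emails
single = create_users(1, email="a@test.com")  # fine: only one object
```

Whether to raise, warn, or silently drop the override is a design choice; raising makes the mistake visible at the call site.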
API-seeded factories — the production pattern
Factories that just build dataclasses are useful; factories that also persist them via the API are how production tests get their data:
import pytest
from dataclasses import asdict
from utils.data_factory import create_user


@pytest.fixture
def seeded_user(page):
    user = create_user()
    # Playwright's Python request API takes the body via data=; a dict is sent as JSON
    response = page.request.post("/api/users", data=asdict(user))
    created = response.json()
    yield created
    # Teardown: runs after the test has finished
    page.request.delete(f"/api/users/{created['id']}")

The fixture builds a user via the factory, POSTs it, yields the created record, and deletes it on teardown. Tests that need a user just take the fixture:
from playwright.sync_api import expect


def test_user_profile(page, seeded_user):
    page.goto(f"/users/{seeded_user['id']}")
    expect(page.get_by_role("heading")).to_contain_text(seeded_user["name"])

No setup boilerplate, no cleanup boilerplate, no collision risk. The factory does the unique-data work; the fixture handles the lifecycle.
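A detail worth knowing when turning a dataclass into a request payload: user.__dict__ is a shallow view of the instance, while dataclasses.asdict builds a deep copy and converts nested dataclasses into plain dicts. The sketch below uses a hypothetical Address type and fixed field values purely to show the difference.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Address:  # hypothetical nested type, for illustration only
    city: str = "London"
    postcode: str = "N1 9GU"


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = "test-1a2b3c4d@test.com"
    address: Address = field(default_factory=Address)


user = TestUser()

shallow = user.__dict__  # address is still an Address instance
deep = asdict(user)      # address has become a plain dict

body = json.dumps(deep)  # serialises cleanly

try:
    json.dumps(shallow)
    shallow_ok = True
except TypeError:
    # Address is not JSON serialisable
    shallow_ok = False
```

For flat dataclasses the two produce the same payload. asdict also returns a copy, so mutating the payload does not change the object the test still holds.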
The factory-as-fixture pattern — multi-create with shared cleanup
Some tests need several users with different roles. A simple fixture that hardcodes one user doesn't fit. Return the factory function itself:
from dataclasses import asdict  # needed for the payload below


@pytest.fixture
def user_factory(page):
    created = []

    def _create(**overrides) -> dict:
        user = create_user(**overrides)
        # data= accepts a dict and sends it as a JSON body
        response = page.request.post("/api/users", data=asdict(user))
        created_user = response.json()
        created.append(created_user)
        return created_user

    yield _create

    # Teardown: delete every user the test created
    for user in created:
        page.request.delete(f"/api/users/{user['id']}")

The fixture yields the _create function. Tests call it as many times as they need:
def test_admin_demotes_user(page, user_factory):
    admin = user_factory(role="admin")
    user = user_factory(role="tester")

    page.goto("/login")
    # ... log in as admin, navigate to user-management UI ...
    page.get_by_test_id(f"user-{user['id']}-demote").click()

    # Assertions on the demote flow
    expect(page.get_by_test_id("toast")).to_contain_text("Demoted")

Two users created, both deleted on teardown. Adding a third user is one line. The pattern composes well: you can mix user_factory with product_factory and order_factory in the same test and trust each to clean up its own creations.
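The create-and-track mechanics can be tried without a browser or a server. The sketch below swaps the Playwright request for InMemoryUsersApi, a hypothetical stand-in written for this example, and uses contextlib.contextmanager in place of @pytest.fixture so the lifecycle can be driven by hand. Deleting in reverse order is an assumption that tends to help when later records depend on earlier ones.

```python
import uuid
from contextlib import contextmanager
from dataclasses import asdict, dataclass


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    role: str = "tester"

    def __post_init__(self):
        if not self.email:
            self.email = f"test-{uuid.uuid4().hex[:8]}@test.com"


class InMemoryUsersApi:
    """Hypothetical stand-in for the real /api/users endpoint."""

    def __init__(self):
        self.users: dict[int, dict] = {}
        self._next_id = 1

    def post(self, payload: dict) -> dict:
        record = {"id": self._next_id, **payload}
        self.users[record["id"]] = record
        self._next_id += 1
        return record

    def delete(self, user_id: int) -> None:
        self.users.pop(user_id, None)


@contextmanager
def user_factory(api):
    created = []  # every record made through _create is remembered here

    def _create(**overrides) -> dict:
        record = api.post(asdict(TestUser(**overrides)))
        created.append(record)
        return record

    try:
        yield _create
    finally:
        # Newest first, so dependents are removed before what they depend on
        for record in reversed(created):
            api.delete(record["id"])


api = InMemoryUsersApi()
with user_factory(api) as make_user:
    admin = make_user(role="admin")
    tester = make_user(role="tester")
    count_during = len(api.users)
count_after = len(api.users)
```

Under pytest the try/finally is optional, since code after yield in a fixture runs as teardown whether the test passes or fails.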
How a factory call flows
1. Test calls create_user()
Either with no overrides, or with kwargs like role='admin'. The dataclass receives the overrides; defaults fill the rest.
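What happens inside a single call can be traced with a few lines of instrumentation. This is a rough sketch rather than the lesson's own walkthrough: the events list and the messages are invented for illustration, and the dataclass is trimmed to three fields.

```python
import uuid
from dataclasses import dataclass

events: list[str] = []  # records the order in which things happen


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    role: str = "tester"

    def __post_init__(self):
        # Runs after the generated __init__ has assigned every field
        events.append(f"__post_init__ sees role={self.role!r}, email={self.email!r}")
        if not self.email:
            self.email = f"test-{uuid.uuid4().hex[:8]}@test.com"
            events.append("email filled with unique default")


def create_user(**overrides) -> TestUser:
    events.append(f"create_user called with {overrides}")
    return TestUser(**overrides)


user = create_user(role="admin")

for line in events:
    print(line)
```

The trace shows the order: the factory receives the overrides, the generated __init__ assigns overrides and defaults, and __post_init__ then fills whatever is still empty.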
A complete factory module
A typical utils/data_factory.py for a real project covers the entities tests touch most: users, products, orders. The version here shows users and products; further entities follow the same pattern:
# utils/data_factory.py
import time
import uuid
from dataclasses import dataclass


def _unique_id() -> str:
    # Millisecond timestamp for log ordering plus 8 random hex characters for uniqueness
    return f"{int(time.time() * 1000)}-{uuid.uuid4().hex[:8]}"


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    password: str = "TestPass123"
    role: str = "tester"

    def __post_init__(self):
        if not self.email:
            self.email = f"test-{_unique_id()}@test.com"


@dataclass
class TestProduct:
    name: str = ""
    sku: str = ""
    price: float = 9.99
    category: str = "general"

    def __post_init__(self):
        suffix = _unique_id()
        if not self.name:
            self.name = f"Test Product {suffix}"
        if not self.sku:
            # Use the random tail; the first 12 characters are timestamp digits only
            self.sku = f"TST-{suffix[-8:].upper()}"


def create_user(**kwargs) -> TestUser:
    return TestUser(**kwargs)


def create_users(count: int, **kwargs) -> list[TestUser]:
    return [create_user(**kwargs) for _ in range(count)]


def create_product(**kwargs) -> TestProduct:
    return TestProduct(**kwargs)

One module, two dataclasses, three factory functions, one shared _unique_id helper. Every test in the suite imports what it needs.
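An order factory would follow the same shape. The sketch below is hypothetical: TestOrder, its field names, and create_order are invented for illustration, and the product dataclass is trimmed. It mainly shows the one new wrinkle, which is that list-valued fields need field(default_factory=...).

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TestProduct:
    name: str = ""
    price: float = 9.99

    def __post_init__(self):
        if not self.name:
            self.name = f"Test Product {uuid.uuid4().hex[:8]}"


@dataclass
class TestOrder:  # hypothetical entity; field names are illustrative
    reference: str = ""
    # Mutable defaults need default_factory; a bare [] raises ValueError at class creation
    items: list[TestProduct] = field(default_factory=list)

    def __post_init__(self):
        if not self.reference:
            self.reference = f"ORD-{uuid.uuid4().hex[:8].upper()}"
        if not self.items:
            # Default to a single fresh product so the order is never empty
            self.items = [TestProduct()]

    @property
    def total(self) -> float:
        return round(sum(p.price for p in self.items), 2)


def create_order(**kwargs) -> TestOrder:
    return TestOrder(**kwargs)


default_order = create_order()
custom_order = create_order(items=[TestProduct(price=1.5), TestProduct(price=2.25)])
```

Because each order builds its own list, two default orders never share items, which keeps the "fresh object every call" guarantee intact for nested data.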
Combining factories with parametrize
Factories compose with the pytest.mark.parametrize pattern from chapter 3. Generate test data per case:
import pytest
from playwright.sync_api import expect
from utils.data_factory import TestUser


@pytest.mark.parametrize("role,expected_url", [
    ("admin", "/admin"),
    ("tester", "/dashboard"),
    ("viewer", "/readonly"),
])
def test_login_redirects_by_role(page, api, role: str, expected_url: str):
    # api is assumed to be a project fixture, defined elsewhere, that wraps the REST client
    user = api.create_user(TestUser(role=role).__dict__)
    page.goto("/login")
    page.get_by_label("Email").fill(user["email"])
    page.get_by_label("Password").fill("TestPass123")
    page.get_by_role("button", name="Login").click()
    expect(page).to_have_url(expected_url)

Three parametrize cases, three unique users (each with its own UUID-suffixed email), three role-specific assertions. No copy-paste, no hardcoded test data.
Coming from Playwright TypeScript?
The TypeScript course's data factories use TS interfaces and Builder pattern:
- TS interface User { email: string; ... } → Python @dataclass class User
- TS class UserBuilder { withRole(r: string) { ... } } → Python create_user(role=...)
- TS Object.assign({}, defaults, overrides) → Python **overrides kwargs
- TS Faker.js for realistic data → Python faker library (pip install faker)
The Python dataclass + __post_init__ pattern is genuinely more concise than the TS Builder pattern most JS-world libraries push. If you're cross-training, the Python version is the cleaner read.
For realistic test data (real-looking names, plausible addresses, valid postcodes), the faker library is the standard:
from dataclasses import dataclass
from faker import Faker

fake = Faker("en_GB")


@dataclass
class TestUser:
    name: str = ""
    email: str = ""
    address: str = ""

    def __post_init__(self):
        if not self.name:
            self.name = fake.name()
        if not self.email:
            self.email = fake.unique.email()
        if not self.address:
            self.address = fake.address()

fake.unique.email() guarantees no duplicates within a process (Faker tracks the values it has already returned). For cross-process uniqueness across xdist workers, combine it with the UUID suffix from earlier.
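One way to combine a realistic address with a random suffix is a small helper that rewrites the local part. This is a sketch under assumptions: with_unique_suffix is a hypothetical name, and the fixed input address stands in for whatever fake.unique.email() would return.

```python
import uuid


def with_unique_suffix(email: str) -> str:
    """Insert 8 random hex characters before the @ so the address differs per call."""
    local, _, domain = email.partition("@")
    return f"{local}-{uuid.uuid4().hex[:8]}@{domain}"


first = with_unique_suffix("jane.doe@example.org")
second = with_unique_suffix("jane.doe@example.org")

print(first)
print(second)
```

Inside __post_init__ this would be used as self.email = with_unique_suffix(fake.unique.email()), which keeps the realistic name and domain while making collisions across workers very unlikely.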
⚠️ Common mistakes
- Hardcoding email or username in test fixtures. Even if you never run in parallel, hardcoded data leaks across runs: the second run hits "user already exists" because the first never cleaned up. Always go through the factory; never type a hardcoded test email in a fixture.
- Forgetting to delete created data on teardown. A factory fixture that creates but doesn't delete leaves your test database polluted. Use yield followed by deletes; for batches, track the IDs in a list and delete each on the way out.
- Using **kwargs without typing. def create_user(**overrides) accepts any keyword as far as a type checker or IDE can tell. A typo (role → rolle) is caught only at runtime, when the dataclass constructor raises TypeError for the unexpected keyword argument; nothing flags it while you write the test. For a strict factory, def create_user(*, name: str = "...", email: str = "", role: str = "tester") -> TestUser: lets mypy catch the typo before the test runs.
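The strict variant can be sketched in full as follows. The parameter list simply mirrors the TestUser fields from this lesson; the UUID-only email default is a simplification for the example.

```python
import uuid
from dataclasses import dataclass


@dataclass
class TestUser:
    name: str = "Test User"
    email: str = ""
    password: str = "TestPass123"
    role: str = "tester"

    def __post_init__(self):
        if not self.email:
            self.email = f"test-{uuid.uuid4().hex[:8]}@test.com"


def create_user(
    *,  # keyword-only: positional calls are rejected
    name: str = "Test User",
    email: str = "",
    password: str = "TestPass123",
    role: str = "tester",
) -> TestUser:
    return TestUser(name=name, email=email, password=password, role=role)


admin = create_user(role="admin")

try:
    create_user(rolle="admin")  # misspelled keyword
except TypeError as exc:
    error = str(exc)  # the message names the unexpected keyword
```

The cost is that every new dataclass field has to be added to the signature as well. In exchange, editors autocomplete the parameters and a type checker reports the misspelling before any test runs.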
🎯 Practice task
Build a factory module and use it across the suite. 30-40 minutes.
- Create utils/data_factory.py with the TestUser dataclass and the create_user(**kwargs) and create_users(count, **kwargs) factories from the lesson. Add the _unique_id() helper.
- Write tests/test_factory.py:

  from utils.data_factory import create_user, create_users


  def test_factory_creates_unique_users():
      users = create_users(5)
      emails = [u.email for u in users]
      assert len(set(emails)) == 5  # all unique


  def test_factory_overrides():
      admin = create_user(name="Alice Admin", role="admin")
      assert admin.name == "Alice Admin"
      assert admin.role == "admin"
      assert admin.email.startswith("test-")  # unique email default

- Run pytest tests/test_factory.py -v. Both should pass.
- Test parallel safety. Run pytest tests/test_factory.py -n 4 --count 10 (-n comes from pytest-xdist, --count from pytest-repeat). Each of the two tests runs ten times, spread across four workers, and no run sees a collision because every email contains a UUID suffix.
- Add a seeded_user fixture that POSTs to JSONPlaceholder (or your dev API). Write a test that uses it; confirm the fixture creates the user, the test sees the data, and the teardown call deletes it (or attempts to: JSONPlaceholder doesn't actually persist anything, but the call shape is right).
- Add the factory-as-fixture variant. Build user_factory that yields a _create function. Write a test that creates 3 users with different roles inside a single test body and confirms all 3 are deleted on teardown.
- Stretch: add faker for realistic names. Run pip install faker, then plug fake.name() and fake.unique.email() into the dataclass __post_init__. Print a few generated objects via pytest -s and confirm that names look like real people and emails are in a valid format. Combine fake.unique.email() with the UUID-timestamp suffix for both realism and cross-process uniqueness.
You've got the data layer. The last lesson of this chapter (and the course's framework material) covers what to do after the suite is built — flake management, retry policies, the metrics that matter, and the scaling sequence as your suite grows from 30 to 300 tests.