Test Data

Creating, managing, and protecting the data that tests run against: fixtures, seeds, synthetic data, and masking.

8 terms

C

CSV (Comma-Separated Values)

A plain-text file format in which each line represents a record and fields are separated by commas (or, in common variants, other delimiters such as tabs or semicolons). The first row is commonly a header row naming each column. CSV is a widely used format for test data files, bulk imports, and data-driven test suites: a QA engineer creates one row per test case, feeds the file to a test runner, and the runner executes each row as a separate test. Common failure modes include unquoted fields containing commas, inconsistent column counts between rows, trailing blank lines, non-UTF-8 encoding, and duplicate headers. Always validate a CSV structurally before using it as test input: a malformed header row can silently shift values into the wrong columns.
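
As a rough illustration of the structural validation described above, the sketch below checks a CSV string for duplicate or empty header names and for rows whose field count differs from the header. The names `splitCsvLine` and `validateCsv` are hypothetical, and the parser assumes comma delimiters, double-quote quoting, and no line breaks inside quoted fields; a real suite would more likely rely on an established CSV library.

```typescript
// Splits one CSV line into fields, honouring double-quoted fields and "" escapes.
// Assumption: quoted fields never contain line breaks.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Returns a list of structural problems; an empty list means the file passed.
// Blank lines are skipped, so row numbers count non-blank lines (header = row 1).
function validateCsv(text: string): string[] {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  if (lines.length === 0) return ["file is empty"];

  const errors: string[] = [];
  const header = splitCsvLine(lines[0]).map((h) => h.trim());
  const seen = new Set<string>();
  for (const name of header) {
    if (name === "") errors.push("empty header name");
    else if (seen.has(name)) errors.push(`duplicate header: ${name}`);
    seen.add(name);
  }

  lines.slice(1).forEach((line, idx) => {
    const count = splitCsvLine(line).length;
    if (count !== header.length) {
      errors.push(`row ${idx + 2}: expected ${header.length} fields, found ${count}`);
    }
  });
  return errors;
}
```

Given the header `id,name`, the row `1,Smith, Jane` is reported as having 3 fields where 2 were expected, while `1,"Smith, Jane"` passes. Encoding checks are out of scope for this sketch.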

D

Data Masking

Replacing sensitive fields in a real dataset with realistic but fake values, so a copy of production can be used for testing without exposing actual PII. Names become other names, card numbers become valid-format fakes, emails get scrambled, but the data keeps its shape, relationships, and distribution. The middle path between "test on raw production" (a privacy risk, and often a legal or compliance violation) and "test on pure synthetic" (less realistic).
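
One way to keep relationships intact while masking is to make the replacement deterministic: the same source value always maps to the same fake value, so records that shared an email before masking still share one afterwards. The sketch below is only an illustration under that assumption. `Customer`, `maskCustomer`, `fakeCard`, and the name list are hypothetical, and the FNV-1a hash is not cryptographic, so a real masking job would more likely use a keyed hash or a dedicated masking tool. It also makes no attempt to preserve value distributions.

```typescript
interface Customer {
  id: number;
  name: string;
  email: string;
  card: string;
}

// Hypothetical replacement pool; a real job would use a much larger one.
const FAKE_NAMES = ["Alex Reed", "Sam Ortiz", "Jo Lindqvist", "Priya Nair", "Chen Wei"];

// FNV-1a 32-bit hash. Not cryptographic: used only to pick replacements deterministically.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Computes the Luhn check digit for a digit string that lacks its final digit.
function luhnCheckDigit(partial: string): number {
  let sum = 0;
  let doubleIt = true; // the rightmost digit of the partial number is doubled first
  for (let i = partial.length - 1; i >= 0; i--) {
    let d = partial.charCodeAt(i) - 48;
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return (10 - (sum % 10)) % 10;
}

// Builds a 16-digit, Luhn-valid card number that is fake but has a valid format.
function fakeCard(hash: number): string {
  const body = "4" + ("00000000000000" + String(hash)).slice(-14);
  return body + String(luhnCheckDigit(body));
}

function maskCustomer(c: Customer): Customer {
  const h = fnv1a(c.email.trim().toLowerCase());
  return {
    id: c.id, // keys are left intact so joins between tables still work
    name: FAKE_NAMES[h % FAKE_NAMES.length],
    email: "user-" + h.toString(16) + "@example.com", // example.com is reserved for documentation
    card: fakeCard(fnv1a(c.card)),
  };
}
```

Because the masked email depends only on the normalized source email, two rows for `Jane@Corp.test` and `jane@corp.test` end up with the same masked address, which keeps cross-table relationships usable in tests.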

Data-Driven Testing

Running the same test logic against many input/output combinations, typically loaded from a CSV, JSON file, or database. Separates test data from test code so you can scale coverage without duplicating logic.
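
A minimal sketch of the idea, assuming a hypothetical `applyDiscount` function under test: the rows in `cases` stand in for data that would normally be loaded from a CSV, JSON file, or database, and `runCases` is the single piece of test logic applied to every row.

```typescript
interface DiscountCase {
  label: string;
  total: number;
  code: string;
  expected: number;
}

// Hypothetical function under test.
function applyDiscount(total: number, code: string): number {
  if (code === "SAVE10") return Math.round(total * 0.9 * 100) / 100;
  if (code === "FLAT5") return Math.max(0, total - 5);
  return total;
}

// Test data, kept separate from the test logic below.
const cases: DiscountCase[] = [
  { label: "10% off", total: 100, code: "SAVE10", expected: 90 },
  { label: "flat 5 never goes negative", total: 3, code: "FLAT5", expected: 0 },
  { label: "unknown code is ignored", total: 20, code: "NOPE", expected: 20 },
];

// Runs the same check against every row and collects failure messages.
function runCases(rows: DiscountCase[]): string[] {
  const failures: string[] = [];
  for (const row of rows) {
    const actual = applyDiscount(row.total, row.code);
    if (actual !== row.expected) {
      failures.push(`${row.label}: expected ${row.expected}, got ${actual}`);
    }
  }
  return failures;
}
```

Adding coverage then means adding a row, not another test function. Most test runners offer a built-in way to parameterize tests that plays the role of `runCases` here.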

S

Seed Data

A known, controlled dataset loaded into a system before tests run, so every test starts from a predictable state. Seeding is what makes assertions reliable: if the database always begins with "User 1, 3 orders", a test can assert against those exact values instead of whatever happens to be there. The opposite of testing against a shared, drifting environment.
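
The "User 1, 3 orders" example can be sketched with an in-memory stand-in for a database. The `Db` shape, `seedDatabase`, and `orderCount` are hypothetical names for illustration; with a real database the same role is usually played by a seed script or migration run before the suite.

```typescript
interface User {
  id: number;
  name: string;
}

interface Order {
  id: number;
  userId: number;
}

interface Db {
  users: User[];
  orders: Order[];
}

// The known starting state: one user with exactly three orders.
const SEED: Db = {
  users: [{ id: 1, name: "User 1" }],
  orders: [
    { id: 101, userId: 1 },
    { id: 102, userId: 1 },
    { id: 103, userId: 1 },
  ],
};

// Returns a fresh deep copy so one test's writes never leak into the next.
function seedDatabase(): Db {
  return JSON.parse(JSON.stringify(SEED)) as Db;
}

function orderCount(db: Db, userId: number): number {
  return db.orders.filter((o) => o.userId === userId).length;
}
```

A test that calls `seedDatabase()` first can assert `orderCount(db, 1) === 3` with confidence, regardless of what earlier tests inserted or deleted in their own copies.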

Synthetic Data

Artificially generated data that mimics the shape and statistical properties of real data without being real: fake names, plausible addresses, realistic-but-invented transactions. It lets teams test at volume and cover edge cases without copying production data (and inheriting its privacy risk). Tools like Faker generate it; the harder version preserves real distributions for performance/ML testing.
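
To show the basic mechanics without depending on any particular library, the sketch below generates fake users from a small seeded pseudo-random generator, so the same seed always reproduces the same dataset. `generateUsers`, the name lists, and the age range are assumptions made for illustration; the output is shaped like real data but makes no attempt to match real-world distributions.

```typescript
// Small seeded PRNG (the widely circulated "mulberry32" routine); returns values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface SyntheticUser {
  id: number;
  name: string;
  email: string;
  age: number;
}

// Hypothetical value pools; a real generator would use far larger ones.
const FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Elif", "Femi"];
const LAST_NAMES = ["Okafor", "Silva", "Tanaka", "Novak", "Haddad", "Berg"];

function pick<T>(rand: () => number, items: T[]): T {
  return items[Math.floor(rand() * items.length)];
}

// Generates `count` fake users; the same seed yields the same users.
function generateUsers(count: number, seed: number): SyntheticUser[] {
  const rand = mulberry32(seed);
  const users: SyntheticUser[] = [];
  for (let i = 1; i <= count; i++) {
    const first = pick(rand, FIRST_NAMES);
    const last = pick(rand, LAST_NAMES);
    users.push({
      id: i,
      name: `${first} ${last}`,
      // The id suffix keeps emails unique even when names repeat.
      email: `${first}.${last}.${i}@example.com`.toLowerCase(),
      age: 18 + Math.floor(rand() * 63), // uniform over 18..80, an arbitrary assumed range
    });
  }
  return users;
}
```

Seeding matters for test data: a failure seen with `generateUsers(1000, 42)` can be reproduced exactly by generating with the same arguments again.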

T

Test Data Management

Provisioning, masking, refreshing, and tearing down data needed by tests. Done well, it's invisible. Done badly, it's the reason a third of tests fail on Mondays.

Test Fixture

A known, fixed state used as a baseline for tests — sample data, a seeded database, or a configured environment that ensures repeatability across runs.

Test Fixture vs Factory

Two ways to produce test data. A fixture is a fixed, predefined dataset loaded as-is (the same "User 1" every time) — predictable but rigid. A factory generates objects on demand with sensible defaults you override per test (`buildUser({ role: 'admin' })`) — flexible and DRY. Fixtures suit a stable shared baseline; factories suit tests that each need a slightly different variant.
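
The contrast can be sketched side by side. The `User` shape and its defaults are assumptions made for illustration; only the `buildUser({ role: 'admin' })` call comes from the definition above.

```typescript
interface User {
  id: number;
  name: string;
  role: "member" | "admin";
  active: boolean;
}

// Fixture: one fixed, predefined record, identical in every test that loads it.
const USER_1: Readonly<User> = { id: 1, name: "User 1", role: "member", active: true };

// Factory: sensible defaults plus per-test overrides, with a fresh id on each call.
let nextId = 100;
function buildUser(overrides: Partial<User> = {}): User {
  return {
    id: nextId++,
    name: "Test User",
    role: "member",
    active: true,
    ...overrides, // anything the test specifies wins over the defaults
  };
}

// Each test states only the detail it cares about.
const admin = buildUser({ role: "admin" });
const inactive = buildUser({ active: false });
```

Here `admin` and `inactive` share every default except the one field each test overrides, whereas `USER_1` never varies.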