Managing Test Data and Fixtures in Git

The previous lesson was about what not to put in Git. This one is about what should go in: test fixtures — the JSON, CSV, image, and config files that your tests load before running. Fixtures are part of your test suite. They get versioned, reviewed, and merged through the same PR workflow as your test code. Done well, fixtures make your tests deterministic, fast, and readable. Done badly, they balloon the repo, leak real user data, and become impossible to maintain. This lesson covers the patterns that work.

What counts as a "fixture"

A fixture is any static, predictable input your test loads at the start of a run. Common shapes:

  • JSON — API response stubs, user records, product catalogues, form data.
  • CSV — bulk data for parameterised tests (100 user rows, 50 search queries).
  • Images — visual-regression baselines, profile-photo upload tests, file-upload tests.
  • HTML / fragment files — small static pages tests serve to a fake server.
  • Config files — environment-specific defaults, feature-flag matrices.

What unites them: the test reads them, the test does not write to them, and what they contain is the same on every run.

Fixtures belong in Git

Your tests are code. Code lives in Git. The data those tests consume is part of "the test" — without the fixture, the test doesn't run. Treat fixtures the same way:

  • Version them. When the fixture changes, the test that depends on it might also need to change. Both go through the same PR.
  • Review them. A PR adding 30 new test users is a PR. Reviewers spot duplicate IDs, malformed JSON, or fixtures that accidentally mirror real customer data.
  • Track them. git blame on a fixture file tells you who added the "international shipping" record and why — exactly the same forensics you'd run on test code.
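Some of what reviewers look for can also be checked mechanically. The sketch below flags duplicate IDs in a list of fixture records. The FixtureRecord shape and the sample users are assumptions for illustration; real fixtures may key on a different field.

```typescript
// Hypothetical record shape; real fixtures may use a different key than `id`.
interface FixtureRecord {
  id: string | number;
}

// Returns each id that appears more than once, in the order the repeats are found.
function findDuplicateIds(records: FixtureRecord[]): string[] {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const record of records) {
    const key = String(record.id);
    if (seen.has(key)) {
      // Report each duplicated id only once.
      if (duplicates.indexOf(key) === -1) duplicates.push(key);
    } else {
      seen.add(key);
    }
  }
  return duplicates;
}

// Illustrative data only.
const sampleUsers = [
  { id: 1, email: "alice@test.example.com" },
  { id: 2, email: "bob@test.example.com" },
  { id: 1, email: "carol@test.example.com" },
];
const duplicateUserIds = findDuplicateIds(sampleUsers); // ["1"]
```

A check like this could run in CI or a pre-commit step, so human review can focus on naming and intent.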

Folder structure that scales

A flat fixtures/ folder works for ten files. At fifty, it's a swamp. Group by feature or shape:

cypress/fixtures/
├── users/
│   ├── valid-user.json
│   ├── admin-user.json
│   ├── locked-user.json
│   └── invalid-users.json
├── products/
│   ├── single-product.json
│   ├── product-list.json
│   └── out-of-stock-product.json
├── api-responses/
│   ├── login/
│   │   ├── 200-success.json
│   │   ├── 401-invalid-credentials.json
│   │   └── 429-rate-limited.json
│   └── checkout/
│       ├── 200-success.json
│       └── 402-payment-required.json
└── images/
    ├── valid-jpeg.jpg
    ├── valid-png.png
    └── corrupt.jpg

Two rules:

  1. One concept per file. A 600-line users.json containing 47 different scenarios is a maintenance trap. Break it: valid-user.json, admin-user.json, locked-user.json. Each test loads exactly what it needs.
  2. Mirror your test structure. If your tests live in cypress/e2e/checkout/, your fixtures live in cypress/fixtures/checkout/. New joiners can find the right file from the test's folder alone.
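The mirroring rule can be expressed as a small path mapping. This is a sketch that assumes the cypress/e2e/ and cypress/fixtures/ roots used in this lesson; the function name is hypothetical.

```typescript
// Assumed layout from this lesson: specs under cypress/e2e/, fixtures under cypress/fixtures/.
const SPEC_ROOT = "cypress/e2e/";
const FIXTURE_ROOT = "cypress/fixtures/";

// Maps a spec path to the fixture folder that mirrors it,
// or returns null if the spec is outside the expected root.
function fixtureDirForSpec(specPath: string): string | null {
  if (specPath.indexOf(SPEC_ROOT) !== 0) {
    return null;
  }
  const relative = specPath.slice(SPEC_ROOT.length);
  const lastSlash = relative.lastIndexOf("/");
  // A spec directly under cypress/e2e/ maps to the fixture root itself.
  const folder = lastSlash === -1 ? "" : relative.slice(0, lastSlash + 1);
  return FIXTURE_ROOT + folder;
}

const checkoutFixtures = fixtureDirForSpec("cypress/e2e/checkout/discount-code.spec.ts");
// checkoutFixtures === "cypress/fixtures/checkout/"
```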

What does NOT belong in Git

Three categories, each for a different reason:

Large binary files

Git keeps every committed version of every file in its history. A 50MB CSV that is replaced weekly can add up to roughly 2.6GB of history in a year (52 × 50MB, before compression), and every full clone downloads all of it.

Options for large data:

  • Cloud storage (S3, GCS): your test fetches the data at setup time. The repo stays small; the data is versioned by upload date or hash.
  • Git LFS (Large File Storage): a Git extension that commits small pointer files and keeps the large content on a separate LFS store, so files stay tracked without bloating clones.
  • Programmatic generation: generate the data in code from a small seed, on demand. Often the cleanest answer for synthetic test data.
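As a rough illustration of the programmatic-generation option, the sketch below builds a deterministic list of synthetic users from a numeric seed. It uses mulberry32, a small, widely circulated seeded generator, instead of Math.random(), which cannot be seeded. The name pool, field names, and email format are assumptions for illustration.

```typescript
// mulberry32: a small seeded pseudo-random generator.
// Returns a function producing floats in [0, 1); same seed, same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface SyntheticUser {
  id: number;
  name: string;
  email: string;
}

// Hypothetical name pool; any small list of obviously fake values works.
const FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin"];

// Generates `count` users; the same seed always yields the same list.
function generateUsers(count: number, seed: number): SyntheticUser[] {
  const random = mulberry32(seed);
  const users: SyntheticUser[] = [];
  for (let i = 0; i < count; i++) {
    const first = FIRST_NAMES[Math.floor(random() * FIRST_NAMES.length)];
    const id = i + 1;
    users.push({
      id,
      name: `${first} Test`,
      email: `${first.toLowerCase()}.${id}@test.example.com`,
    });
  }
  return users;
}

const bulkUsers = generateUsers(100, 42);
```

Only the seed and the generator live in the repo. The 100 rows are rebuilt on demand and stay identical between runs as long as the seed does not change.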

Real user / PII data

Even if you trust your repo's permissions, real user data in a test fixture is a leak waiting to happen. Use fake data:

  • Generators like Faker or @faker-js/faker create realistic but synthetic names, emails, addresses, credit-card numbers.
  • Test accounts with explicitly fake values: alice@test.example.com, Test User #042, 4111-1111-1111-1111 (a known test card number).

If the fixture contains anything that could identify a real human, replace it before committing.
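A simple scan can catch the most obvious leaks before commit. The sketch below flags email addresses whose domain is not one of the names reserved for documentation and testing (example.com, example.org, example.net, and the .test, .example, .invalid and .localhost TLDs). The function names and the email pattern are assumptions. The pattern is deliberately loose and will not catch every form of PII.

```typescript
// Domains and TLDs reserved for documentation and testing (RFC 2606 / RFC 6761).
const RESERVED_DOMAINS = ["example.com", "example.org", "example.net"];
const RESERVED_TLDS = ["test", "example", "invalid", "localhost"];

// Loose pattern: good enough to find candidates, not a full address validator.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function isReservedDomain(domain: string): boolean {
  const lower = domain.toLowerCase();
  const tld = lower.slice(lower.lastIndexOf(".") + 1);
  if (RESERVED_TLDS.indexOf(tld) !== -1) return true;
  return RESERVED_DOMAINS.some(
    (reserved) =>
      lower === reserved || lower.slice(-(reserved.length + 1)) === "." + reserved
  );
}

// Returns every email address in the text whose domain is not reserved for testing.
function findSuspiciousEmails(fixtureText: string): string[] {
  const matches: string[] = fixtureText.match(EMAIL_PATTERN) || [];
  return matches.filter(
    (email) => !isReservedDomain(email.slice(email.indexOf("@") + 1))
  );
}

// Illustrative fixture content.
const sampleFixture = '{"a":"alice@test.example.com","b":"jane.doe@gmail.com"}';
const flagged = findSuspiciousEmails(sampleFixture); // ["jane.doe@gmail.com"]
```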

Dynamic / per-run data

Anything that changes every run — current timestamps, freshly-generated UUIDs, today's exchange rate — should not be a fixture. Generate it in the test:

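// Build per-run values inside the test instead of loading them from a fixture.
const order = {
  id: crypto.randomUUID(),             // unique on every run
  createdAt: new Date().toISOString(), // current time, ISO 8601
  amount: 19.99                        // static value; could equally come from a fixture
};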

A fixture frozen with last Tuesday's timestamp will eventually fail for reasons unrelated to the code under test, and that kind of failure is confusing to debug.

Environment-specific data — the .env pattern

Tests that talk to real systems need different values per environment: a different base URL on staging vs production, different test account credentials, different rate-limit thresholds. The convention:

  • .env.example — committed to Git. Lists every variable the test suite expects, with placeholder values. New joiners copy this to .env on their machine.
  • .env — listed in .gitignore. Holds the real values for each developer's environment.
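  • cypress.config.ts (or equivalent) reads from process.env so tests stay environment-agnostic. Cypress does not load .env files on its own, so the config needs to load the file first, for example with the dotenv package.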

Example .env.example:

CYPRESS_BASE_URL=https://staging.example.com
CYPRESS_TEST_USERNAME=
CYPRESS_TEST_PASSWORD=
CYPRESS_API_TOKEN=

Each developer copies it, fills in their real values, and cypress run works on their machine. CI provides the same variables via GitHub Actions secrets — same pattern, different source. The CI/CD for testers cheat sheet covers the secret-injection side.
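One way to keep .env.example and the real environment in step is to fail fast when a listed variable is unset. The sketch below parses .env.example-style text and reports names that are missing or empty. The function names are hypothetical and the qa-bot username is a placeholder. In a real config the second argument would be process.env.

```typescript
// Extracts variable names from .env.example-style text (one KEY=value per line).
function parseEnvKeys(exampleText: string): string[] {
  const keys: string[] = [];
  for (const rawLine of exampleText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq > 0) keys.push(line.slice(0, eq).trim());
  }
  return keys;
}

// Returns the names that are unset or empty in the given environment object.
function missingEnvVars(
  exampleText: string,
  env: Record<string, string | undefined>
): string[] {
  return parseEnvKeys(exampleText).filter((key) => !env[key]);
}

const exampleFile = [
  "CYPRESS_BASE_URL=https://staging.example.com",
  "CYPRESS_TEST_USERNAME=",
  "CYPRESS_TEST_PASSWORD=",
  "CYPRESS_API_TOKEN=",
].join("\n");

// Stand-in for process.env with only two variables filled in.
const missing = missingEnvVars(exampleFile, {
  CYPRESS_BASE_URL: "https://staging.example.com",
  CYPRESS_TEST_USERNAME: "qa-bot",
});
// missing is ["CYPRESS_TEST_PASSWORD", "CYPRESS_API_TOKEN"]
```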

Seed data vs dynamic data — when to use each

Seed data (fixtures) — predefined, predictable values:

  • Your "valid user" always has the same email and password.
  • Your "out of stock product" record always exists.
  • A successful login always returns the same canned JSON.

Seed data makes tests reproducible. The same fixture + same code = same result every time.

Dynamic data — generated at runtime:

  • Random emails to avoid collisions on parallel runs.
  • Today's date for time-sensitive tests.
  • UUIDs for new entities.

Dynamic data avoids cross-test pollution and concurrency conflicts. Generate it inline; don't try to bake it into a fixture.
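The two kinds of data usually meet in a small builder: stable fields come from the seed, per-run fields are generated. The sketch below assumes a seed object that would normally be loaded from users/valid-user.json. Its field values and the helper names are illustrative only.

```typescript
// Static part: in a real suite this would be loaded from users/valid-user.json.
// The field values here are illustrative placeholders.
const validUserSeed = {
  name: "Test User",
  role: "customer",
  password: "not-a-real-password",
};

let emailCounter = 0;

// Builds an address that is very unlikely to collide across parallel runs:
// timestamp + per-process counter + random suffix.
function uniqueTestEmail(prefix: string): string {
  emailCounter += 1;
  const suffix = Math.random().toString(36).slice(2, 8);
  return `${prefix}+${Date.now()}-${emailCounter}-${suffix}@test.example.com`;
}

// Combines the stable seed fields with values generated for this run.
function buildSignupUser() {
  return {
    ...validUserSeed,
    email: uniqueTestEmail("signup"),
    createdAt: new Date().toISOString(),
  };
}

const firstUser = buildSignupUser();
const secondUser = buildSignupUser();
// firstUser and secondUser share name, role and password but have different emails.
```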

Updating fixtures the same way you update code

Fixtures change. Your team adds a new user role; an API field is renamed; a new currency joins the product catalogue. Treat each fixture update as a code change:

  1. Create a branch: git switch -c test/add-moderator-role-fixtures
  2. Update the affected fixture files.
  3. Update any tests that need to reflect the new shape.
  4. Push, open a PR, get reviewed.
  5. Merge.

Reviewers catch oversights — the fixture you updated in users/ but forgot to update in api-responses/login/, the field rename you missed in three test files, the test data that accidentally reads as a real customer's name.

Decision tree — should this data be in Git?

[Figure: decision tree for deciding whether a piece of test data belongs in Git]
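The rules from the earlier sections can be sketched as a small function. The order of the checks and the trait names are assumptions made for this sketch, not a transcription of the lesson's diagram. What counts as "large" is left as a team decision.

```typescript
interface DataTraits {
  containsRealUserData: boolean;
  changesEveryRun: boolean;
  variesByEnvironment: boolean;
  isLargeOrFrequentlyReplaced: boolean; // threshold is a team decision
}

type Placement =
  | "replace with synthetic data first"
  | "generate in the test"
  | "environment variable (.env, not committed)"
  | "external storage, Git LFS, or generated from a seed"
  | "commit as a fixture";

// Walks the questions in order; the first "yes" decides where the data goes.
function whereDoesItGo(traits: DataTraits): Placement {
  if (traits.containsRealUserData) return "replace with synthetic data first";
  if (traits.changesEveryRun) return "generate in the test";
  if (traits.variesByEnvironment) return "environment variable (.env, not committed)";
  if (traits.isLargeOrFrequentlyReplaced) {
    return "external storage, Git LFS, or generated from a seed";
  }
  return "commit as a fixture";
}

// A small, static, synthetic JSON stub: the ordinary fixture case.
const loginStubPlacement = whereDoesItGo({
  containsRealUserData: false,
  changesEveryRun: false,
  variesByEnvironment: false,
  isLargeOrFrequentlyReplaced: false,
}); // "commit as a fixture"
```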

A real QA scenario — adding a moderator role

Product adds a "moderator" user role. Five tests reference user roles, two fixtures define users, one API stub returns role info. Your sequence:

git switch main && git pull
git switch -c test/add-moderator-role

Create users/moderator-user.json (new file). Update users/valid-user.json if needed. Update api-responses/login/200-success.json to include the new role in the role list. Update tests that enumerate roles. Commit per logical change:

git add cypress/fixtures/users/moderator-user.json
git commit -m "Add moderator-user fixture for role-based access tests"

git add cypress/fixtures/api-responses/login/200-success.json
git commit -m "Add moderator role to login response stub"

git add cypress/e2e/permissions/
git commit -m "Add 4 tests for moderator role permissions"

git push -u origin test/add-moderator-role

Open a PR. Reviewers check fixture shape and naming, confirm there is no PII, and verify the tests still pass. Merge. Now every developer has the new role available locally on their next git pull, and CI runs against the same data.

⚠️ Common mistakes

  • One giant users.json containing every test scenario. It becomes the file with the most merge conflicts in the repo. One concept per file (or at least, one category per file) prevents this. Break early.
  • Hard-coding environment URLs in fixtures. A fixture that contains https://staging.acme.com/api won't work against localhost or production. Strip environment specifics out of fixtures and inject them via env vars at runtime.
  • Storing real customer data "just for one test." That sample CSV from the bug report has real emails and order numbers. Even on a private repo, that's a compliance leak waiting to happen. Always anonymise before commit; treat fake data as the default.
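For the hard-coded URL mistake, one common fix is to store only the path in the fixture and join it with the environment's base URL at runtime. The sketch below assumes a hypothetical endpoint field and a hypothetical local port. In a real suite the base URL would come from an environment variable.

```typescript
// Joins an environment-specific base URL with a path stored in a fixture,
// normalising the slash between them.
function resolveFixtureUrl(baseUrl: string, fixturePath: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  const path = fixturePath.replace(/^\/+/, "");
  return `${base}/${path}`;
}

// Hypothetical fixture content: only the path is stored, never the host.
const checkoutStub = { endpoint: "/api/checkout" };

const stagingUrl = resolveFixtureUrl("https://staging.example.com", checkoutStub.endpoint);
const localUrl = resolveFixtureUrl("http://localhost:3000/", checkoutStub.endpoint);
// stagingUrl === "https://staging.example.com/api/checkout"
// localUrl === "http://localhost:3000/api/checkout"
```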

🎯 Practice task

Audit and reorganise a fixture folder. 25-30 minutes.

  1. In any test repo (or your qa-sandbox), create cypress/fixtures/ with three files: users.json, products.json, api-responses.json. Put 3-5 records each.
  2. Reorganise into the recommended layout: users/ folder with valid-user.json, admin-user.json, invalid-user.json. Same for products. Same for api-responses/.
  3. Commit on a branch (test/reorganise-fixtures). The commit alone is the practice — fixtures are code, and reorganising them is a real PR.
  4. Create a .env.example file with three placeholder variables (e.g., CYPRESS_BASE_URL=, CYPRESS_TEST_USERNAME=). Commit it.
  5. Create .env with realistic but fake values: CYPRESS_BASE_URL=https://staging.example.com, etc. Confirm git status does NOT show .env (your .gitignore from Lesson 1 should hide it).
  6. Search any fixture for words like @gmail.com, real-looking phone numbers, or dates with personal context. If you find any, replace with @test.example.com and synthetic values.
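  7. Stretch: record which test uses one of your fixtures, for example "Used by cypress/e2e/checkout/discount-code.spec.ts". JSON has no comment syntax, so a // header would break parsing in a .json fixture; put the note in a README beside the fixture, or in a dedicated key of your choosing (for example "_usedBy") if your tests tolerate the extra field. Some teams keep the link explicit for navigability; whether it's worth the noise is a judgement call to make on your team.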

The next (and final) lesson of this chapter introduces Git hooks — running your tests automatically before code leaves your machine.
