Common GraphQL Testing Pitfalls — API Testing Masterclass

GraphQL has a learning curve for testers — not because it's complicated, but because the things you'd habitually do for REST don't quite apply, and the things you don't habitually do (check errors, set complexity limits, watch for N+1) suddenly become important. This lesson catalogues the pitfalls that bite even experienced QA engineers when they first start testing GraphQL, and the test-writing habits that defuse each one.

A map of where bugs hide

GraphQL test areas (five branches):

– Field selection, nested data, variables and fragments
– Input validation, idempotency, side effects in the DB
– errors array vs HTTP status, partial success, stable error codes
– Introspection in prod, depth/complexity limits, auth on every operation
– N+1 fan-out, query timing, per-query rate limits

Pitfalls cluster in each branch.

Pitfall 1: only testing happy-path queries

GraphQL's flexibility means there are many more shapes a request can take. A suite of positive-only tests covers only a small fraction of that surface.

Defuser: for every query, also test:

Empty result (user(id: 9999) with no match) — does it return null or an error?
Missing field selection — what happens if you ask only for id?
Deep nesting — does user.orders.items.product come back fully populated?
Wrong argument type and missing required arguments — both should be caught at validation time, before the resolver runs.
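The four checks above can be kept as a small table of negative cases. This is a minimal sketch that assumes a hypothetical `user(id: ID!)` field and a `send(query, variables)` function you supply, which posts to your server and returns the parsed JSON body. It relies on one rule from the GraphQL spec: errors raised before execution (parse, validation, variable coercion) should produce a response with `errors` and no `data` key, which is how a test can tell that no resolver ran. Treating a non-matching id as `null` is one possible contract. If your team chose an error instead, swap the predicate.

```python
from typing import Any, Callable

Response = dict[str, Any]


def is_request_error(resp: Response) -> bool:
    """Rejected before execution: errors present and no `data` key at all."""
    return bool(resp.get("errors")) and "data" not in resp


def is_null_result(resp: Response, field: str) -> bool:
    """The field resolved to null and the server reported no error."""
    data = resp.get("data") or {}
    return field in data and data[field] is None and not resp.get("errors")


# Hypothetical schema: type Query { user(id: ID!): User }
GET_USER = "query GetUser($id: ID!) { user(id: $id) { id } }"

# (case name, query, variables, predicate the response must satisfy)
NEGATIVE_CASES: list[tuple[str, str, dict, Callable[[Response], bool]]] = [
    ("no match returns null", GET_USER, {"id": "9999"},
     lambda r: is_null_result(r, "user")),
    ("missing required variable", GET_USER, {}, is_request_error),
    ("wrong variable type", GET_USER, {"id": {"not": "an id"}}, is_request_error),
    ("missing required argument", "query { user { id } }", {}, is_request_error),
]


def run_negative_cases(send: Callable[[str, dict], Response]) -> list[str]:
    """Run every case and return the names of those whose response was wrong."""
    return [name for name, query, variables, ok in NEGATIVE_CASES
            if not ok(send(query, variables))]
```

A test then reduces to `assert run_negative_cases(send) == []`, and a failure message names exactly which negative case regressed.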

Pitfall 2: ignoring the errors array

The single most common GraphQL testing bug. Servers return HTTP 200 with errors in the body, and tests that only check data quietly miss them.

Defuser: wrap every test in a helper that asserts on the absence of errors before it asserts on data. If your suite already has dozens of tests written without this, retrofit the helper rather than touching every test individually.
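A minimal sketch of such a helper follows. The name `assert_no_errors` is hypothetical, and it assumes the response body has already been parsed from JSON into a dict.

```python
from typing import Any


def assert_no_errors(response: dict[str, Any]) -> dict[str, Any]:
    """Fail if the body carries a non-empty errors array; otherwise return `data`.

    HTTP status is deliberately ignored: GraphQL servers commonly return 200
    even when the errors array is populated.
    """
    errors = response.get("errors") or []
    if errors:
        messages = "; ".join(str(e.get("message", "<no message>")) for e in errors)
        raise AssertionError(f"GraphQL errors in response: {messages}")
    data = response.get("data")
    if data is None:
        raise AssertionError("response has neither errors nor data")
    return data
```

To retrofit it, call it from the shared request function your tests already go through (for example `data = assert_no_errors(resp.json())` if you use `requests`), so every existing test inherits the check. Tests that deliberately expect errors then need a separate path that skips it.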

Pitfall 3: not testing query depth and complexity limits

GraphQL's "client picks the shape" feature is also its most exploitable vector. A query with 30 levels of nesting can lock up the server.

Defuser: include at least one test that:

Sends a query past the documented depth limit (typically 7-10 levels).
Expects an error response (not a 30-second hang).
Confirms the server returns within a sane time (under a second).

If the team hasn't set limits, that's the bug — the test failure is the signal to add them.
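Here is a minimal sketch of that test. It assumes a hypothetical self-referencing `friends` field on `User` (substitute any relation in your schema that can nest indefinitely) and a `send(query)` function you supply that returns the parsed JSON body. The default depth of 30 and the one-second budget mirror the numbers above. They are not universal thresholds.

```python
import time
from typing import Any, Callable


def build_nested_query(depth: int, field: str = "friends") -> str:
    """Nest `field` `depth` times under user(id: 1), ending in a leaf `id`."""
    selection = "id"
    for _ in range(depth):
        selection = f"{field} {{ {selection} }}"
    return f"query DeepProbe {{ user(id: 1) {{ {selection} }} }}"


def assert_depth_limit_enforced(
    send: Callable[[str], dict[str, Any]],
    depth: int = 30,
    max_seconds: float = 1.0,
) -> dict[str, Any]:
    """The over-deep query must be rejected with an error, and rejected fast."""
    start = time.monotonic()
    response = send(build_nested_query(depth))
    elapsed = time.monotonic() - start
    assert response.get("errors"), f"depth-{depth} query was accepted: no depth limit?"
    assert elapsed < max_seconds, f"rejection took {elapsed:.2f}s (limit {max_seconds}s)"
    return response
```

`build_nested_query(2)` produces `query DeepProbe { user(id: 1) { friends { friends { id } } } }`. Timing the round trip inside the test catches servers that do eventually return an error, but only after grinding for seconds. Give the underlying HTTP client its own timeout too, so a missing limit fails the test instead of hanging it.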

Pitfall 4: over-fetching in tests

It's tempting to write one test fixture that asks for every field on a type, then reuse it for all tests:

{ user(id: 42) { id name email role createdAt orders { id total status items { id name price } } addresses { ... } } }

The problem: this doesn't reflect how real clients use the API. Real clients fetch small slices. A test that always fetches everything misses bugs where one specific field combination is broken.

Defuser: write tests with the minimum selection that proves the assertion. If you're testing user.email, ask for email and id — not the entire type. Reserve "fetch everything" tests for schema validation runs.
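A narrow test of that kind might look like the sketch below. The `GetUserEmail` operation is hypothetical. The helper leans on the fact that a GraphQL object result contains exactly the fields that were selected (when no aliases are used), so the test can also pin down the shape it got back.

```python
from typing import Any

# Hypothetical operation: just enough selection to prove something about email.
GET_USER_EMAIL = "query GetUserEmail($id: ID!) { user(id: $id) { id email } }"


def assert_exact_selection(obj: dict[str, Any], selected: set[str]) -> None:
    """The object must carry exactly the selected fields (no aliases assumed),
    so a missing or extra key signals a problem."""
    actual = set(obj)
    missing, extra = selected - actual, actual - selected
    assert not missing and not extra, f"missing={sorted(missing)} extra={sorted(extra)}"
```

In a test you would send `GET_USER_EMAIL`, call `assert_exact_selection(data["user"], {"id", "email"})`, and then assert on the email value itself.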

Pitfall 5: not testing mutations thoroughly

GraphQL's symmetry (mutation Create($input: ...) looks just like a query with a different keyword) can lull testers into treating mutations as a one-line check.

Defuser: for each mutation, ensure coverage of:

Happy path with all returned fields verified.
Missing required input fields.
Invalid input values (failing business validation).
Auth missing → unauthenticated.
Auth with wrong role → forbidden.
Idempotency: does calling twice create two rows or detect the duplicate?
Side effects: is the change persisted? Verify with a follow-up query or DB read (Lesson 4 of Chapter 5).

That's six to eight tests per mutation. Skipping them leaves real bugs undetected.
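The error-path and idempotency items on that list fit a table-driven sketch. It assumes a hypothetical `createOrder(input: {sku, quantity})` mutation, a `send(variables, token)` function you supply (with `token=None` meaning "no auth header"), and Apollo-style `extensions.code` values such as `UNAUTHENTICATED` and `FORBIDDEN`. Your server's codes may differ. Verifying returned fields and persisted side effects still needs the separate checks listed above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Response = dict[str, Any]
# send(variables, token) -> parsed JSON body; token=None means "no auth header".
Send = Callable[[dict, Optional[str]], Response]


def error_codes(resp: Response) -> list[Optional[str]]:
    """Collect extensions.code from every entry in the errors array."""
    return [(e.get("extensions") or {}).get("code") for e in resp.get("errors") or []]


@dataclass
class MutationCase:
    name: str
    variables: dict
    token: Optional[str]
    expected_code: Optional[str] = None  # None means the call must succeed


# Hypothetical createOrder mutation; codes and tokens are placeholders.
VALID = {"input": {"sku": "ABC-1", "quantity": 2}}
CASES = [
    MutationCase("happy path", VALID, "customer-token"),
    MutationCase("missing required field", {"input": {"sku": "ABC-1"}},
                 "customer-token", "BAD_USER_INPUT"),
    MutationCase("invalid value", {"input": {"sku": "ABC-1", "quantity": -5}},
                 "customer-token", "BAD_USER_INPUT"),
    MutationCase("no auth", VALID, None, "UNAUTHENTICATED"),
    MutationCase("wrong role", VALID, "readonly-token", "FORBIDDEN"),
]


def check_case(send: Send, case: MutationCase) -> Response:
    """Send one case and assert it succeeded or failed with the expected code."""
    resp = send(case.variables, case.token)
    codes = error_codes(resp)
    if case.expected_code is None:
        assert not codes, f"{case.name}: unexpected errors {codes}"
    else:
        assert case.expected_code in codes, (
            f"{case.name}: expected {case.expected_code}, got {codes}")
    return resp


def check_no_duplicate(send: Send, case: MutationCase,
                       id_of: Callable[[Response], Any]) -> None:
    """Send the same successful mutation twice. The second call must either
    be rejected or return the first record's id, never a fresh record."""
    first = check_case(send, case)
    second = send(case.variables, case.token)
    assert error_codes(second) or id_of(second) == id_of(first), (
        f"{case.name}: second call created a new record")
```

Which idempotency policy is correct (reject the duplicate, or return the existing record) is a product decision. The sketch accepts either and fails only when a second record appears.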

Pitfall 6: ignoring rate limiting and per-query cost

Counting requests is a poor rate limit for GraphQL, because one heavy query can cost the server more than 100 light ones. Providers such as GitHub and Shopify therefore meter by query complexity, assigning each query a point cost, instead of counting requests.

Defuser: find out how rate limits work on your API:

Per request? Per minute? Per query "points" budget?
Are response headers like X-RateLimit-Cost returned?
Does a complex query cost more "points" than a simple one?

Then test:

A heavy query reports its cost in headers / extensions.
A request near the budget limit succeeds.
One past the limit returns a clear cost-exceeded error.

GitHub's GraphQL API is a good public example to learn against — it documents complexity scoring openly.
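Those three tests can share a couple of small helpers. This sketch assumes a Shopify-style response that reports cost under `extensions.cost` (`actualQueryCost`, `requestedQueryCost`). GitHub instead lets you select a `rateLimit { cost remaining }` field in the query itself, so adapt `reported_cost` to whatever your API exposes. The error code passed to `assert_rejected_for_cost` is a placeholder for your API's documented one.

```python
from typing import Any, Optional

Response = dict[str, Any]


def reported_cost(resp: Response) -> Optional[float]:
    """Cost the server reports for this query, or None if it reports nothing.

    ASSUMPTION: a Shopify-style body carrying extensions.cost.actualQueryCost,
    falling back to requestedQueryCost. Adapt to your API's shape.
    """
    cost = (resp.get("extensions") or {}).get("cost") or {}
    actual = cost.get("actualQueryCost")
    return actual if actual is not None else cost.get("requestedQueryCost")


def assert_heavier_costs_more(light: Response, heavy: Response) -> None:
    """Both queries must report a cost, and the heavy one must cost more."""
    light_cost, heavy_cost = reported_cost(light), reported_cost(heavy)
    assert light_cost is not None and heavy_cost is not None, "server reported no cost"
    assert heavy_cost > light_cost, f"heavy query cost {heavy_cost} <= light {light_cost}"


def assert_rejected_for_cost(resp: Response, expected_code: str) -> None:
    """Past the budget: a clear error carrying a stable code, not a timeout or 500."""
    codes = [(e.get("extensions") or {}).get("code") for e in resp.get("errors") or []]
    assert expected_code in codes, f"expected {expected_code}, got {codes}"
```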

Pitfall 7: assuming GraphQL replaces REST

Most large products run both — REST for stable public APIs and integrations, GraphQL for client-driven reads. Your test suite needs to handle both, and the bugs each catches are different.

Defuser: maintain explicit coverage in both styles. A REST endpoint and a GraphQL field that read the same underlying data both deserve tests, because the bugs (auth, serialisation, error handling) live in different code paths.
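Alongside separate tests for each style, one cheap extra check is to fetch the same record through both and compare the overlapping fields. The field mapping below is hypothetical, and both bodies are assumed to be already-parsed JSON.

```python
from typing import Any, Callable

# Hypothetical mapping from REST keys to the GraphQL fields that read the same data.
FIELD_MAP = {"id": "id", "email": "email", "created_at": "createdAt"}


def parity_mismatches(
    rest_body: dict[str, Any],
    gql_obj: dict[str, Any],
    field_map: dict[str, str],
    normalize: Callable[[Any], Any] = lambda v: v,
) -> list[str]:
    """Return the REST keys whose value disagrees with the mapped GraphQL field."""
    return [
        rest_key
        for rest_key, gql_key in field_map.items()
        if normalize(rest_body.get(rest_key)) != normalize(gql_obj.get(gql_key))
    ]
```

With a REST body of `{"id": 42, ...}` and a GraphQL object of `{"id": "42", ...}`, the function reports `["id"]`. That is a genuine serialisation difference: GraphQL's `ID` scalar always serialises as a string, while a REST endpoint may return a number. Whether to treat it as a bug or normalise it away (`normalize=str`) is a team decision.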

Pitfall 8: not validating against the schema

REST tests often hand-write expected response shapes. GraphQL gives you something better: the schema itself is a typed contract, available via introspection. Tools like graphql-inspector and Apollo's tooling can:

Validate every query in your codebase against the schema at CI time.
Detect breaking schema changes (removed fields, type changes) before deploy.
Auto-generate test fixtures from the schema.

Defuser: add a schema-check step to CI. It's a cheap insurance policy that catches bugs before any test runs.
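graphql-inspector does this properly. To show the idea without any dependency, here is a deliberately crude stand-in that uses only the standard introspection result shape: it flattens each schema into `Type.field` names and reports any that disappeared. It catches removed fields only, not type or nullability changes, so treat it as a sketch rather than a replacement for a real schema-diff tool.

```python
from typing import Any

# Standard introspection; includeDeprecated so a deprecation alone is not reported as a removal.
SCHEMA_FIELDS_QUERY = """
query SchemaFields {
  __schema {
    types {
      name
      fields(includeDeprecated: true) { name }
    }
  }
}
"""


def field_set(schema_data: dict[str, Any]) -> set[str]:
    """Flatten an introspection result ({"__schema": ...}) into "Type.field" names."""
    names: set[str] = set()
    for t in schema_data["__schema"]["types"]:
        if t["name"].startswith("__"):
            continue  # introspection's own types (__Type, __Field, ...)
        for f in t.get("fields") or []:  # null for scalars, enums, input types
            names.add(f"{t['name']}.{f['name']}")
    return names


def removed_fields(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """Fields present in `old` (e.g. the golden file) but missing from `new`."""
    return sorted(field_set(old) - field_set(new))
```

In CI this might mean running `SCHEMA_FIELDS_QUERY` against a CI or staging build (introspection should be off in production), loading the checked-in golden JSON with `json.load`, and failing the build if `removed_fields(golden, current)` is non-empty.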

Pitfall 9: tooling-induced complacency

GraphQL tooling — Playground, Apollo Studio, Insomnia, Postman's GraphQL UI — is genuinely excellent. So good that it can hide problems. A query that works in Playground may fail in production because:

Playground is often pointed at a local or staging server, where introspection and debug settings differ from production.
Playground's auth token differs from your client's.
Playground silently retries failed requests.

Defuser: automate everything that matters. Manual exploration is great for finding bugs; only an automated test suite catches regressions before they ship.

Pitfall 10: treating one-off queries as forever-stable

Tests written with hardcoded query strings against today's schema break when the schema evolves: a renamed field or moved argument breaks every test that spells it out.

Defuser:

Use named operations (query GetUser) for grep-ability when something breaks.
Use variables instead of string interpolation.
Periodically run a schema-diff check between environments — staging and prod schemas should stay in sync.
When the schema changes, update fragments first; tests that share a fragment need one edit to the fragment, not one per test.
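The four habits above combine naturally, as the sketch below shows. The `User` type, the `user` and `users(first:)` fields, and the `build_payload` helper are hypothetical. The payload shape (`query`, `operationName`, `variables`) is the standard JSON body for GraphQL over HTTP.

```python
import re
from typing import Any

# Shared selection: when User changes, this is the one place to edit.
USER_FIELDS = """
fragment UserFields on User {
  id
  name
  email
}
"""

# Named operations with variables; the fragment definition travels with each document.
GET_USER = """
query GetUser($id: ID!) {
  user(id: $id) { ...UserFields }
}
""" + USER_FIELDS

LIST_USERS = """
query ListUsers($first: Int!) {
  users(first: $first) { ...UserFields }
}
""" + USER_FIELDS


def build_payload(query: str, operation_name: str, variables: dict[str, Any]) -> dict[str, Any]:
    """JSON body for GraphQL over HTTP. Values go in `variables`, never
    spliced into the query string; the named operation must exist."""
    pattern = rf"\b(query|mutation|subscription)\s+{re.escape(operation_name)}\b"
    if not re.search(pattern, query):
        raise ValueError(f"operation {operation_name!r} not found in query")
    return {"query": query, "operationName": operation_name, "variables": variables}
```

When a test fails, searching the codebase for `GetUser` finds both the operation and every test that sends it.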

A pre-flight checklist

For any new GraphQL endpoint, ask before merging:

☐ Errors array checked on every test.
☐ Auth tested on every mutation and protected query.
☐ Depth / complexity limit verified.
☐ Introspection disabled in production.
☐ N+1 risk evaluated for nested queries.
☐ Schema validated in CI against a golden file.
☐ At least one mutation idempotency test.
☐ Errors return stable extensions.code values.

Most teams get to seven of these and call it good. Even five represents a real improvement over "we test the happy path."

⚠️ Common mistakes

Treating GraphQL as "just JSON over POST." Mechanically true; conceptually misleading. The shape and error model differ enough to warrant their own habits.
Inheriting REST's status-code intuition wholesale. HTTP 200 is the default for everything, including failures. The body's errors array carries the real signal.
Skipping schema-aware tooling. A schema is a free typed contract — letting your tests use it pays back many times over.

🎯 Practice task

Audit a GraphQL test suite (or build one for the first time). 30 minutes.

Pick a project with GraphQL — yours, or any open-source repo with *.gql test files. If you don't have one, use the Countries API and write three tests from scratch.
Run through the pre-flight checklist above. Mark each item ✓ or ✗.
Of the items marked ✗, pick the highest-impact one and write a test that addresses it. Often "errors array checked" is the cheapest big win.
Try one exploitative query — deeply nested, asking for fan-out. Time the response. Is it bounded?
Stretch: add a CI step that runs a schema-introspection query and diffs the result against a checked-in golden file. The next breaking schema change will fail loudly.

That wraps up Chapter 6. Chapter 7 takes us back to the contract — but at the team level: how to keep consumers and providers in sync as APIs evolve.