The performance smoke test I'd run before release

qa.codes · 13 June 2026 · 8 min read

Intermediate · QA engineers · Performance testers

performance-testing · smoke-test · release · checklist

You don't need a full load test before every release. You need a fast performance smoke test that catches the regressions that matter — the ones that turn a working feature into a slow, broken one under real conditions.

Part of: Performance for QA engineers

Most teams have a false choice in their heads: either run a full multi-hour load test or do nothing. The result is "do nothing," because the full test is too expensive to run per release. The missing middle is a performance smoke test — a small, fast, repeatable check that catches gross regressions before they ship. It won't tell you your exact capacity ceiling; it'll tell you whether this release made things obviously worse, which is what you actually need to know before sign-off.

What a performance smoke test is for

It answers one question: did anything get dramatically slower or start falling over since the last release? It's not capacity planning, and it's deliberately not a full load test. It's the performance equivalent of a functional smoke test — fast, focused on the critical paths, run every time, designed to catch the cliff-edge regressions, not to measure the last 5%.

The trick is to measure the few endpoints that matter under a modest, consistent load, and compare against a known baseline. Same script, same load, same environment, every release. The comparison is the whole point — a number in isolation is noise; a number that doubled since last week is a finding.
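"Same load, every release" can be sketched in a few lines of Python. This is a minimal closed-loop runner, not a replacement for a dedicated tool: a fixed number of worker threads (standing in for virtual users) send requests back-to-back for a fixed duration and record each request's latency and outcome. `run_fixed_load` and its `request_fn` hook are hypothetical names; `request_fn` is assumed to perform one request against one endpoint and return whether it succeeded. Threads are adequate here because the workers spend most of their time waiting on I/O.

```python
import threading
import time
from typing import Callable, List, Tuple

# One sample per request: (latency in milliseconds, whether the request succeeded).
Sample = Tuple[float, bool]


def run_fixed_load(request_fn: Callable[[], bool], vus: int, duration_s: float) -> List[Sample]:
    """Run `vus` workers that call `request_fn` back-to-back until `duration_s` elapses.

    `request_fn` is a hypothetical hook: perform one request against the endpoint
    under test and return True on success (e.g. a 2xx status), False otherwise.
    """
    samples: List[Sample] = []
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker() -> None:
        while time.monotonic() < deadline:
            start = time.perf_counter()
            try:
                ok = bool(request_fn())
            except Exception:
                ok = False  # an exception (timeout, refused connection) counts as a failed request
            latency_ms = (time.perf_counter() - start) * 1000
            with lock:
                samples.append((latency_ms, ok))

    workers = [threading.Thread(target=worker) for _ in range(vus)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return samples
```

In a real run, `request_fn` would wrap an HTTP call to one critical endpoint (for example with `urllib.request.urlopen`, which raises on 4xx and 5xx statuses and so lands in the failure branch), and `run_fixed_load` would be called once per endpoint with the same `vus` and `duration_s` every release. Keeping those two numbers fixed is what makes this week's results comparable with last week's.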

How to run one

Pick the critical paths. Login, the main read endpoint(s), the main write/checkout, search. Five or six at most — the ones whose slowness would actually hurt.
Hold the load modest and fixed. A handful of virtual users for a few minutes is enough to surface a regression. You're not trying to break it; you're trying to compare it. Tools like k6 make this a short script.
Look at p95, not the average. The average lies — it hides the slow tail where real users feel pain. p95 latency is the honest number: 95% of requests were at least this fast.
Compare to baseline. A regression is "p95 doubled" or "errors appeared," not "p95 is 200ms." Without a baseline you can't tell a problem from a Tuesday.
Watch error rate and saturation (CPU, memory, connection pools on the service under test). A fast response that's actually a 500 is not fast. Errors under light load are a more alarming signal than high latency.
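To see why the average lies, here is a small Python sketch that summarizes a run as mean, p95, p99 and error rate. It assumes samples are `(latency_ms, succeeded)` pairs and uses the nearest-rank percentile method (other tools interpolate and may report slightly different values). Latency statistics are computed over successful requests only, so a fast 500 cannot make the numbers look better. `summarize` and `percentile` are illustrative names, not part of any tool.

```python
import math
from typing import Dict, Sequence, Tuple

Sample = Tuple[float, bool]  # (latency in ms, request succeeded)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(samples: Sequence[Sample]) -> Dict[str, float]:
    if not samples:
        raise ValueError("no samples collected")
    ok_latencies = [ms for ms, ok in samples if ok]
    errors = len(samples) - len(ok_latencies)
    # Latency stats cover successful requests only; if nothing succeeded,
    # report infinity rather than a misleadingly small number.
    if ok_latencies:
        mean = sum(ok_latencies) / len(ok_latencies)
        p95 = percentile(ok_latencies, 95)
        p99 = percentile(ok_latencies, 99)
    else:
        mean = p95 = p99 = math.inf
    return {"mean_ms": mean, "p95_ms": p95, "p99_ms": p99, "error_rate": errors / len(samples)}


# 90 fast requests and 10 slow ones.
samples = [(100.0, True)] * 90 + [(1000.0, True)] * 10
print(summarize(samples))
# {'mean_ms': 190.0, 'p95_ms': 1000.0, 'p99_ms': 1000.0, 'error_rate': 0.0}
```

The mean of 190 ms looks healthy; the p95 of 1000 ms shows that one request in ten took a full second, which is the tail real users feel.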

Pre-release performance smoke test

Same script, same load, same environment as last time (comparison only works if it's controlled)
5–6 critical endpoints covered: login, primary read, primary write, search
Modest fixed load (a few VUs, a few minutes) — surface regressions, don't stress to failure
Report p95 (and p99 if you have it), never just the average
Compare against the previous baseline; flag anything that materially worsened
Error rate stays near zero under the smoke load
Check for performance bugs that pass functional tests: N+1 queries, a missing index, a new synchronous call in a hot path
Run it in CI or as a scripted step so it actually happens every release, not "when someone remembers"
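The "scripted step" can be a small gate that fails the pipeline. This Python sketch assumes each run's results were saved as JSON mapping an endpoint name to a summary with `p95_ms` and `error_rate` fields; that format, the file name `smoke_gate.py`, and the thresholds `MAX_P95_RATIO` and `MAX_ERROR_RATE` are all hypothetical placeholders to tune against your own run-to-run noise. It prints each regression and exits non-zero so the CI step fails.

```python
import json
import sys
from typing import Dict, List

# Placeholder thresholds (assumptions): tune them against your own run-to-run noise.
MAX_P95_RATIO = 1.5    # flag if p95 grew by more than 50% against the baseline
MAX_ERROR_RATE = 0.01  # flag if more than 1% of smoke requests failed


def find_regressions(baseline: Dict[str, dict], current: Dict[str, dict]) -> List[str]:
    """Compare per-endpoint summaries shaped like {"login": {"p95_ms": 180.0, "error_rate": 0.0}}."""
    problems: List[str] = []
    for endpoint, now in current.items():
        if now["error_rate"] > MAX_ERROR_RATE:
            problems.append(f"{endpoint}: error rate {now['error_rate']:.1%}")
        before = baseline.get(endpoint)
        if before is None:
            continue  # new endpoint: no baseline to compare against yet
        ratio = now["p95_ms"] / before["p95_ms"]
        if ratio > MAX_P95_RATIO:
            problems.append(
                f"{endpoint}: p95 {before['p95_ms']:.0f} ms -> {now['p95_ms']:.0f} ms ({ratio:.1f}x)")
    # An endpoint that silently dropped out of the run means the script is no longer the same.
    for endpoint in sorted(baseline.keys() - current.keys()):
        problems.append(f"{endpoint}: missing from this run")
    return problems


def main(baseline_path: str, current_path: str) -> int:
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    problems = find_regressions(baseline, current)
    for problem in problems:
        print("REGRESSION:", problem)
    return 1 if problems else 0  # a non-zero exit code fails the CI step


# Only act as a gate when given both files: python smoke_gate.py baseline.json current.json
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

The p95 check is a ratio rather than an absolute limit because the finding is "p95 doubled", not "p95 is 200ms". Error rate, by contrast, is checked against an absolute ceiling, since errors under smoke load are a problem regardless of what last week looked like.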

Where the regressions actually come from

When a smoke test catches something, the cause is usually mundane and recent: a new database query with no index, an N+1 introduced by an ORM change, a call to a third-party service added inside a request that used to be local, a payload that quietly grew. These are the regressions that look fine in functional testing — the feature works, it's just slow — and only a before/after performance number exposes them. That's the gap the smoke test fills: catching the slow-but-passing change before a real user on a real connection feels it.
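To make the N+1 pattern concrete, here is a self-contained Python sketch using the standard-library `sqlite3` module, an in-memory database, and a made-up `customers`/`orders` schema. It counts the SQL statements each version executes via `Connection.set_trace_callback`: the loop issues one query for the list plus one per customer, while the join returns the same totals in a single statement. Both versions give identical answers, which is why a functional test passes. Against a networked database, each extra statement is also a round trip, and that is where the latency comes from.

```python
import sqlite3

# Illustrative schema: 50 customers, one order each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(i, f"customer{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i, 10.0 * i) for i in range(1, 51)])
conn.commit()

# Record every SQL statement the connection executes from here on.
statements = []
conn.set_trace_callback(statements.append)


def totals_n_plus_one(db):
    """The N+1 shape: one query for the list, then one more query per row."""
    totals = {}
    for cid, name in db.execute("SELECT id, name FROM customers").fetchall():
        (total,) = db.execute(
            "SELECT SUM(total) FROM orders WHERE customer_id = ?", (cid,)).fetchone()
        totals[name] = total
    return totals


def totals_joined(db):
    """The same result in a single query."""
    rows = db.execute("""
        SELECT c.name, SUM(o.total)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id
    """).fetchall()
    return dict(rows)


statements.clear()
slow = totals_n_plus_one(conn)
n_plus_one_count = len(statements)

statements.clear()
fast = totals_joined(conn)
joined_count = len(statements)

assert slow == fast  # functionally identical: a functional test cannot tell them apart
print(f"N+1 version: {n_plus_one_count} statements; joined version: {joined_count}")
```

The statement count grows with the number of rows in the N+1 version and stays flat in the joined one, so the regression only shows up once the data or the traffic is realistic.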

Keep it cheap enough to run every time. A performance check that's too heavy to run per release is a performance check you don't have.