Load testing is not the same as performance testing
Load testing and performance testing are often used interchangeably, but they aren't the same thing. Confusing them is how teams run the wrong test and trust the wrong result.
"We need to performance test it" almost always comes out as "let's blast it with virtual users and see if it falls over." That's load testing (one tool in a bigger box), and treating it as the whole of performance testing leads to a specific failure: a green load test, a confident sign-off, and a system that's slow for real users anyway. Untangling the terms changes what you measure and what you conclude. This is the conceptual companion to p95 latency explained and why average response time lies.
Performance testing is the umbrella
Performance testing is the whole discipline of measuring how a system behaves under various conditions — speed, responsiveness, stability, resource use. It answers "is this fast enough, and does it stay that way?" Load testing is one type of performance test. So are several others, and they answer different questions:
- Load testing — behaviour under expected load. "Does it hold up with the traffic we actually expect?"
- Stress testing — behaviour beyond expected load, to find the breaking point. "Where does it fall over, and how — gracefully or catastrophically?"
- Spike testing — a sudden sharp jump in load. "What happens when traffic triples in a minute (a sale, a launch, a viral moment)?"
- Soak / endurance testing — sustained load over a long time. "Does it degrade after hours — memory leaks, connection-pool exhaustion, disk filling?"
- Scalability testing — how performance changes as you add resources. "Does doubling the servers actually double throughput?"
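The first four test types differ mainly in the shape of the traffic over time. The minimal sketch below describes each one as a target number of concurrent virtual users (VUs) at minute `t`. The function names, the `expected_vus` parameter and every multiplier and duration are hypothetical illustrations, not settings from any particular tool; load tools such as k6 or Gatling express the same idea through their own ramp configuration.

```python
"""Hypothetical load shapes: target concurrent virtual users (VUs) at minute t."""


def load_profile(t: float, expected_vus: int, duration: float = 30) -> int:
    """Load test: ramp up to the expected level over 5 minutes, then hold it."""
    if t < 5:
        return round(expected_vus * t / 5)
    return expected_vus if t <= duration else 0


def stress_profile(t: float, expected_vus: int, step_minutes: float = 5) -> int:
    """Stress test: keep stepping past expected load (100%, 150%, 200%, ...)."""
    step = int(t // step_minutes)
    return round(expected_vus * (1 + 0.5 * step))


def spike_profile(t: float, expected_vus: int,
                  spike_at: float = 10, spike_len: float = 1) -> int:
    """Spike test: baseline traffic that triples abruptly for a short window."""
    if spike_at <= t < spike_at + spike_len:
        return expected_vus * 3
    return expected_vus


def soak_profile(t: float, expected_vus: int, hours: float = 8) -> int:
    """Soak test: ordinary expected load, but held for many hours."""
    return expected_vus if t <= hours * 60 else 0
```

Scalability testing is the odd one out: it holds the traffic shape fixed and varies the resources instead.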
A single load test tells you about one of these. Calling it "performance testing" implies you've covered all of them — you haven't.
A single user can have a performance bug
The biggest consequence of conflating the two is that people think performance only matters under load. But a page that takes four seconds for one user (an unindexed query, an N+1 query pattern, a giant unpaginated response) is a performance bug with zero concurrent traffic. Catching it is pure performance testing, measuring the response time of one operation, with no load testing involved at all. If "performance testing" only ever means "run a load test," these single-user slowdowns go uncaught until production.
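An N+1 bug can be demonstrated with exactly one user and no load tooling. This sketch uses Python's built-in `sqlite3` module with a made-up `customers`/`orders` schema (an assumption for illustration) and counts the SQL statements a single request issues, using `Connection.set_trace_callback`. The N+1 version runs one query for the orders plus one per order; the join runs one in total.

```python
import sqlite3


def make_db(n_orders: int) -> sqlite3.Connection:
    """Build a hypothetical in-memory schema: customers and their orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"customer {i}") for i in range(n_orders)])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, i) for i in range(n_orders)])
    conn.commit()
    return conn


def count_queries(conn: sqlite3.Connection, fn) -> int:
    """Run fn(conn) and return how many SQL statements it executed."""
    statements = []
    conn.set_trace_callback(statements.append)  # called once per statement run
    try:
        fn(conn)
    finally:
        conn.set_trace_callback(None)
    return len(statements)


def orders_n_plus_one(conn: sqlite3.Connection):
    """N+1: one query for the orders, then one more query per order."""
    rows = conn.execute("SELECT id, customer_id FROM orders ORDER BY id").fetchall()
    return [(order_id,
             conn.execute("SELECT name FROM customers WHERE id = ?",
                          (customer_id,)).fetchone()[0])
            for order_id, customer_id in rows]


def orders_joined(conn: sqlite3.Connection):
    """Same result with a single JOIN."""
    return conn.execute(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()
```

The statement count is the signal: against a networked database each extra statement is another round trip, so the N+1 version gets slower as the data grows, even for a single user.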
They measure different things
- Load testing mostly watches the system's capacity signals as concurrency rises: throughput, error rate, and how latency percentiles climb with the number of users. You're hunting the knee in the curve — where it stops coping.
- Performance testing of a single operation watches that operation's latency, the queries it makes, the payload size, the resources it touches. You're hunting why one thing is slow, independent of traffic.
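Finding the knee can be as simple as walking a load test's results in order of concurrency and flagging the first step where adding users stops adding throughput, latency jumps, or errors appear. A minimal sketch, assuming you have already exported one summary row per concurrency level from your load tool; the `Step` fields and all three thresholds are hypothetical choices, not a standard definition of the knee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One concurrency level from a load test's summary (hypothetical shape)."""
    users: int          # concurrent virtual users at this step (> 0)
    throughput: float   # completed requests per second (> 0)
    p95_ms: float       # 95th-percentile latency in milliseconds
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def find_knee(steps: list[Step],
              min_gain: float = 0.5,
              max_p95_ratio: float = 2.0,
              max_error_rate: float = 0.01) -> Optional[Step]:
    """Return the first step where the system stops coping, or None.

    By these assumed rules, a step is the knee if, relative to the previous
    step, throughput grew by less than min_gain times the growth in users,
    p95 rose above max_p95_ratio times its previous value, or the error
    rate exceeded max_error_rate.
    """
    ordered = sorted(steps, key=lambda s: s.users)
    for prev, cur in zip(ordered, ordered[1:]):
        user_growth = cur.users / prev.users - 1
        throughput_growth = cur.throughput / prev.throughput - 1
        if (throughput_growth < min_gain * user_growth
                or cur.p95_ms > max_p95_ratio * prev.p95_ms
                or cur.error_rate > max_error_rate):
            return cur
    return None
```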
Reporting a load test's average and calling the feature "performant" mixes these up — and, as covered in why average response time lies, the average hides the bug either way.
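A quick way to see why the average misleads is a hypothetical sample where most requests are fast and a minority hit a slow path. The mean looks tolerable while the 95th percentile exposes the slow tail. The `percentile` helper uses the nearest-rank definition, one common choice among several, so real tools may report slightly different values.

```python
import math
import statistics


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical sample: 90 fast requests and 10 that hit a slow code path.
latencies_ms = [200.0] * 90 + [3000.0] * 10

mean_ms = statistics.mean(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
print(f"mean={mean_ms:.0f} ms, p95={p95_ms:.0f} ms")  # prints: mean=480 ms, p95=3000 ms
```

A 480 ms mean describes none of the actual requests: nine in ten took 200 ms and one in ten took three seconds.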
What this means for QA
- Don't accept "we load tested it" as "it's fast." Ask: fast for one user? Under expected load? At the breaking point? Over eight hours?
- Match the test to the risk. A launch with a marketing spike needs spike testing; a long-running background job needs a soak test; a slow report needs single-user profiling, not a load test.
- Measure single-operation performance early (you can do this before any load tooling exists: just time the operation with realistic data), and run load, stress, and soak tests as the risk warrants.
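Timing a single operation needs nothing beyond the standard library. A minimal sketch, assuming the operation can be called as a plain Python function: `generate_report` and `DATA` are hypothetical stand-ins for whatever you are measuring, and in practice you would feed it production-sized data rather than a toy fixture. Repeating the call and reading the median and p95, rather than trusting one run, smooths out noise such as cold caches.

```python
import statistics
import time
from typing import Callable


def time_operation(op: Callable[[], object], runs: int = 20, warmup: int = 2) -> dict:
    """Call op() repeatedly, one call at a time, and summarise latency in ms."""
    for _ in range(warmup):  # discard cold-start runs
        op()
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        durations.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    p95 = statistics.quantiles(durations, n=20, method="inclusive")[18]
    return {"median_ms": statistics.median(durations),
            "p95_ms": p95,
            "max_ms": max(durations)}


# Hypothetical operation under test, run against a realistically sized dataset.
DATA = list(range(100_000))


def generate_report(rows: list[int]) -> list[int]:
    """Stand-in for the slow operation: sort the dataset in descending order."""
    return sorted(rows, reverse=True)


print(time_operation(lambda: generate_report(DATA)))
```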
Where this fits
This sorts out the vocabulary so the rest of the Performance for QA engineers series makes sense: reading percentiles, distrusting the average, and deciding what load belongs in CI. The glossary defines load, stress, soak, spike, and throughput; the tools directory compares k6, JMeter, and Gatling.
Which performance test do you actually need?
- Single-operation slow? → profile one user with realistic data (no load tooling needed)
- Expected traffic holds up? → load test, watch percentiles + error rate, find the knee
- Where does it break? → stress test beyond expected load
- Sudden surge (sale/launch)? → spike test
- Degrades over hours? → soak/endurance test
- Adding resources helps? → scalability test
- Never accept "we load tested it" as proof it's fast for one user
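The checklist above can also be kept as a small lookup that a team fills in during test planning. A sketch only: the `Risk` names are hypothetical labels for the questions in the list, not a standard taxonomy, and always including single-user profiling is a design choice that follows from the last point.

```python
from enum import Enum


class Risk(Enum):
    """Hypothetical labels for the questions in the checklist."""
    SLOW_SINGLE_OPERATION = "single-operation slow"
    EXPECTED_TRAFFIC = "expected traffic holds up"
    BREAKING_POINT = "where does it break"
    SUDDEN_SURGE = "sudden surge"
    DEGRADES_OVER_TIME = "degrades over hours"
    ADDING_RESOURCES = "adding resources helps"


TEST_FOR_RISK = {
    Risk.SLOW_SINGLE_OPERATION: "single-user profiling with realistic data",
    Risk.EXPECTED_TRAFFIC: "load test (percentiles + error rate, find the knee)",
    Risk.BREAKING_POINT: "stress test",
    Risk.SUDDEN_SURGE: "spike test",
    Risk.DEGRADES_OVER_TIME: "soak/endurance test",
    Risk.ADDING_RESOURCES: "scalability test",
}


def plan_tests(risks: set[Risk]) -> list[str]:
    """Map the risks a release carries to the tests that address them.

    Single-user profiling is always included: a green load test is never
    proof that one user gets a fast response.
    """
    needed = set(risks) | {Risk.SLOW_SINGLE_OPERATION}
    return [TEST_FOR_RISK[r] for r in Risk if r in needed]  # checklist order
```

For a launch with a marketing spike, `plan_tests({Risk.EXPECTED_TRAFFIC, Risk.SUDDEN_SURGE})` returns single-user profiling, the load test, and the spike test, in that order.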