Performance Testing

What a QA engineer needs to test how a system behaves under load: the test types and what each answers, the metrics that actually matter (hint: not the average), how to design a load test that produces a real signal, and the line between back-end load testing and front-end web performance. Pair this with the API Testing Concepts sheet for the request side.

The test types

"Performance testing" is an umbrella. Each type answers a different question — pick the one that matches your risk.

Type	Question it answers	Shape of the load
Load	Does it meet its targets at expected traffic?	Ramp to expected peak, hold
Stress	Where does it break, and how?	Push past capacity until it fails
Soak (endurance)	Does it degrade over time?	Sustained moderate load for hours
Spike	Does it survive a sudden surge?	Sharp jump up, then down
Volume	Does it cope with large data?	Big datasets/payloads, not just traffic

Most teams start with a load test against a target, then add soak (catches memory leaks) and spike (catches autoscaling gaps) as the system becomes more critical.
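
The shapes in the table can be written down as stage lists, which is roughly how code-scripted load tools tend to take them. The sketch below is illustrative only: the `Stage` tuple, the function names, and every duration and rate are assumptions, not values from a real system or a real tool's API. Volume testing is left out because it varies the data, not the traffic shape.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    """One segment of a profile: move linearly to target_rps over duration_s."""
    duration_s: int
    target_rps: float


def load_profile(peak_rps: float) -> List[Stage]:
    # Load: ramp to expected peak, hold, ramp down.
    return [Stage(300, peak_rps), Stage(900, peak_rps), Stage(120, 0.0)]


def stress_profile(peak_rps: float, steps: int = 4) -> List[Stage]:
    # Stress: step past the expected peak (assumed +50% per step) until it fails.
    stages: List[Stage] = []
    for i in range(1, steps + 1):
        target = peak_rps * (1 + 0.5 * i)
        stages += [Stage(120, target), Stage(180, target)]  # ramp, then hold
    return stages


def soak_profile(peak_rps: float, hours: int = 4) -> List[Stage]:
    # Soak: moderate load (assumed 60% of peak) sustained for hours.
    moderate = 0.6 * peak_rps
    return [Stage(300, moderate), Stage(hours * 3600, moderate), Stage(120, 0.0)]


def spike_profile(base_rps: float, surge_rps: float) -> List[Stage]:
    # Spike: steady base, sharp jump up, short hold, sharp drop back.
    return [
        Stage(120, base_rps), Stage(300, base_rps),
        Stage(10, surge_rps), Stage(60, surge_rps),
        Stage(10, base_rps), Stage(300, base_rps),
    ]


def rate_at(stages: List[Stage], t_s: float) -> float:
    """Target request rate at t_s seconds, starting from zero load."""
    rate = 0.0
    elapsed = 0
    for stage in stages:
        if t_s < elapsed + stage.duration_s:
            fraction = (t_s - elapsed) / stage.duration_s
            return rate + (stage.target_rps - rate) * fraction
        elapsed += stage.duration_s
        rate = stage.target_rps
    return rate  # after the last stage, stay at its target
```

For example, `rate_at(load_profile(1000), 150)` is halfway up the first ramp, so it returns 500.0.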

The metrics that matter

The single most important habit: report percentiles, not averages. An average hides the slow tail that users actually feel.

Metric	What it tells you
Throughput (req/s, TPS)	How much work the system handles
Latency / response time	How long each request takes
p95 / p99 (percentiles)	The latency that 95% / 99% of requests stay under, i.e. where the slowest 5% / 1% begin; the real signal
Error rate	Share of requests failing under load
Concurrency / active VUs	How many users are hitting it at once

Why averages lie:
  100 requests: 98 at 100ms, 2 at 5000ms
  average  = 198ms        -> looks fine
  p99      = 5000ms       -> two in a hundred users wait 5s
Report p95/p99 and error rate. The average is the least useful number.
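
The gap between the mean and the tail is easy to check in a few lines. This is a minimal sketch using the nearest-rank definition of a percentile; load tools differ in how they interpolate, so their numbers can differ slightly. The sample latencies are made up for illustration.

```python
import math
from statistics import mean
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical run: most requests are fast, a small tail is very slow.
latencies_ms = [120] * 930 + [800] * 50 + [4000] * 20

avg = mean(latencies_ms)             # 231.6 ms, looks acceptable
p50 = percentile(latencies_ms, 50)   # 120 ms
p95 = percentile(latencies_ms, 95)   # 800 ms
p99 = percentile(latencies_ms, 99)   # 4000 ms, the tail users actually hit
```

The mean sits close to the typical request and gives no hint that 2% of requests take four seconds; p99 shows it directly.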

Virtual users, concurrency, and ramp-up

A load tool simulates virtual users (VUs) — concurrent simulated clients. Two knobs shape the test:

Concurrency — how many VUs run at once (models real simultaneous users). See Concurrent Users.
Ramp-up — how fast you add VUs. Ramping gradually finds the point where latency degrades; slamming all VUs at once only tells you pass/fail. See Ramp-up Period.

Think in terms of arrival rate (requests per second) where you can — it's more stable than a fixed VU count when response times change mid-test.
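
The reason is visible in a closed model, where each VU sends a request, waits for the response, pauses for think-time, then repeats. By Little's Law the request rate is roughly VUs / (response time + think time), so it drops exactly when the system slows down. A small sketch with illustrative numbers; the function names are hypothetical:

```python
import math


def closed_model_rps(vus: int, response_s: float, think_s: float) -> float:
    """Approximate request rate produced by a fixed pool of looping VUs."""
    return vus / (response_s + think_s)


def vus_needed(target_rps: float, response_s: float, think_s: float) -> int:
    """VUs required to sustain target_rps at a given response and think time."""
    return math.ceil(target_rps * (response_s + think_s))


# 100 VUs, 0.75 s think-time (assumed values).
healthy = closed_model_rps(100, response_s=0.25, think_s=0.75)   # 100 req/s
degraded = closed_model_rps(100, response_s=1.25, think_s=0.75)  # 50 req/s
```

With a fixed VU count the offered load halves as responses slow from 0.25 s to 1.25 s, so the test eases off just as the system starts to struggle. An arrival-rate test keeps sending at the target rate regardless.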

Designing a load test

A repeatable shape that produces a real signal:

Set targets — define the SLOs first (e.g. p95 < 500ms, error rate < 0.1% at 1,000 req/s). A test with no target can't pass or fail.
Establish a baseline — measure the system at low, known load so you have something to compare against. See Baseline Testing.
Model realistic traffic — mix the endpoints/journeys real users hit, with realistic think-time and data. One-endpoint hammering misrepresents the system.
Ramp gradually — increase load in steps and watch where metrics turn.
Hold and observe — sustain peak long enough to see steady-state behaviour, not just the spike of warm-up.
Analyse against targets — compare p95/p99 and error rate to the SLOs, not to a vibe.
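
The final step can be made mechanical so that a run clearly passes or fails. This is a minimal sketch, assuming a hypothetical `Slo` record and a `RunSummary` you have already computed from the tool's output; the thresholds reuse the example targets from the first step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slo:
    p95_ms: float          # p95 must be strictly below this
    max_error_rate: float  # as a fraction: 0.001 means 0.1%
    min_rps: float         # load the run must actually reach


@dataclass(frozen=True)
class RunSummary:
    p95_ms: float
    error_rate: float
    achieved_rps: float


def violations(slo: Slo, run: RunSummary) -> List[str]:
    """Return a list of SLO violations; an empty list means the run passed."""
    problems: List[str] = []
    if run.achieved_rps < slo.min_rps:
        # If the target load was never reached, the latency numbers prove nothing.
        problems.append(f"load {run.achieved_rps:g} req/s below target {slo.min_rps:g}")
    if run.p95_ms >= slo.p95_ms:
        problems.append(f"p95 {run.p95_ms:g} ms not under {slo.p95_ms:g} ms")
    if run.error_rate >= slo.max_error_rate:
        problems.append(f"error rate {run.error_rate:.2%} not under {slo.max_error_rate:.2%}")
    return problems


slo = Slo(p95_ms=500, max_error_rate=0.001, min_rps=1000)
good_run = RunSummary(p95_ms=420, error_rate=0.0004, achieved_rps=1005)
slow_run = RunSummary(p95_ms=640, error_rate=0.0004, achieved_rps=1002)
```

Here `violations(slo, good_run)` is empty, while `violations(slo, slow_run)` reports the p95 breach. Checking the achieved rate first guards against a "pass" from a run that never reached the target load.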

Use realistic, varied test data — reusing one record hits caches and flatters the result.
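
One way to get both a realistic mix and varied data is to draw each simulated request from weighted journeys and a pool of records. This is a sketch only: the journey names, the weights, the think-time range, and the ID pool are all hypothetical and should come from your own production traffic.

```python
import random
from typing import Sequence, Tuple

# Hypothetical traffic mix: share of sessions per user journey.
JOURNEYS = {"browse": 0.70, "search": 0.20, "checkout": 0.10}


def next_request(rng: random.Random, user_ids: Sequence[int]) -> Tuple[str, int, float]:
    """Pick a journey by weight, a varied user record, and a think-time in seconds."""
    names = list(JOURNEYS)
    weights = list(JOURNEYS.values())
    journey = rng.choices(names, weights=weights, k=1)[0]
    user_id = rng.choice(user_ids)    # spread across many records, not one cached row
    think_s = rng.uniform(1.0, 5.0)   # assumed pause between user actions
    return journey, user_id, think_s


rng = random.Random(42)               # seeded so a test run is repeatable
plan = [next_request(rng, range(1, 10_001)) for _ in range(10_000)]
browse_share = sum(1 for journey, _, _ in plan if journey == "browse") / len(plan)
```

Seeding the generator keeps the plan repeatable between runs, which makes a later run comparable with the baseline.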

Reading results and finding bottlenecks

A failed target is the start, not the answer. Bottlenecks usually sit in one of a few places:

Application — slow code paths, lock contention, thread/connection-pool exhaustion.
Database — unindexed queries, the N+1 pattern, connection limits (often the first wall).
Infrastructure — CPU/memory saturation, network, undersized instances.
External dependencies — a downstream API or queue that caps your throughput.

Correlate the load tool's metrics with server-side observability (APM, Grafana dashboards) — the load tool tells you that it's slow; the server-side data tells you where. Watch for the knee in the curve: the concurrency level where latency rises sharply while throughput plateaus.
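
Given per-step results from a stepped ramp, the knee can be flagged with a simple heuristic: the first step where throughput barely grows while p95 latency jumps. The sketch below assumes a hypothetical `Step` record and arbitrary cut-offs (under 5% throughput gain, over 50% latency growth); the measurements are invented for illustration.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    concurrency: int
    throughput_rps: float
    p95_ms: float


def find_knee(
    steps: List[Step],
    max_throughput_gain: float = 0.05,
    min_latency_growth: float = 0.5,
) -> Optional[Step]:
    """Return the first step where throughput plateaus and p95 rises sharply.

    Assumes steps are ordered by increasing concurrency with positive values.
    """
    for prev, cur in zip(steps, steps[1:]):
        gain = (cur.throughput_rps - prev.throughput_rps) / prev.throughput_rps
        growth = (cur.p95_ms - prev.p95_ms) / prev.p95_ms
        if gain < max_throughput_gain and growth > min_latency_growth:
            return cur
    return None  # no knee within the tested range


# Invented measurements from a stepped ramp.
results = [
    Step(50, 480, 110),
    Step(100, 950, 120),
    Step(200, 1800, 150),
    Step(400, 1850, 420),   # throughput flat, latency nearly triples
    Step(800, 1840, 1300),
]
knee = find_knee(results)
```

With these numbers the knee lands at the 400-concurrency step, which puts usable capacity somewhere between 200 and 400. That is the range to examine in the server-side dashboards.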

Back-end load vs front-end performance

Two different disciplines, often confused:

	Back-end load testing	Front-end web performance
Question	Does the server hold up under load?	Is the page fast for one user?
Tools	k6, JMeter, Gatling, Locust	Lighthouse, WebPageTest
Metrics	Throughput, p95 latency, error rate	Core Web Vitals (LCP, INP, CLS)
Scope	Many simulated users, no real browser	One real browser, render/paint timings

Both matter: a server that scales perfectly still feels slow if the page renders poorly, and a fast page still fails if the API behind it falls over at 500 users. Test the layer that carries your risk — usually both.

The tool landscape

Need	Tools
Code-scripted load (OSS)	k6 (JS), Gatling (Scala/Java), Locust (Python), Artillery (JS/YAML)
GUI / record-based load	JMeter
Quick HTTP benchmarking	Vegeta (constant-rate), wrk
Enterprise platforms	LoadRunner, NeoLoad, LoadView
Front-end / Core Web Vitals	Lighthouse, WebPageTest