Designing Realistic Performance Tests — Non-Functional Testing Overview

A performance test that does not reflect real user behaviour does not tell you anything useful about real user experience. This is the most common way performance testing fails — not because the tool was wrong or the infrastructure was underpowered, but because the test scenario was unrealistic. Optimistic assumptions in the test design produce optimistic results, and the system fails in production anyway.

Designing a realistic performance test is a distinct skill from running one. This lesson covers how to do it.

The five ways performance tests go wrong

Before designing a good test, it helps to understand the most common failure modes in test design:

1. Testing only the happy path. Real users do not all go to the same page. They browse, search, add to cart, abandon, come back, try different products, hit error pages, and log out. A test where every virtual user follows an identical checkout flow does not reflect this diversity — and it inflates results by allowing aggressive caching of a single data path.

2. Using static test data. If every virtual user logs in as the same user, searches for the same product, and buys the same item, the server caches everything after the first request. Response times look great. In production, where thousands of different users search for different things, the cache miss rate is much higher and results are much worse.

3. Removing think time. Real users read, pause, scroll, and fill in forms slowly. They do not fire requests as fast as the network allows. A test without think time generates far more load than the same number of real users would, so it either produces artificially bad results at a given concurrency level or forces you to use artificially low concurrency to match the real request rate.

4. Wrong infrastructure. Testing on a developer laptop with 16 cores, 32GB of RAM, and a local database tells you nothing about a 4-core production instance with a remote database over a shared network. Always test on production-equivalent infrastructure.

5. Not testing third-party dependencies. If your checkout calls a payment provider, your search calls an external analytics service, and your auth calls an identity provider — and your performance test mocks all of them — you have tested a system that does not exist. Third-party latency and reliability are real performance variables.

A design process that produces realistic tests

1. Define user journeys. What do real users actually do? Use anal…

2. Assign traffic weights. Mix journeys based on real analytics: 60…

3. Parameterise test data. Use data files: 10,000 user accounts, 50…

4. Add realistic think time. 1–5 seconds between actions. Users read,…

5. Set a realistic ramp-up. Gradually increase virtual users over 5–…

6. Run for sufficient duration. Minimum 30 minutes at steady state. Soak…

7. Analyse percentiles and resources. Report p50, p95, p99, error rate, throug…
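The ramp-up and steady-state steps describe a load profile: how many virtual users should be active at each moment of the run. The Python sketch below assumes a hypothetical three-stage profile (the 10-minute ramp, 30-minute hold and 5-minute ramp-down are illustrative numbers, not prescribed ones) and interpolates linearly between stage targets, similar in spirit to the `stages` option in k6.

```python
# Hypothetical stage list: (duration_seconds, target_vus). The values are
# illustrative assumptions, not recommendations.
STAGES = [
    (600, 500),   # ramp from 0 to 500 VUs over 10 minutes
    (1800, 500),  # hold 500 VUs for 30 minutes (steady state)
    (300, 0),     # ramp down to 0 VUs over 5 minutes
]

def target_vus(t: float, stages=STAGES, start_vus: int = 0) -> int:
    """Return the target number of virtual users t seconds into the run.

    Each stage moves linearly from the previous stage's target to its own.
    """
    elapsed = 0.0
    previous = start_vus
    for duration, target in stages:
        if t < elapsed + duration:
            fraction = (t - elapsed) / duration
            return round(previous + (target - previous) * fraction)
        elapsed += duration
        previous = target
    return previous  # after the last stage, stay at the final target
```

Plotting `target_vus` across the run gives the load-profile graph of virtual users over time that a test report should show.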

Defining user journeys

Start with your analytics. What are the most common paths through your application? For an e-commerce site, the data might look like this:

60% of sessions: browse product listings (no purchase)
20% of sessions: search
10% of sessions: add to cart and abandon
10% of sessions: complete checkout

A realistic load test simulates all four journeys in proportion. Your 500 virtual users split roughly as: 300 browsing, 100 searching, 50 abandoning, 50 completing checkout. This is far more representative than 500 virtual users all completing checkout.
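In a script, this split can be expressed two ways: as a fixed number of virtual users per journey, or as a weighted random choice each time a virtual user starts a new session. The Python sketch below shows both; `allocate_vus` and `pick_journey` are hypothetical helper names, and the weights are the example analytics figures from this section.

```python
import random

# Journey weights from the example analytics above (percent of sessions).
JOURNEY_WEIGHTS = {
    "browse": 60,
    "search": 20,
    "add_to_cart_abandon": 10,
    "checkout": 10,
}

def allocate_vus(total_vus: int, weights: dict[str, int]) -> dict[str, int]:
    """Split a fixed number of virtual users across journeys by weight.

    Uses the largest-remainder method so the counts always sum to total_vus.
    """
    total_weight = sum(weights.values())
    exact = {name: total_vus * w / total_weight for name, w in weights.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = total_vus - sum(alloc.values())
    # Give any remaining VUs to the journeys with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def pick_journey(rng: random.Random, weights: dict[str, int]) -> str:
    """Choose a journey for one new session, with probability proportional to weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

print(allocate_vus(500, JOURNEY_WEIGHTS))
```

Calling `allocate_vus(500, JOURNEY_WEIGHTS)` reproduces the 300 / 100 / 50 / 50 split. The largest-remainder step matters when the weights do not divide the total evenly, so the per-journey counts still add up to the VU total.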

Parameterising test data

Both k6 and JMeter support parameterised data: CSV files or JSON arrays that feed different values to each virtual user on each iteration.

Minimum data variety for an e-commerce test:

10,000 distinct user accounts (pre-created in the test environment)
50,000 product IDs spread across categories
Varied shipping addresses (at least one per user)
Test payment tokens (most payment providers have sandbox cards)
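Data files like this are usually generated by a script rather than by hand. The Python sketch below builds 10,000 synthetic user rows and serialises them as CSV. The field names, the `loadtest_user_` prefix and the address formats are assumptions for illustration; the matching accounts must also exist in the test environment, and payment tokens are left out because they should come from your payment provider's sandbox documentation rather than be invented.

```python
import csv
import io
import random

def generate_users(count: int, seed: int = 42) -> list[dict[str, str]]:
    """Build synthetic user rows for a data-driven load test.

    Field names and value formats are hypothetical; match them to the
    accounts that actually exist in your test environment.
    """
    rng = random.Random(seed)  # seeded so the data file is reproducible
    cities = ["Leeds", "Bristol", "Glasgow", "Cardiff", "Belfast"]
    rows = []
    for i in range(count):
        rows.append({
            "username": f"loadtest_user_{i:05d}",
            "password": f"pw-{rng.randrange(10**8):08d}",
            "shipping_city": rng.choice(cities),
            "shipping_postcode": f"TST {rng.randrange(1, 100)}",
        })
    return rows

def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

users = generate_users(10_000)
csv_text = to_csv(users)
```

Writing `csv_text` to a file gives your load tool its data source. Seeding the random generator keeps the file identical between runs, which makes results easier to compare against a baseline.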

The rule of thumb: wherever a cache sits in the request path, your test data needs to be varied enough to exercise that cache realistically. Aim for roughly the hit rate you see in production: not so varied that you defeat the cache entirely, and not so repetitive that you game it artificially.
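To see why static data flatters results, the sketch below replays product lookups against a small least-recently-used cache. It is a toy model: the 1,000-entry cache, the 50,000-ID catalogue and the uniform choice of product are assumptions, and real traffic is usually skewed toward popular products, which puts its hit rate somewhere between the two extremes shown here.

```python
import random
from collections import OrderedDict

class LRUCache:
    """A tiny least-recently-used cache that only tracks hits and misses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: int) -> None:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
        else:
            self.misses += 1
            self.entries[key] = None
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def simulate(product_ids: list[int], requests: int, capacity: int, seed: int = 7) -> float:
    """Replay random product lookups against the cache and return the hit rate."""
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    for _ in range(requests):
        cache.get(rng.choice(product_ids))
    return cache.hit_rate

static = simulate([1], requests=10_000, capacity=1_000)                 # same product every time
varied = simulate(list(range(50_000)), requests=10_000, capacity=1_000)  # spread across the catalogue
```

With a single product, every request after the first is a cache hit; spread uniformly across 50,000 products, only a small fraction are. Neither matches production, which is why the target is a realistic hit rate rather than either extreme.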

Think time: the detail most scripts skip

Think time is the pause between a user receiving a response and sending the next request. Real users read content, fill in forms, and make decisions. A think time of 1–3 seconds between actions is a reasonable default for most applications; form-filling flows (checkout, registration) might warrant 5–10 seconds.
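A common way to apply think time is to draw each pause at random from a range rather than using a fixed value, so virtual users do not fall into lockstep and hit the server in synchronised bursts. The sketch below uses the ranges from this section; the step names and the `think_time` helper are hypothetical, and in a real k6 or JMeter script the pause would be the tool's own sleep or timer.

```python
import random

# Think-time ranges in seconds. The 1-3 s default and the 5-10 s range for
# form-filling steps follow the guidance above; the step kinds are hypothetical.
THINK_TIME_RANGES = {
    "default": (1.0, 3.0),
    "form": (5.0, 10.0),
}

def think_time(step_kind: str, rng: random.Random) -> float:
    """Draw a randomised pause so virtual users do not act in lockstep."""
    low, high = THINK_TIME_RANGES.get(step_kind, THINK_TIME_RANGES["default"])
    return rng.uniform(low, high)

rng = random.Random(3)  # seeded for reproducible runs
checkout_journey = [
    ("view_product", "default"),
    ("enter_shipping_details", "form"),
    ("enter_payment_details", "form"),
    ("confirm_order", "default"),
]
for step, kind in checkout_journey:
    pause = think_time(kind, rng)
    # A real script would send the request for `step` here, then pause
    # with the load tool's own sleep or timer for `pause` seconds.
    print(f"{step}: think for {pause:.1f}s")
```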

Without think time, a script that simulates "10 concurrent users" is really 10 clients firing requests as fast as the network allows, which can generate as much load as 100 or more real users who pause between actions. With think time, the relationship between virtual users and actual requests per second becomes meaningful.
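That relationship can be estimated with the interactive response time law from queueing theory, X = N / (R + Z): N users, each waiting R seconds for a response and then thinking for Z seconds, generate X requests per second. The sketch below applies it with assumed figures (200 ms average response time, 2 s average think time); the helper names are hypothetical.

```python
def throughput(users: int, response_time_s: float, think_time_s: float) -> float:
    """Requests per second from a closed population of users.

    Interactive response time law: X = N / (R + Z). Each user completes one
    request every R + Z seconds (R waiting for the response, Z thinking).
    """
    return users / (response_time_s + think_time_s)

def equivalent_users(vus_no_think: int, response_time_s: float, think_time_s: float) -> float:
    """Users with think time needed to match the load of VUs that never pause."""
    load = throughput(vus_no_think, response_time_s, 0.0)
    return load * (response_time_s + think_time_s)

# Assumed figures: 200 ms average response time, 2 s average think time.
no_think_load = throughput(10, 0.2, 0.0)          # 10 VUs firing back to back
realistic_users = equivalent_users(10, 0.2, 2.0)  # users with pauses giving the same load
```

Under these assumptions, 10 back-to-back VUs produce about 50 requests per second, roughly what 110 users with 2-second pauses would generate, in line with the 10-versus-100 comparison above. R usually rises as the system approaches saturation, so treat the result as a planning estimate and confirm the actual request rate from the test results.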

What belongs in a performance test report

A good performance test report tells a complete story — not just whether the system passed, but what it looked like while under load:

Test scenario summary — what journeys, what data, what ramp-up, what duration
Load profile — a graph of virtual users over time, showing the ramp-up and steady-state periods
Key metrics — p50, p95, p99 response time; throughput; error rate; broken down by journey type
System metrics — CPU, memory, database connections, network utilisation on the server side
Comparison to baseline — how do results compare to the previous test run? Getting worse over time is a signal even if today's run still passes.
Pass / fail against SLAs — explicit statement of which thresholds were met and which were not
Recommendations — where to investigate further, what optimisations to consider
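The key metrics and the pass / fail statement can be computed directly from raw request samples. The Python sketch below groups samples by journey, reports nearest-rank p50, p95 and p99 latencies, error rate and throughput, then states which thresholds each journey met. The `Sample` record, the 500 ms p95 limit and the 1% error-rate limit are hypothetical placeholders for your own SLAs.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Sample:
    journey: str       # e.g. "browse", "search", "checkout"
    latency_ms: float
    ok: bool           # False for HTTP errors and for failed content checks

def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarise(samples: list[Sample], duration_s: float) -> dict[str, dict[str, float]]:
    """Per-journey latency percentiles, error rate and throughput (requests/s)."""
    by_journey: dict[str, list[Sample]] = defaultdict(list)
    for sample in samples:
        by_journey[sample.journey].append(sample)
    report = {}
    for journey, group in by_journey.items():
        latencies = [s.latency_ms for s in group]
        report[journey] = {
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
            "p99_ms": percentile(latencies, 99),
            "error_rate": sum(not s.ok for s in group) / len(group),
            "throughput_rps": len(group) / duration_s,
        }
    return report

# Hypothetical SLA thresholds: each metric must be at or below its limit.
SLA = {"p95_ms": 500.0, "error_rate": 0.01}

def check_sla(report: dict[str, dict[str, float]], sla: dict[str, float]) -> dict[str, dict[str, bool]]:
    """State explicitly, per journey, which thresholds were met (True) and which were not."""
    return {
        journey: {metric: metrics[metric] <= limit for metric, limit in sla.items()}
        for journey, metrics in report.items()
    }
```

Throughput here is simply requests divided by the measurement window, so pass the steady-state duration rather than the whole run if warm-up samples have been excluded.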

A report that shows only "average response time: 240ms, test passed" is not a performance test report — it is a single number with a label.
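A single average is not enough because two runs can share the same mean while one of them hides a slow tail. The sketch below builds two made-up latency distributions that both average 240 ms and summarises them with Python's `statistics.quantiles` (which interpolates between neighbouring samples).

```python
import statistics

def summary(latencies_ms: list[float]) -> dict[str, float]:
    """Mean plus p50 / p95 / p99 using the standard library's quantiles()."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points: p1..p99
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }

steady = [240.0] * 100                 # every request takes 240 ms
spiky = [150.0] * 90 + [1050.0] * 10   # 1 request in 10 takes over a second

print(summary(steady))
print(summary(spiky))
```

The first run serves every request in 240 ms. In the second, the median is 150 ms but p95 and p99 are both 1,050 ms: one user in ten waits over a second, yet the average still reads 240 ms.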

⚠️ Common mistakes

Skipping the warm-up period. Most systems have cold-start behaviour: JVM warm-up, connection pool initialisation, cache population. Let the system warm up, and measure steady-state performance rather than cold-start performance by excluding results from the first few minutes of each run.
Not validating correctness under load. A system that returns HTTP 200 with an empty body under load has not actually served the request correctly. Validate response content in your performance tests, not just status codes.
Treating a passing test as proof of readiness. A test that passes is only as meaningful as the scenario it tests. If your load test uses static data and think time of zero, a passing result means very little. The quality of the test design determines the value of the result.
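The first two mistakes translate directly into analysis code. The sketch below drops samples from an assumed five-minute warm-up window and counts a request as successful only if it returned HTTP 200 and a body with the expected content. The `Result` record and the expected JSON shape (an object with a non-empty `items` list) are hypothetical; within the tools themselves, k6's `check()` and JMeter's assertions serve the same content-validation purpose.

```python
import json
from dataclasses import dataclass

@dataclass
class Result:
    timestamp_s: float  # seconds since the start of the run
    status: int         # HTTP status code
    body: str           # raw response body
    latency_ms: float

WARM_UP_S = 300.0  # assumption: discard the first 5 minutes; tune per system

def steady_state(results: list[Result], warm_up_s: float = WARM_UP_S) -> list[Result]:
    """Keep only samples recorded after the cold-start window."""
    return [r for r in results if r.timestamp_s >= warm_up_s]

def is_correct(result: Result) -> bool:
    """Treat a request as successful only if both status and content check out.

    Hypothetical content rule: the body must be a JSON object with a
    non-empty "items" list, as a product-listing endpoint might return.
    """
    if result.status != 200 or not result.body:
        return False
    try:
        payload = json.loads(result.body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    items = payload.get("items")
    return isinstance(items, list) and len(items) > 0

def error_rate(results: list[Result]) -> float:
    """Share of requests that failed the status or content check."""
    if not results:
        return 0.0
    return sum(not is_correct(r) for r in results) / len(results)
```

With this rule, an HTTP 200 with an empty or malformed body counts as an error, so a system that degrades into returning blank pages under load shows up in the error rate instead of passing silently.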

🎯 Practice task

Design (on paper) a realistic performance test scenario for a product you know well — your own application, a favourite SaaS tool, or a public API.

Using either real analytics or your best estimate, write down 3–4 user journeys and assign a percentage weight to each (they should add up to 100%).
List the data you would need to parameterise the test. What would a "10,000-row" CSV file contain? What fields? Where would you generate the data?
Estimate the think time for each step in your primary journey. Where do users read? Where do they fill in forms? Where do they wait for results and then act?
Write the acceptance criteria: what p95 response time, error rate, and throughput targets must the test hit to pass?

This design exercise is the hardest part of performance testing — and the part that determines whether the results mean anything. Tools are the easy part; thinking clearly about what to simulate is the skill.