"Performance testing" is not one thing. It is a family of related techniques, each designed to answer a different question about how a system behaves under different conditions. Using the wrong type of test for the question you are actually asking produces misleading results — and leaves real problems uncovered. This lesson maps the territory.
The four core types
Performance test types: what each one answers

| Test type | Question it answers | What it does | What it verifies or finds | Example |
|---|---|---|---|---|
| Load testing | Does it handle normal traffic? | Simulates expected concurrent users | Verifies SLAs under typical conditions | 500 users at steady state for 30 min |
| Stress testing | Where does it break? | Pushes beyond expected capacity | Finds the breaking point and failure mode | Ramp 500 → 5,000 users, watching for collapse |
| Spike testing | Can it survive a sudden surge? | Sudden jump, not a gradual ramp | Tests autoscaling and burst handling | 100 → 10,000 users in 60 seconds |
| Soak testing | Does it degrade over time? | Normal load held for hours | Catches memory leaks, connection pool exhaustion | 500 users, sustained for 8 hours |
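One way to see the difference between the four types is as load shapes: the target number of concurrent users as a function of time. The sketch below encodes the four example scenarios above. The names `load_profile`, `PROFILES`, and `users_at` are hypothetical, and the 60-minute stress ramp and 15-minute spike duration are assumptions, since the examples do not specify them.

```python
def load_profile(t_min, start_users, end_users, ramp_min):
    """Target users at minute t_min: linear ramp from start to end, then hold."""
    if ramp_min <= 0 or t_min >= ramp_min:
        return end_users
    frac = max(t_min, 0) / ramp_min
    return round(start_users + (end_users - start_users) * frac)


# Illustrative shapes only. Ramp and duration values marked "assumed"
# are not taken from the lesson's examples.
PROFILES = {
    "load":   {"start_users": 500, "end_users": 500,   "ramp_min": 0,  "duration_min": 30},
    "stress": {"start_users": 500, "end_users": 5000,  "ramp_min": 60, "duration_min": 60},   # ramp assumed
    "spike":  {"start_users": 100, "end_users": 10000, "ramp_min": 1,  "duration_min": 15},   # duration assumed
    "soak":   {"start_users": 500, "end_users": 500,   "ramp_min": 0,  "duration_min": 480},
}


def users_at(kind, t_min):
    """Look up a named profile and return its target user count at minute t_min."""
    p = PROFILES[kind]
    return load_profile(t_min, p["start_users"], p["end_users"], p["ramp_min"])
```

Seen this way, load and soak tests share the same flat shape and differ only in duration, while stress and spike tests share the same kind of climb and differ in how fast they get there.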
Load testing
Load testing runs your system at the expected traffic level and verifies that it meets its service-level agreements (SLAs). If your product normally handles 500 concurrent users, your load test runs at 500 concurrent users and checks that response times, error rates, and throughput stay within the agreed thresholds.
Load testing is the foundational type — it tells you whether your system can do its job. Every system should have load tests before going to production.
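As a rough illustration of what "stays within the agreed thresholds" can mean in practice, the sketch below evaluates a finished load test run against three limits. The function names, the `(latency_ms, ok)` sample format, and the use of a nearest-rank 95th percentile are assumptions for the example, not a standard; real SLAs define their own metrics and limits.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_sla(samples, duration_s, p95_limit_ms, max_error_rate, min_rps):
    """samples: list of (latency_ms, ok) tuples, one per request sent during the run."""
    if not samples:
        raise ValueError("no samples recorded")
    latencies = [lat for lat, ok in samples if ok]
    errors = sum(1 for _, ok in samples if not ok)
    # If every request failed there is no latency to report, so treat p95 as unbounded.
    p95 = percentile(latencies, 95) if latencies else float("inf")
    error_rate = errors / len(samples)
    rps = len(samples) / duration_s
    return {
        "p95_ms": p95,
        "error_rate": error_rate,
        "throughput_rps": rps,
        "passed": p95 <= p95_limit_ms and error_rate <= max_error_rate and rps >= min_rps,
    }
```

A run passes only if all three conditions hold at once; a fast system that drops 5% of requests fails just as surely as a slow one.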
Stress testing
Stress testing deliberately exceeds expected capacity to find the system's breaking point. The goal is not to cause a failure for its own sake — it is to understand how the system fails when it does fail. Does it return clear errors? Does it crash completely? Does it take other services with it?
Knowing your system's breaking point before your users find it is far better than discovering it on launch day. Stress tests also reveal whether capacity mechanisms (autoscaling, load balancing, database connection limits) are correctly configured.
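A stepped ramp makes the breaking point easy to read off afterwards. The sketch below assumes each step of the ramp has been summarised as a dict with `users`, `p95_ms`, and `error_rate` keys (a hypothetical format), and that "broken" means crossing either of two limits whose default values here are placeholders.

```python
def find_breaking_point(steps, max_error_rate=0.05, max_p95_ms=2000):
    """Return (last_healthy_users, first_failing_users) from stepped ramp results.

    steps: list of dicts with "users", "p95_ms", and "error_rate" keys.
    first_failing_users is None if no step crossed a limit.
    """
    last_ok = None
    for step in sorted(steps, key=lambda s: s["users"]):
        if step["error_rate"] > max_error_rate or step["p95_ms"] > max_p95_ms:
            return last_ok, step["users"]
        last_ok = step["users"]
    return last_ok, None
```

The result brackets the breaking point between two step sizes. It says nothing about how the system failed, which still has to come from logs, error responses, and the state of downstream services at the failing step.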
Spike testing
Spike testing simulates sudden, sharp traffic surges — the kind that happen when a viral post links to your product, a large marketing email goes out to 200,000 subscribers at the same time, or a TV ad airs. Unlike stress testing, which ramps load gradually, spike testing applies an abrupt jump.
The questions spike testing answers: does the system absorb the spike and stabilise? Does autoscaling kick in fast enough? Does it shed load gracefully, returning errors quickly rather than queuing requests indefinitely?
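"Absorb the spike and stabilise" can be turned into a number: how long after the surge begins does latency return under its limit and stay there? The sketch below is one possible definition, assuming a per-second series of p95 latencies. The function name and the 30-second stability window are illustrative choices.

```python
def recovery_time_s(p95_by_second, spike_start_s, p95_limit_ms, stable_for_s=30):
    """Seconds from spike start until p95 is back under the limit and stays
    there for stable_for_s consecutive seconds. Returns None if that never
    happens within the recorded data, and 0 if the limit was never breached."""
    run = 0
    for t in range(spike_start_s, len(p95_by_second)):
        if p95_by_second[t] <= p95_limit_ms:
            run += 1
            if run == stable_for_s:
                # The stable window began stable_for_s - 1 seconds before t.
                return (t - stable_for_s + 1) - spike_start_s
        else:
            run = 0
    return None
```

Requiring a sustained window avoids counting a brief dip between two latency peaks as recovery.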
Soak testing (endurance testing)
Soak testing runs normal load for an extended period, typically 8 to 24 hours. It is designed to surface problems that only become visible over time: memory leaks that fill the heap after six hours, database connection pools that run dry because connections are not released correctly, disk space consumed by unrotated logs, and gradual response-time degradation as unbounded caches grow and data accumulates.
Functional tests and short-duration load tests will rarely catch a slow memory leak, because the run ends before the leaked memory adds up to anything noticeable. Soak testing exists precisely for that class of problem.
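A simple way to read soak test output is to fit a straight line through periodic resource samples and look at the slope. The sketch below does this with an ordinary least-squares fit over `(hours_elapsed, value)` pairs; the function names and sample format are assumptions, and a linear fit is only a first approximation, since garbage-collected runtimes often show a sawtooth pattern that needs longer windows to judge.

```python
def growth_per_hour(samples):
    """Least-squares slope of (hours_elapsed, value) samples, in value units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return cov / var_x


def hours_until_limit(samples, limit):
    """Extrapolate the fitted growth rate from the latest sample to a limit.
    Returns None if usage is flat or shrinking."""
    slope = growth_per_hour(samples)
    if slope <= 0:
        return None
    _, latest_value = samples[-1]
    return max(0.0, (limit - latest_value) / slope)
```

For example, heap samples that climb by 50 MB per hour and currently sit at 1,400 MB would reach a 2,000 MB limit in about 12 hours, which is exactly the kind of failure a 30-minute load test never gets close to.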
Two more types worth knowing
Volume testing tests behaviour with large amounts of data, not users. How does search perform with 10 million records instead of 10,000? How does report generation behave when the underlying table has 500 million rows? Volume testing is distinct from load testing — it is about data scale, not user scale.
Scalability testing verifies that adding capacity (servers, instances, database replicas) produces proportional performance gains. If doubling your server count does not roughly double your throughput, there is a bottleneck — a shared lock, a single-threaded component, an over-centralised database — that will limit growth.
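The "roughly double" check can be written as a ratio of measured throughput to ideal linear throughput. This is a minimal sketch with a hypothetical function name; what counts as an acceptable efficiency is a judgment call for each system.

```python
def scaling_efficiency(baseline_nodes, baseline_rps, scaled_nodes, scaled_rps):
    """Measured throughput as a fraction of perfect linear scaling.

    1.0 means throughput grew in proportion to node count;
    values well below 1.0 point to a shared bottleneck.
    """
    ideal_rps = baseline_rps * scaled_nodes / baseline_nodes
    return scaled_rps / ideal_rps
```

Going from 2 to 4 nodes and from 1,000 to 1,900 requests per second gives an efficiency of 0.95, which is close to linear. Reaching only 1,200 requests per second gives 0.6, a sign that something other than server count is limiting throughput.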
Choosing the right test for the question
| Question | Test type |
|---|---|
| Can we handle Black Friday? | Load test at peak projected concurrency |
| What happens when we hit our limit? | Stress test past that limit |
| What if a marketing campaign sends 10× traffic instantly? | Spike test |
| Are we leaking memory over a working day? | Soak test for 8–12 hours |
| Does the system slow down with 100M rows? | Volume test |
| Will adding more servers help? | Scalability test |
The cost of skipping performance testing
Consider what a performance failure actually costs. A widely cited Amazon finding from the mid-2000s is that every 100 ms of additional latency cost about 1% in sales. The October 2021 Facebook outage (about six hours of downtime, triggered by a network configuration change rather than by load) cost an estimated $60M in lost revenue, and the company's shares fell by nearly 5% that day. Government services under tax-season load routinely make headlines for the wrong reasons.
Those are extreme examples, but the pattern holds at every scale. An e-commerce checkout that degrades under promotional traffic, a healthcare booking system that times out during peak appointment season, a SaaS dashboard that crawls when the database grows — all of these are performance failures that functional tests pass right through.
Performance testing is not insurance against unlikely disasters. It is standard engineering practice for any system that will see real load.
⚠️ Common mistakes
- Running only load tests and calling it done. Load tests verify normal operation. They do not tell you what happens when traffic spikes, when the system runs for 12 hours, or when you push past capacity. Use all four types.
- Testing on non-representative infrastructure. A load test that passes on a 16-core developer laptop tells you very little about a 4-core production instance. Always performance-test in an environment that mirrors production as closely as possible.
- Recording the threshold but ignoring the failure mode. Knowing a system breaks at 2,000 users is useful. Knowing that it breaks by timing out silently and queuing requests until they expire, degrading every user's experience rather than rejecting cleanly, is more useful. Observe how it fails, not just when.
🎯 Practice task
Think about a product you work on or use regularly. For each of the four core test types, write one specific scenario:
- Load test — what is the expected normal peak concurrent user count? What SLA (response time, error rate) must hold at that load?
- Stress test — at what point would you expect the system to show signs of strain? What would failure look like?
- Spike test — what real-world event could cause a sudden traffic spike? (A product launch, an email campaign, a news story.) What multiplier would it apply to normal traffic?
- Soak test — are there any known or suspected slow resource leaks (memory, connections, disk)? How long would you need to run at normal load to observe them?
This exercise turns the abstract test types into concrete scenarios for your actual product. The next lesson shows you how to measure the results — what metrics to capture and how to interpret them.