Spike Testing — Sudden Traffic Surges — Performance Testing with K6

A spike test simulates a sudden, massive jump in traffic — the kind that arrives when a marketing email hits 200,000 inboxes simultaneously, a product appears on the front page of a major publication, or a viral social media post sends users flooding to a single endpoint. The goal is not to find the steady-state breaking point (that is a stress test) but to understand how the system absorbs and recovers from shock load.

What spike testing measures

Spike test observation points

Phase	What K6 measures	What it reveals
Baseline (pre-spike)	http_req_duration p(95), error rate at normal load	Reference point — confirms the system is healthy before the test begins
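Spike onset (10–30s)	Error rate spike, p(99) latency, http_req_blocked duration	How fast the system responds to sudden demand. High http_req_blocked means K6 itself waited for a free TCP connection slot before sending, so check connection setup and load generator limits before blaming the server.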
Sustained spike (2–5m)	Whether p(95) stabilises or continues climbing. Error rate trend.	Whether autoscaling kicked in fast enough. If latency falls back during this phase, autoscaling worked.
Drop back to baseline	Speed of error rate drop, latency return to baseline values	Whether the system recovers cleanly or leaves threads stuck, connections held, or caches in a bad state
Recovery observation	p(95) and error rate at baseline VU count, post-spike	A system degraded by a spike that does not recover at normal load is a more serious problem than one that held during the spike.
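The last two rows come down to comparing the post-spike window with the pre-spike baseline. A minimal sketch of that comparison, assuming you have already extracted p(95) and error rate for both windows from your results. The function name, the 10% latency tolerance, the error slack, and the sample numbers are all illustrative assumptions, not K6 features:

```javascript
// Compare post-spike metrics against the pre-spike baseline.
// latencyTolerance = 0.10 lets recovery p(95) sit up to 10% above baseline.
// errorSlack allows a tiny absolute increase in error rate.
function recoveredCleanly(baseline, recovery, latencyTolerance = 0.10, errorSlack = 0.001) {
  const latencyOk = recovery.p95 <= baseline.p95 * (1 + latencyTolerance);
  const errorsOk = recovery.errorRate <= baseline.errorRate + errorSlack;
  return latencyOk && errorsOk;
}

// Hypothetical numbers for illustration only (p95 in milliseconds).
const baseline = { p95: 180, errorRate: 0.0 };
const healthyRecovery = { p95: 185, errorRate: 0.0 };
const degradedRecovery = { p95: 950, errorRate: 0.03 };

console.log(recoveredCleanly(baseline, healthyRecovery));  // true
console.log(recoveredCleanly(baseline, degradedRecovery)); // false
```

A false result at baseline VU count is the case the table flags as most serious: the load is gone but the system has not returned to its earlier state.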

The spike test pattern

The key characteristics are the very short ramp (10–30 seconds) and the recovery observation phase after the load drops:

export const options = {
  stages: [
    { duration: '2m',  target: 10 },    // establish normal baseline
    { duration: '30s', target: 500 },   // abrupt spike — 10 to 500 VUs in 30 seconds
    { duration: '4m',  target: 500 },   // hold spike — watch system response
    { duration: '30s', target: 10 },    // drop back to baseline
    { duration: '3m',  target: 10 },    // observe recovery at normal load
    { duration: '1m',  target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<3000'],
    http_req_failed:   ['rate<0.10'],
  },
};

The 30-second ramp from 10 to 500 VUs is what makes this a spike rather than a gradual stress test. Over those 30 seconds, K6 adds roughly 16 new VUs every second ((500 - 10) / 30 ≈ 16.3).
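The same arithmetic can be kept next to the profile so that the ramp rate stays visible when the numbers change. The sketch below is one possible way to do that, not part of K6: buildSpikeStages, rampRate, and the parameter names are hypothetical helpers that produce the same six stages from plain numbers.

```javascript
// Build the six-stage spike profile from plain numbers (all durations in seconds).
function buildSpikeStages(p) {
  return [
    { duration: `${p.baselineSec}s`, target: p.baselineVUs }, // establish baseline
    { duration: `${p.rampSec}s`, target: p.peakVUs },         // abrupt spike
    { duration: `${p.holdSec}s`, target: p.peakVUs },         // hold spike
    { duration: `${p.rampSec}s`, target: p.baselineVUs },     // drop back
    { duration: `${p.recoverySec}s`, target: p.baselineVUs }, // observe recovery
    { duration: `${p.rampDownSec}s`, target: 0 },             // ramp down
  ];
}

// New VUs started per second during the spike ramp.
function rampRate(p) {
  return (p.peakVUs - p.baselineVUs) / p.rampSec;
}

const profile = {
  baselineVUs: 10, peakVUs: 500,
  baselineSec: 120, rampSec: 30, holdSec: 240, recoverySec: 180, rampDownSec: 60,
};

const stages = buildSpikeStages(profile);
console.log(stages.length, rampRate(profile).toFixed(1)); // 6 16.3
```

In a K6 script the result would be assigned to the stages key of the exported options object. Stretching rampSec to 300 drops the rate to about 1.6 VUs per second, which behaves like a gradual ramp rather than a spike.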

Timeout configuration

The default K6 request timeout is 60 seconds. Under spike load, a server under severe resource pressure can take that long to respond. Set an explicit, shorter timeout to distinguish "slow" from "unavailable" and to stop VUs from waiting up to a full minute on each request:

import http from 'k6/http';
import { check } from 'k6';

// `data` is the object returned by setup(); it is assumed to carry a `token` field.
export default function (data) {
  const res = http.get('https://api.example.com/products', {
    headers: { Authorization: `Bearer ${data.token}` },
    tags: { name: 'GetProducts' },
    timeout: '10s',   // fail fast: treat anything over 10s as an error
  });

  check(res, {
    'status ok':           (r) => r.status === 200,
    'responded in time':   (r) => r.timings.duration < 10000,
  });
}

Under spike load, fast failures are better than slow queues. If your server takes 60 seconds to respond under load, VUs pile up waiting — compounding the problem. A 5–10 second timeout converts stuck connections into counted errors instead of accumulated wait time.
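To make the "slow" versus "unavailable" distinction explicit in results, each response can be given a label. The sketch below relies on one K6 behaviour: a request that gets no HTTP response (for example because it timed out) is reported with status 0. The function name, the label names, and the 3000 ms cut-off are assumptions chosen for illustration.

```javascript
// Hypothetical cut-off: responses slower than this are labelled "slow".
const SLOW_MS = 3000;

// Works on any object shaped like a K6 response: { status, timings: { duration } }.
// Note that 4xx responses fall through to the latency check.
function classifyResponse(res) {
  if (res.status === 0) return 'unavailable';   // no HTTP response: timeout or connection failure
  if (res.status >= 500) return 'server_error'; // server answered, but with an error
  if (res.timings.duration > SLOW_MS) return 'slow';
  return 'ok';
}

// Mock responses for illustration.
console.log(classifyResponse({ status: 0, timings: { duration: 10000 } }));  // unavailable
console.log(classifyResponse({ status: 200, timings: { duration: 4200 } })); // slow
console.log(classifyResponse({ status: 200, timings: { duration: 180 } }));  // ok
```

Inside a K6 script, the returned label could be counted with a custom metric, so that the summary shows how many requests were slow and how many never got an answer.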

Reading autoscaling signals

The spike test's most valuable output is whether autoscaling responded fast enough. Look for this pattern in the Grafana output:

[0:00–2:00]  VUs: 10,  p(95): 180ms,  errors: 0.0%
[2:00–2:30]  VUs: 10→500 (spike onset)
[2:30–3:00]  VUs: 500, p(95): 4200ms, errors: 18%   ← surge — autoscaler not yet ready
[3:00–4:00]  VUs: 500, p(95): 2100ms, errors: 6%    ← autoscaler adds instances
[4:00–6:30]  VUs: 500, p(95): 620ms,  errors: 0.8%  ← new instances absorbing load
[6:30–7:00]  VUs: 500→10 (drop)
[7:00–10:00] VUs: 10,  p(95): 185ms,  errors: 0.0%  ← recovered cleanly

The latency peak at 2:30–3:00 shows the autoscaler's response lag. If that window lasts several minutes at 18% errors rather than 30 seconds, the autoscaler is too slow for your traffic pattern. If latency never comes down while the spike is held (p(95) stays above 4000ms), the autoscaler is not adding enough instances.
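The lag can be put into a number by finding the first window during the spike where the metrics are healthy again. The sketch below transcribes the sample timeline into seconds from test start. The function name and the "healthy" limits (p(95) under 1000 ms, errors under 1%) are assumptions to adjust for your own service.

```javascript
// Windows transcribed from the sample timeline (seconds from test start, p95 in ms).
const windows = [
  { start: 0,   end: 120, p95: 180,  errorRate: 0.0 },
  { start: 150, end: 180, p95: 4200, errorRate: 0.18 },
  { start: 180, end: 240, p95: 2100, errorRate: 0.06 },
  { start: 240, end: 390, p95: 620,  errorRate: 0.008 },
  { start: 420, end: 600, p95: 185,  errorRate: 0.0 },
];

// Seconds from spike onset until the first healthy window inside the spike.
// Returns null when the system never becomes healthy while the spike is held.
function autoscalerLagSeconds(samples, spikeStart, spikeEnd, limits) {
  const healthy = samples.find((w) =>
    w.start >= spikeStart && w.end <= spikeEnd &&
    w.p95 < limits.p95 && w.errorRate < limits.errorRate);
  return healthy ? healthy.start - spikeStart : null;
}

// Spike onset at 2:00 (120s), spike hold ends at 6:30 (390s).
const lag = autoscalerLagSeconds(windows, 120, 390, { p95: 1000, errorRate: 0.01 });
console.log(lag); // 120
```

Here the system needed about two minutes from spike onset to absorb the load. A null result corresponds to the case where latency stays high for the whole hold.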

What spike testing does not test

A spike test applies sudden load but still represents genuine user traffic — requests that the system would normally process. It does not simulate:

DDoS attack traffic — malformed, random, or flood-rate traffic at the network layer
API abuse patterns — individual users hitting rate limits through repeated calls
Downstream cascade — upstream services generating sudden load on your API

These require separate, purpose-built testing approaches. A K6 spike test specifically measures your application's ability to absorb sudden volume from real users.

Common failure modes a spike test reveals

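Connection pool exhaustion: http_req_waiting (time to first byte) climbs sharply at spike onset. Your application is waiting for a free database connection because the pool is full, and all new requests queue behind existing ones. Increasing pool size or adding read replicas addresses this. Note that http_req_blocked is measured on the K6 side (time spent waiting for a free TCP connection slot before the request is sent), so a jump there points at connection setup or the load generator rather than at the database pool.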

CDN cold start: The first wave of requests hits origin servers because the CDN has no cached content for the spike traffic. After the first request per cache key, subsequent requests are served from cache. You see a sharp error burst in the first 30 seconds, then normalisation — this is expected and acceptable.

Thread pool exhaustion: Your application server hits its max thread count. New requests receive 503 responses immediately rather than queuing. Visible as a sharp http_req_failed rate increase with very short http_req_duration — fast failures, not slow ones.
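Fast failures and slow failures point to different problems, so it can help to split failed requests by how long they took. This is a rough sketch, assuming you have per-request samples with a status and a duration (for example, derived from K6's JSON output). The 100 ms cut-off, the function names, and the sample data are hypothetical.

```javascript
// A request counts as failed when its status is outside 200-399,
// which mirrors the default behaviour of http_req_failed.
function isFailed(sample) {
  return sample.status < 200 || sample.status >= 400;
}

// Hypothetical cut-off for "the server rejected this immediately".
const FAST_MS = 100;

function failureProfile(samples) {
  const failures = samples.filter(isFailed);
  const fast = failures.filter((s) => s.duration < FAST_MS);
  return {
    failedRate: failures.length / samples.length,
    fastFailureShare: failures.length ? fast.length / failures.length : 0,
  };
}

// Made-up samples: two quick 503s, one slow 503, one success.
const onsetSamples = [
  { status: 503, duration: 12 },
  { status: 503, duration: 9 },
  { status: 503, duration: 8400 },
  { status: 200, duration: 310 },
];

const onsetProfile = failureProfile(onsetSamples);
console.log(onsetProfile.failedRate, onsetProfile.fastFailureShare.toFixed(2)); // 0.75 0.67
```

A fast-failure share near 1 fits the thread pool pattern of immediate 503s. A share near 0 suggests requests are queuing or timing out instead.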

⚠️ Common mistakes

Spike without a recovery phase. Ending the test immediately after the spike drop leaves the most important question unanswered: did the system return to normal? Always hold at baseline load for 2–3 minutes after the drop to observe recovery.
Making the spike ramp too long. A 5-minute ramp from 10 to 500 VUs is a stress test, not a spike. The defining feature of a spike is the abrupt, realistic shock — 10 to 30 seconds. Anything longer gives the autoscaler time to respond before the peak is reached, masking the real spike behaviour.
Setting timeouts too high. With the default 60-second timeout, 500 VUs can each tie up a connection for a minute, and 500 simultaneous hung connections compound the overload. Set explicit timeouts of 5–10 seconds and count requests that exceed them as errors. Fast failures are easier for the system to recover from than slow queues.
No baseline VU phase before the spike. Without a baseline phase, you do not know whether p(95) = 1800ms during the spike reflects degradation under load or just normal system latency. The baseline is your control group.
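Three of these mistakes can be caught before the test runs by inspecting the stages array. The checker below is a sketch under stated assumptions: lintSpikeStages and toSeconds are hypothetical helpers, the 30-second and 2-minute limits come from the guidance above, and the duration parser only handles simple strings such as '30s', '4m', or '1m30s'.

```javascript
// Parse simple K6 duration strings such as '30s', '4m' or '1m30s' into seconds.
function toSeconds(duration) {
  const unitSeconds = { ms: 0.001, s: 1, m: 60, h: 3600 };
  let total = 0;
  for (const [, value, unit] of duration.matchAll(/(\d+)(ms|s|m|h)/g)) {
    total += Number(value) * unitSeconds[unit];
  }
  return total;
}

function lintSpikeStages(stageList) {
  const warnings = [];
  const targets = stageList.map((s) => s.target);
  const peak = Math.max(...targets);
  const rampIndex = targets.indexOf(peak);    // stage that first reaches the peak
  const lastPeakIndex = targets.lastIndexOf(peak);

  const before = stageList[rampIndex - 1];
  if (!before || before.target === 0) warnings.push('no baseline phase before the spike');

  if (toSeconds(stageList[rampIndex].duration) > 30) warnings.push('spike ramp longer than 30s');

  // Recovery: a stage after the drop that holds a non-zero target for at least 2 minutes.
  const afterPeak = stageList.slice(lastPeakIndex + 1);
  const hasRecovery = afterPeak.some((s, i) =>
    i > 0 && s.target > 0 && s.target === afterPeak[i - 1].target && toSeconds(s.duration) >= 120);
  if (!hasRecovery) warnings.push('no recovery hold of at least 2 minutes after the drop');

  return warnings;
}

const articleStages = [
  { duration: '2m', target: 10 }, { duration: '30s', target: 500 },
  { duration: '4m', target: 500 }, { duration: '30s', target: 10 },
  { duration: '3m', target: 10 }, { duration: '1m', target: 0 },
];
const stressLikeStages = [{ duration: '5m', target: 500 }, { duration: '1m', target: 0 }];

console.log(lintSpikeStages(articleStages).length);    // 0
console.log(lintSpikeStages(stressLikeStages).length); // 3
```

The timeout mistake is not covered here because timeouts are set per request rather than in the stages.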

🎯 Practice task

Run a spike test and observe autoscaling (or lack of it) against a public endpoint. 35 minutes.

Use https://test.k6.io — designed for load testing practice.

Write a spike test script with this pattern: 1m at 5 VUs (baseline) → 20s spike to 100 VUs → 3m hold at 100 VUs → 20s drop to 5 VUs → 2m hold at 5 VUs (recovery) → 30s ramp to 0.
Add timeout: '8s' to every request. Add a check for r.timings.duration < 8000.
Set thresholds wide enough for observation: http_req_duration: ['p(95)<15000'] and http_req_failed: ['rate<0.40'].
Run the test. Record p(95) and error rate at each phase. Does latency stabilise during the spike hold, or does it keep climbing?
Reduce the spike hold to 30s and increase the spike target to 200 VUs. Run again. Compare how quickly (or whether) the system recovers during the 30s hold vs the 3m hold.
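Look at http_req_blocked in the output. Does it spike during the spike onset phase? Since this metric is the time K6 spent waiting for a free TCP connection slot before sending each request, what does a high value during a spike tell you about connection availability between K6 and the target?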