Q30 of 38 · Performance
How do you detect performance regressions between releases automatically?
Short answer
Short answer: Compare each run's key metrics against the stored baseline using percentage-deviation thresholds. Fail the build when p95 or p99 exceeds the threshold. Track trends over time to distinguish gradual drift from sudden regressions.
Detail
The simplest approach: store the baseline metrics in a JSON file committed with the tests. After each load test run, a comparison script checks whether p95 has increased more than 15% (or whatever threshold the team agrees), and if so, exits with a non-zero code to fail the pipeline.
For more sophisticated tracking, send results to a time-series store (InfluxDB + Grafana, Datadog, or k6 Cloud). This enables visualising per-endpoint latency over weeks and spotting gradual degradation that falls within a single-run threshold but is clearly trending upward over multiple releases.
Key rules for reliable regression detection:
- Only compare runs from the same test environment and the same load level.
- Use percentile metrics (p95, p99), never average — a single slow outlier hides in the average.
- Account for natural variance: a 5% increase might be noise; a 20% increase is almost certainly signal. Set thresholds based on measured baseline variance, not intuition.