How do you detect performance regressions between releases automatically?

Question

Accepted Answer

Compare each run's key metrics against the stored baseline using percentage-deviation thresholds. Fail the build when p95 or p99 exceeds the threshold. Track trends over time to distinguish gradual drift from sudden regressions.

The simplest approach: store the baseline metrics in a JSON file committed with the tests. After each load test run, a comparison script checks whether p95 has increased by more than 15% (or whatever threshold the team agrees on) and, if so, exits with a non-zero code to fail the pipeline.

For more sophisticated tracking, send results to a time-series store (InfluxDB + Grafana, Datadog, or k6 Cloud). This enables visualising per-endpoint latency over weeks and spotting gradual degradation that stays within the single-run threshold on every run but is clearly trending upward over multiple releases.

Key rules for reliable regression detection:

Only compare runs from the same test environment and the same load level.

Use percentile metrics (p95, p99), never average: a single slow out
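The baseline comparison described in the answer can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the file names `baseline.json` and `current.json`, the flat `{"p95": ..., "p99": ...}` layout in milliseconds, and the helper names `find_regressions` and `main` are all assumptions made for the example. The 15% threshold is the example figure from the answer.

```python
import json
from pathlib import Path

# Assumed defaults for this sketch; agree on the real threshold as a team.
THRESHOLD_PCT = 15.0
METRICS = ("p95", "p99")


def find_regressions(baseline, current, threshold_pct=THRESHOLD_PCT, metrics=METRICS):
    """Return (metric, baseline, current, pct_change) for each metric whose
    increase over the baseline exceeds threshold_pct."""
    regressions = []
    for name in metrics:
        base = baseline[name]
        cur = current[name]
        if base <= 0:
            # A zero or negative baseline makes a percentage change meaningless.
            raise ValueError(f"baseline for {name} must be positive, got {base}")
        pct_change = (cur - base) / base * 100.0
        if pct_change > threshold_pct:
            regressions.append((name, base, cur, pct_change))
    return regressions


def main(baseline_path="baseline.json", current_path="current.json"):
    """Compare two metric files and return a process exit code (0 = pass)."""
    # Hypothetical file layout: {"p95": 212.0, "p99": 480.0}, values in ms.
    baseline = json.loads(Path(baseline_path).read_text())
    current = json.loads(Path(current_path).read_text())
    regressions = find_regressions(baseline, current)
    for name, base, cur, pct in regressions:
        print(f"REGRESSION {name}: {base:.1f} ms -> {cur:.1f} ms (+{pct:.1f}%)")
    return 1 if regressions else 0
```

In a pipeline step, the script would end with something like `sys.exit(main())` so that any regression produces a non-zero exit code and fails the build. Returning the exit code from `main` rather than exiting inside it keeps the comparison logic easy to unit test.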

// WHAT INTERVIEWERS LOOK FOR