How do you detect performance regressions between releases automatically?

Performance · Mid · performance · regression-detection · baseline · ci-cd · k6 · influxdb

Short answer

Compare each run's key metrics against a stored baseline using percentage-deviation thresholds. Fail the build when p95 or p99 exceeds the threshold. Track trends over time to distinguish gradual drift from sudden regressions.

Detail

The simplest approach: store the baseline metrics in a JSON file committed alongside the tests. After each load test run, a comparison script checks whether p95 has increased by more than 15% (or whatever threshold the team agrees on) and, if so, exits with a non-zero code to fail the pipeline.
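A minimal sketch of such a comparison script in Python. It assumes both files are flat JSON objects with `p95` and `p99` keys in milliseconds; the file layout, the function names and the 15% default are illustrative assumptions, not a fixed format produced by any particular tool.

```python
import json
import sys


def find_regressions(baseline, current, threshold_pct=15.0, metrics=("p95", "p99")):
    """Return (metric, baseline, current, pct_change) for each metric over the threshold."""
    regressions = []
    for name in metrics:
        base = baseline[name]
        cur = current[name]
        # Percentage deviation of the current run relative to the stored baseline.
        change_pct = (cur - base) / base * 100.0
        if change_pct > threshold_pct:
            regressions.append((name, base, cur, change_pct))
    return regressions


def main(baseline_path, current_path):
    # Hypothetical file layout: {"p95": 200.0, "p99": 400.0}
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)

    regressions = find_regressions(baseline, current)
    for name, base, cur, pct in regressions:
        print(f"REGRESSION {name}: {base:.1f} ms -> {cur:.1f} ms (+{pct:.1f}%)")
    # A non-zero exit code is what makes the CI step fail.
    return 1 if regressions else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

In a pipeline this would run as a step after the load test, for example `python compare.py baseline.json current.json`, where both file names are placeholders.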

For more sophisticated tracking, send results to a time-series store (InfluxDB + Grafana, Datadog, or k6 Cloud). This enables visualising per-endpoint latency over weeks and spotting gradual degradation that falls within a single-run threshold but is clearly trending upward over multiple releases.
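To illustrate why trending catches what a single-run check misses, the sketch below fits a least-squares line through the p95 values of recent releases and reports the slope as a percentage of the first value per release. It assumes the history has already been queried from the time-series store into a plain list; the function name and the sample numbers are hypothetical.

```python
def drift_pct_per_release(p95_history):
    """Least-squares slope of p95 over release index, as % of the first value per release."""
    n = len(p95_history)
    if n < 2:
        raise ValueError("need at least two releases to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(p95_history) / n
    # Ordinary least squares: slope = cov(x, y) / var(x), with x = 0, 1, ..., n-1.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(p95_history))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope_ms = cov / var
    return slope_ms / p95_history[0] * 100.0


# Hypothetical p95 values (ms) for six consecutive releases.
history = [200.0, 206.0, 212.0, 218.0, 224.0, 230.0]
drift = drift_pct_per_release(history)
```

With these numbers no release is more than 15% above the first one, so a single-run check against that baseline passes every time, yet the fitted slope shows a steady climb of about 3% per release. A team could alert on the slope with its own limit, chosen the same way as the single-run threshold.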

Key rules for reliable regression detection:

  • Only compare runs from the same test environment and the same load level.
  • Use percentile metrics (p95, p99) rather than the average: a slow tail that affects a small share of requests barely moves the average, so the regression stays hidden.
  • Account for natural variance: a 5% increase might be noise; a 20% increase is almost certainly signal. Set thresholds based on measured baseline variance, not intuition.
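One way to base the threshold on measured variance, sketched under assumptions: run the unchanged baseline build several times, compute the coefficient of variation of p95 across those runs, and set the threshold to a multiple of it. The multiplier `k=3.0` and the 5% floor are illustrative choices, not recommendations from the text above.

```python
import statistics


def threshold_from_baseline_runs(p95_runs, k=3.0, floor_pct=5.0):
    """Derive a percentage threshold from repeated runs of the same build."""
    mean = statistics.mean(p95_runs)
    stdev = statistics.stdev(p95_runs)  # sample standard deviation, needs >= 2 runs
    # Coefficient of variation: run-to-run noise as a percentage of the mean.
    cv_pct = stdev / mean * 100.0
    # Never go below the floor, so a very quiet environment does not
    # produce a threshold that trips on trivial changes.
    return max(k * cv_pct, floor_pct)


# Hypothetical p95 values (ms) from five runs of the same build.
quiet_env = threshold_from_baseline_runs([198.0, 202.0, 200.0, 204.0, 196.0])
noisy_env = threshold_from_baseline_runs([180.0, 220.0, 200.0, 210.0, 190.0])
```

The quiet environment ends up at the 5% floor, while the noisy one needs a threshold of roughly 24% before an increase can be treated as signal. A threshold that wide is itself a hint that the environment should be stabilised before relying on the check.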

// WHAT INTERVIEWERS LOOK FOR

Percentage-deviation thresholds against a stored baseline. Time-series trending to catch gradual drift. Using percentiles, not averages. Same environment required for valid comparisons.