A quality gate is a pass/fail criterion that stops a build from progressing when performance degrades. K6 thresholds are quality gates: they define acceptable performance, and when any threshold fails, K6 exits with code 99. CI pipelines detect this non-zero exit code and act on it without any custom parsing.
Thresholds in a CI context
Quality gate threshold definitions and outcomes
| Threshold type | Threshold definition | What triggers failure | When to tighten |
|---|---|---|---|
| SLA-driven | p(95)<500 — matches SLA committed to users | Any deploy that causes 95th percentile to exceed the SLA | When the SLA itself is tightened in the service agreement |
| Regression-driven | p(95)<baseline*1.2 — 20% regression tolerance | Any change that makes the system 20% slower than the last measured baseline | When performance improves — update the baseline to lock in the gain |
| Risk-driven | p(99)<300 on Login, p(95)<200 on Search — stricter for critical paths | Degradation specifically on paths where slowness causes user abandonment or revenue loss | When business metrics (conversion rate, login success) correlate with latency increases |
| Throughput-driven | http_reqs rate>100 — minimum throughput guarantee | Throughput drops — possible early sign of resource exhaustion or saturated thread pool | When capacity grows and the system should serve more RPS than before |
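The "What triggers failure" column depends on how a percentile threshold is judged. The sketch below is plain JavaScript runnable in Node, not K6 code. It computes a percentile with the nearest-rank method, which is an assumption: K6 computes percentiles internally and its interpolation may differ slightly. The helper names are hypothetical. Under this method, p(95)<500 tolerates up to 5% slow requests but fails once 6% are slow.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. K6's internal method may differ.
function percentileNearestRank(values, p) {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  // Integer arithmetic keeps the rank exact (e.g. 95 * 100 / 100 = 95).
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Mirrors a threshold such as 'p(95)<500': it passes only while the
// observed percentile stays strictly below the limit.
function checkPercentileThreshold(durationsMs, p, limitMs) {
  const observed = percentileNearestRank(durationsMs, p);
  return { observed, passed: observed < limitMs };
}

// 100 requests: 95 fast (100 ms) and 5 slow (2000 ms).
const fivePercentSlow = [...Array(95).fill(100), ...Array(5).fill(2000)];
// 100 requests: 94 fast and 6 slow.
const sixPercentSlow = [...Array(94).fill(100), ...Array(6).fill(2000)];

console.log(checkPercentileThreshold(fivePercentSlow, 95, 500)); // { observed: 100, passed: true }
console.log(checkPercentileThreshold(sixPercentSlow, 95, 500));  // { observed: 2000, passed: false }
```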
A complete quality gate configuration
```javascript
import { Rate } from 'k6/metrics';

// Custom business metric referenced by the 'order_success_rate' threshold below.
// The script must record it, e.g. orderSuccessRate.add(res.status === 201).
const orderSuccessRate = new Rate('order_success_rate');

export const options = {
  vus: 50,
  duration: '10m',
  thresholds: {
    // SLA: overall latency
    'http_req_duration': ['p(95)<500', 'p(99)<1000'],
    // SLA: error rate below 0.1%
    'http_req_failed': ['rate<0.001'],
    // Risk-driven: critical paths get stricter thresholds
    // (requests must carry the matching tag, e.g. { tags: { name: 'Login' } })
    'http_req_duration{name:Login}': ['p(99)<300'],
    'http_req_duration{name:Payment}': ['p(95)<800', 'p(99)<1500'],
    'http_req_duration{name:Search}': ['p(95)<400'],
    // Throughput gate: must sustain more than 100 requests per second
    'http_reqs': ['rate>100'],
    // Custom business metric: more than 95% of orders must succeed
    'order_success_rate': ['rate>0.95'],
    // Check pass rate: more than 99% of checks must pass
    'checks': ['rate>0.99'],
  },
};
```

When any of these thresholds fails, K6 exits with code 99. In GitHub Actions:
```yaml
- name: Run load test
  uses: grafana/k6-action@v0.3.1
  with:
    filename: tests/load-test.js
  # The step fails automatically when K6 exits non-zero (99 for failed thresholds).
  # No custom parsing needed: CI sees the non-zero exit code.
```

The CI job fails. If the job is a required status check, the PR is blocked and the deploy does not proceed.
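Any non-zero exit already fails the step, but a pipeline may want to report "performance regressed" separately from "the script or environment broke". A minimal sketch in plain Node JavaScript, assuming a hypothetical wrapper script around the K6 CLI; only exit codes 0 (success) and 99 (thresholds failed) are interpreted, and everything else is treated as a generic error.

```javascript
// Exit code K6 uses when one or more thresholds fail.
const K6_THRESHOLDS_FAILED = 99;

// Hypothetical helper for a CI wrapper: turn K6's exit status into a label
// so a performance regression is reported separately from other failures.
function classifyK6Exit(status) {
  if (status === 0) return 'passed';
  if (status === K6_THRESHOLDS_FAILED) return 'thresholds-failed';
  return 'k6-error'; // any other non-zero (or missing) status
}

// In a wrapper you might obtain the status with Node's child_process, e.g.:
//   const { status } = require('child_process').spawnSync(
//     'k6', ['run', 'tests/load-test.js'], { stdio: 'inherit' });
//   console.log(classifyK6Exit(status));
console.log(classifyK6Exit(0));  // passed
console.log(classifyK6Exit(99)); // thresholds-failed
```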
Threshold philosophy: setting values that matter
Too lenient: p(95)<5000 (5 seconds). The test always passes. Developers do not trust it because they know it would pass even if the system were broken. Quality gates that never fail are not quality gates.
Too strict: p(95)<50 (50ms). The test always fails unless you are testing a local in-memory cache. Developers ignore the failure because it has nothing to do with their change. False positives destroy confidence in the gate.
Calibrated: Set thresholds 20–30% above your current measured p95 during steady-state load. If the system currently runs at p95=180ms, that band is 216–234ms, so a threshold such as p(95)<230 fits. This catches regressions while tolerating normal variance. Update thresholds when performance genuinely improves.
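The calibration arithmetic is simple enough to script, which keeps the numbers reproducible when thresholds are reviewed. A minimal sketch in plain JavaScript; the helper names are hypothetical, and the 0.25 default headroom is an illustrative midpoint of the 20–30% band.

```javascript
// Hypothetical helper: the 20-30% headroom band above a measured steady-state p95.
function calibrationBand(measuredP95Ms) {
  return {
    lowMs: Math.round(measuredP95Ms * 1.2),  // +20%
    highMs: Math.round(measuredP95Ms * 1.3), // +30%
  };
}

// Builds the threshold expression string K6 expects, for a chosen headroom.
function p95Threshold(measuredP95Ms, headroom = 0.25) {
  return `p(95)<${Math.round(measuredP95Ms * (1 + headroom))}`;
}

console.log(calibrationBand(180));   // { lowMs: 216, highMs: 234 }
console.log(p95Threshold(180));      // p(95)<225
console.log(p95Threshold(180, 0.2)); // p(95)<216
```

Because K6 threshold expressions are plain strings, the same helper also produces the regression-driven gate from the table: p(95)<baseline*1.2 corresponds to a headroom of 0.2. In a K6 script, one option is to pass the baseline on the command line (k6 run -e BASELINE_P95=180 script.js) and read it as Number(__ENV.BASELINE_P95); the variable name BASELINE_P95 is illustrative.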
Collaborative threshold ownership
Thresholds represent commitments. Effective quality gates are set collaboratively:
- Product / business: defines user-facing SLAs ("checkout must respond in under 1 second 95% of the time")
- Engineering: translates SLAs into measurable threshold expressions and validates feasibility
- QA: writes the thresholds, maintains the test, ensures CI enforces them
- All stakeholders: review threshold values in code review — they are as important as functional tests
Thresholds checked into version control, reviewed like code, and updated intentionally are more trustworthy than values set once and forgotten.
Tag-based thresholds for granular gates
Apply different quality standards to different endpoint categories using tags:
```javascript
import http from 'k6/http';

export const options = {
  thresholds: {
    // Critical user-facing paths
    'http_req_duration{category:critical}': ['p(95)<200'],
    // Standard API endpoints
    'http_req_duration{category:standard}': ['p(95)<500'],
    // Reporting and export endpoints (users expect them to be slower)
    'http_req_duration{category:reports}': ['p(95)<3000'],
  },
};

export default function () {
  // Tag each request with its category so the matching threshold evaluates it
  http.get('https://api.example.com/health', {
    tags: { name: 'HealthCheck', category: 'critical' },
  });
  http.get('https://api.example.com/orders', {
    tags: { name: 'ListOrders', category: 'standard' },
  });
  http.post('https://api.example.com/reports/export', null, {
    tags: { name: 'ExportReport', category: 'reports' },
  });
}
```

A slow report export does not fail the critical-path threshold: each tagged threshold evaluates only the requests carrying its category, so the gates are independent.
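As categories multiply, keeping each category's name and limit in one map makes the gates easier to review than a hand-written list of tag-filter keys. A minimal sketch in plain JavaScript; the map and helper names are hypothetical, and the limits are the ones from the example above.

```javascript
// Hypothetical single source of truth: p95 limit in ms per endpoint category.
const categoryP95LimitsMs = { critical: 200, standard: 500, reports: 3000 };

// Expands the map into K6 threshold entries keyed by a tag filter,
// e.g. 'http_req_duration{category:critical}': ['p(95)<200'].
function buildCategoryThresholds(limitsMs) {
  const thresholds = {};
  for (const [category, limitMs] of Object.entries(limitsMs)) {
    thresholds[`http_req_duration{category:${category}}`] = [`p(95)<${limitMs}`];
  }
  return thresholds;
}

console.log(buildCategoryThresholds(categoryP95LimitsMs));
```

In a K6 script the result would be assigned directly, as in thresholds: buildCategoryThresholds(categoryP95LimitsMs).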
abortOnFail for load tests in CI
In CI, a load test running to completion while error rates are at 40% is wasteful — the result is clear before the test ends. Use abortOnFail to terminate early:
```javascript
export const options = {
  stages: [
    { duration: '3m', target: 50 },  // ramp up
    { duration: '10m', target: 50 }, // steady state
    { duration: '2m', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_failed: [{
      threshold: 'rate<0.05',
      abortOnFail: true,
      delayAbortEval: '2m', // give the system time to warm up first
    }],
    http_req_duration: [{
      threshold: 'p(95)<2000',
      abortOnFail: true,
      delayAbortEval: '3m', // wait until the ramp-up stage has finished
    }],
  },
};
```

delayAbortEval: '2m' postpones abort evaluation, so elevated errors early in the ramp-up cannot end the run. A CI test that aborts at minute 5 of a 15-minute run because of a real failure saves 10 minutes of CI time and still reports the failure correctly.
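To see why the delay matters, the sketch below roughly models an abortOnFail threshold of rate<0.05 on the error rate, evaluated cumulatively over the run so far. It is plain JavaScript with hypothetical data and names, a simplification rather than K6's actual scheduler: it shows a noisy warm-up aborting the run when there is no delay, while a 2-minute delay lets the same run complete.

```javascript
// Simplified model of abortOnFail + delayAbortEval for a 'rate<maxRate'
// threshold. Each interval reports requests and failures; the rate is
// evaluated cumulatively over everything recorded so far.
function firstAbortTimeSec(intervals, maxRate, delayAbortEvalSec) {
  let requests = 0;
  let failures = 0;
  for (const { tSec, reqs, failed } of intervals) {
    requests += reqs;
    failures += failed;
    if (tSec < delayAbortEvalSec || requests === 0) continue; // still inside the delay window
    if (!(failures / requests < maxRate)) return tSec;         // threshold fails: abort here
  }
  return null; // the run completes
}

// Hypothetical run: a noisy first 20 s of ramp-up, then 1% errors until 300 s.
const intervals = [
  { tSec: 10, reqs: 100, failed: 20 }, // 20% errors while warming up
  { tSec: 20, reqs: 200, failed: 2 },
];
for (let t = 30; t <= 300; t += 10) intervals.push({ tSec: t, reqs: 500, failed: 5 });

console.log(firstAbortTimeSec(intervals, 0.05, 0));   // 10 (aborted during ramp-up)
console.log(firstAbortTimeSec(intervals, 0.05, 120)); // null (run completes)
```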
⚠️ Common mistakes
- Thresholds that have never failed. If your threshold has never triggered a CI failure since you added it, it is probably too lenient. Review threshold values periodically against your current measured performance.
- Missing delayAbortEval on CI thresholds. During ramp-up, error rates and latency are naturally elevated. Without a delay, abortOnFail can terminate the test within the first few seconds, before the system reaches steady state: a false positive.
- No threshold on custom metrics. If you track order_success_rate as a custom metric but do not add a threshold, the CI job passes even if 50% of orders are failing. Every custom business metric that measures a KPI should have a threshold.
- Setting the same threshold for all endpoints. A 2-second threshold on a login endpoint is unacceptable if users abandon your app after 1 second. Use tagged thresholds to match each endpoint's business priority.
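The "no threshold on custom metrics" mistake can be caught mechanically. A minimal sketch of a hypothetical CI lint in plain JavaScript; it assumes you maintain the list of custom metric names yourself, and payment_declines is an invented metric standing in for one someone forgot to gate.

```javascript
// Hypothetical CI lint: report custom metrics that have no threshold.
// Threshold keys may include a tag filter, e.g. 'order_success_rate{region:eu}',
// so only the part before '{' is compared.
function metricsWithoutThresholds(customMetricNames, thresholds) {
  const gated = new Set(Object.keys(thresholds).map((key) => key.split('{')[0].trim()));
  return customMetricNames.filter((name) => !gated.has(name));
}

const configuredThresholds = {
  'http_req_failed': ['rate<0.001'],
  'order_success_rate': ['rate>0.95'],
  'checks': ['rate>0.99'],
};

console.log(metricsWithoutThresholds(['order_success_rate', 'payment_declines'], configuredThresholds));
// [ 'payment_declines' ]
```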
🎯 Practice task
Design and implement quality gates for a multi-endpoint test. 35 minutes.
Use https://jsonplaceholder.typicode.com.
- Write a script with vus: 5, duration: '2m' that calls GET /posts (tagged category:read), GET /users (tagged category:read), and POST /posts (tagged category:write).
- Add these thresholds:
  - 'http_req_duration{category:read}': ['p(95)<400']
  - 'http_req_duration{category:write}': ['p(95)<600']
  - 'http_req_failed': ['rate<0.01']
  - 'checks': ['rate>0.99']
- Add checks to each request (status code, body content).
- Run and observe all thresholds pass. Note the actual p(95) values.
- Set the read threshold to p(95)<1 to force a failure. Run again and verify that K6 exits with code 99 (for example, with echo $? in a POSIX shell).
- Add abortOnFail: true, delayAbortEval: '30s' to the failing threshold. Run again and observe whether the test terminates early or completes.
- Reflect: what would your team's real SLA thresholds be for these endpoint categories? Write a comment in the script explaining the reasoning behind each threshold value.