Review and Stretch Goals — Performance Testing with K6

You have built a complete performance test suite: five test scripts covering every major load pattern, a data management layer using SharedArray, custom business metrics with thresholds, a CI/CD integration that blocks on failure, and shareable HTML reports. This lesson reviews what you have built, surfaces the most important concepts to carry forward, and points to where the discipline leads next.

Self-assessment checklist

Work through this checklist against your capstone implementation. Each item maps to a core concept from the course.

Test structure and lifecycle:

setup() runs once and passes data to all scenario functions via the data parameter
Scenario functions are exported with export function — not just function
teardown() does not run if setup() throws, so setup() handles its own errors and cleanup
Init code (module-level constants, SharedArray loading) contains no HTTP requests

Data management:

User and product data is loaded with SharedArray, not plain JSON.parse(open(...))
VU-specific data selection uses (__VU - 1) % array.length for deterministic assignment
No data mutation inside the default function or scenario functions
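
The deterministic assignment above can be sketched as follows. In a real script the array would come from SharedArray in init code and the VU number from K6's __VU global; a plain array and an explicit argument stand in here so the indexing is easy to follow. The names pickForVU and users are illustrative and not part of K6.

```javascript
// Stand-in for data a K6 script would load once in init code, for example:
//   const users = new SharedArray('users', () => JSON.parse(open('./users.json')));
const users = [
  { username: 'alice' },
  { username: 'bob' },
  { username: 'carol' },
];

// K6 numbers VUs from 1, so subtract 1 before taking the modulo.
// The same VU always gets the same record, and VU numbers beyond the
// array length wrap around to the start.
function pickForVU(vuNumber, items) {
  return items[(vuNumber - 1) % items.length];
}

// Inside a scenario function this would be: pickForVU(__VU, users)
console.log(pickForVU(1, users).username); // alice
console.log(pickForVU(4, users).username); // alice (wraps around)
```

Because the function only reads from the array, it also satisfies the no-mutation item: SharedArray contents are read-only in any case.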

HTTP and assertions:

Every request sets tags: { name: '...' } for metric filtering
check() is used for per-request validation; thresholds for whole-test pass/fail
http.del() is used, not http.delete()
POST and PUT bodies use JSON.stringify() with Content-Type: application/json
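
One way to keep the body, header, and tag items consistent is a small helper that builds all three together. This is a sketch under assumptions: jsonRequest, the CreateOrder name, the /orders endpoint, BASE_URL, and the 201 status are hypothetical and should be replaced with whatever your capstone API uses.

```javascript
// Hypothetical helper: builds the body and params for one JSON request.
function jsonRequest(name, payload) {
  return {
    // Serialise explicitly; a plain object passed as the body would be
    // sent form-encoded rather than as JSON.
    body: JSON.stringify(payload),
    params: {
      headers: { 'Content-Type': 'application/json' },
      tags: { name }, // stable name for filtering metrics and thresholds
    },
  };
}

const req = jsonRequest('CreateOrder', { productId: 42, quantity: 1 });

// In a K6 script (with `import http from 'k6/http'` and `import { check } from 'k6'`):
//   const res = http.post(`${BASE_URL}/orders`, req.body, req.params);
//   check(res, { 'order created': (r) => r.status === 201 });
//   http.del(`${BASE_URL}/orders/${res.json('id')}`, null, { tags: { name: 'DeleteOrder' } });
```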

Load patterns:

Load test has a ramp-up, steady-state hold, and target: 0 ramp-down
Stress test has hold stages between each increment (not a continuous ramp)
Spike test has a recovery observation phase after the load drops
Soak test runs at normal load, not peak stress load
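
The shapes described above might look like the stage lists below. Every target and duration is a placeholder chosen for illustration, not a recommendation; the point is the shape of each profile.

```javascript
// Load test: ramp up, hold at steady state, ramp down to zero.
const loadStages = [
  { duration: '2m', target: 50 },
  { duration: '10m', target: 50 },
  { duration: '2m', target: 0 },
];

// Stress test: every increase is followed by a hold at the same target,
// so metrics can settle before the next step up.
const stressStages = [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 200 },
  { duration: '5m', target: 200 },
  { duration: '2m', target: 300 },
  { duration: '5m', target: 300 },
  { duration: '5m', target: 0 },
];

// Spike test: sudden jump, short peak, then a recovery phase at the
// original load so you can see whether the system returns to normal.
const spikeStages = [
  { duration: '1m', target: 20 },
  { duration: '10s', target: 400 },
  { duration: '1m', target: 400 },
  { duration: '10s', target: 20 },
  { duration: '5m', target: 20 }, // recovery observation
  { duration: '30s', target: 0 },
];

// Used in a script as, for example: export const options = { stages: loadStages };
```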

Custom metrics and quality gates:

Custom business metrics are instantiated in init code, not inside functions
Every custom metric that measures a KPI has a corresponding threshold
abortOnFail with delayAbortEval is used on the stress test's error rate threshold
Tag-based thresholds separate latency requirements by endpoint category
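
A thresholds object covering these items could look like the sketch below. The limits, the category tag, and the checkout_duration metric name are assumptions for illustration; substitute the tags and custom metrics your own scripts define.

```javascript
const qualityGates = {
  // Abort a stress run once the error rate passes 10%, but delay the abort
  // decision for 30s so a few early errors do not end the run.
  http_req_failed: [
    { threshold: 'rate<0.10', abortOnFail: true, delayAbortEval: '30s' },
  ],

  // Tag-based thresholds: separate latency budgets per endpoint category.
  // Requests must carry a matching tag, e.g. tags: { category: 'read' }.
  'http_req_duration{category:read}': ['p(95)<300'],
  'http_req_duration{category:write}': ['p(95)<800'],

  // Threshold on a custom metric created in init code, e.g.
  //   const checkoutDuration = new Trend('checkout_duration', true);
  checkout_duration: ['p(95)<1500'],
};

// Used in a script as: export const options = { thresholds: qualityGates };
```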

CI/CD:

Smoke test runs on every PR (fast, 1 VU, wide thresholds)
Load test runs on merge to main (gated on smoke passing)
HTML report upload uses if: always()
Credentials are in GitHub secrets, referenced as __ENV.VARIABLE_NAME in scripts
Baseline JSON is committed to version control

The most important concepts to retain

If you remember nothing else from this course, remember these five things:

1. Thresholds are the exit code. When a threshold such as http_req_duration p(95)<500 fails, K6 exits with code 99. That non-zero exit code is how CI knows the test failed. Without thresholds, K6 exits with 0 whenever the script itself runs to completion, so your pipeline passes regardless of performance.

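2. setup() runs once, not per VU. Authentication, test data creation, and expensive one-time preparation belong in setup(). Login logic inside the default function runs on every iteration, so a test with N VUs each completing M iterations sends N × M login requests: 50 VUs at 200 iterations each would log in 10,000 times instead of once.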

3. SharedArray loads once regardless of VU count. JSON.parse(open('./users.json')) in init code runs once per VU — 500 VUs means 500 copies in memory. new SharedArray(...) loads once, shared across all VUs.

4. Arrival-rate executors model real traffic under load. VU-based executors naturally reduce RPS when the server is slow (VUs are waiting, not sending). constant-arrival-rate keeps starting iterations at the target rate, allocating additional VUs up to maxVUs if needed, just like real users who do not stop clicking because your server is slow.
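
The difference can be worked out with simple arithmetic: in a VU-based (closed) model the iteration rate is roughly VUs divided by iteration time, while in an arrival-rate (open) model the rate is fixed and the number of busy VUs grows with iteration time. The sketch below uses made-up numbers and two illustrative helper functions, followed by a matching scenario definition whose values are also placeholders.

```javascript
// Closed model (VU-based executors): a VU starts its next iteration only
// after the previous one finishes, so throughput falls as latency rises.
function closedModelRate(vus, iterationSeconds) {
  return vus / iterationSeconds;
}

// Open model (arrival-rate executors): the start rate is fixed, so the
// number of VUs busy at once grows with iteration time instead.
function vusNeeded(ratePerSecond, iterationSeconds) {
  return Math.ceil(ratePerSecond * iterationSeconds);
}

console.log(closedModelRate(100, 1)); // 100 iterations/s when an iteration takes 1s
console.log(closedModelRate(100, 4)); // 25 iterations/s once it slows to 4s
console.log(vusNeeded(100, 4));       // 400 VUs to hold 100 iterations/s at 4s

// Scenario definition holding 100 iterations per second (placeholder values).
const arrivalRateScenarios = {
  steady_traffic: {
    executor: 'constant-arrival-rate',
    rate: 100,
    timeUnit: '1s',
    duration: '10m',
    preAllocatedVUs: 150,
    maxVUs: 400,
  },
};

// Used in a script as: export const options = { scenarios: arrivalRateScenarios };
```

If maxVUs is set too low for the latency the server actually shows, K6 cannot start the extra iterations and counts them in the dropped_iterations metric, which is itself a useful signal.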

5. Trends catch what thresholds miss. A threshold set at p(95)<500 passes even if p95 climbs from 200ms to 480ms over 10 weeks. Grafana time-series dashboards and baseline comparison scripts catch this gradual degradation before it crosses the threshold.

Stretch goals

These extend the capstone into more advanced territory. Each is independent — pick any one that interests you.

Browser performance testing:

Use the k6/browser API (Chromium-based)
Measure Core Web Vitals: LCP, CLS, FID
Compare browser vs API latency for the same flows

Full observability stack:

Add Node Exporter for CPU and memory metrics
Overlay backend metrics on the K6 Grafana dashboard
Correlate p95 climbing with heap memory growth

Multi-region testing:

Run load.js on Grafana Cloud K6
Set load zones: 50% EU-West, 50% US-East
Compare geographic latency breakdown

Automated regression detection:

Commit baseline JSON to the repo
CI step compares current p95 to baseline + 20%
GitHub PR comment shows regression if detected

Browser performance testing — K6's k6/browser module runs a real Chromium browser and measures Core Web Vitals (Largest Contentful Paint, Cumulative Layout Shift, First Input Delay). This is performance testing at the user experience layer, not just the API layer. Mix browser VUs with HTTP VUs in the same scenarios block to test how user-visible performance degrades under backend load.
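
A mixed scenarios block for this stretch goal might be shaped like the sketch below. The function names apiFlow and browserFlow are hypothetical exports you would write yourself, the VU counts and durations are placeholders, and the Web Vitals limits follow commonly cited guidance rather than anything specific to your application. Depending on your K6 version, the browser module may need to be imported from k6/experimental/browser instead of k6/browser.

```javascript
const mixedScenarios = {
  // Protocol-level load that puts the backend under pressure.
  api_load: {
    executor: 'constant-vus',
    vus: 50,
    duration: '5m',
    exec: 'apiFlow', // hypothetical exported function using k6/http
  },
  // A small number of real browsers measuring what users would see.
  browser_journey: {
    executor: 'constant-vus',
    vus: 2,
    duration: '5m',
    exec: 'browserFlow', // hypothetical exported async function using k6/browser
    options: { browser: { type: 'chromium' } },
  },
};

// Thresholds on the Web Vitals metrics emitted by the browser module.
const webVitalThresholds = {
  browser_web_vital_lcp: ['p(75)<2500'], // milliseconds
  browser_web_vital_cls: ['p(75)<0.1'],
};

// Used in a script as:
//   export const options = { scenarios: mixedScenarios, thresholds: webVitalThresholds };
```

Keeping the browser VU count low is deliberate: each browser VU is far heavier on the load generator than an HTTP VU, so the HTTP scenario supplies the load and the browser scenario supplies the measurement.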

Full observability stack — Add Prometheus Node Exporter and a database metrics exporter to your Docker Compose stack. Build a Grafana dashboard with K6 metrics on the left panel and server metrics (CPU, memory, DB connections) on the right panel. When p95 starts climbing in the K6 panel, look right: which server resource is the constraint?

Multi-region testing — Grafana Cloud K6 lets you specify load zones: geographic regions from which to generate traffic. Real users in EU and US experience different network latency to your servers. A test run from a single CI server (usually US-East) may miss geographic latency issues. Multi-region tests reveal them.
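
As a rough sketch, a load-zone distribution for Grafana Cloud K6 could be declared as below. The labels usEast and euWest are arbitrary names, the 50/50 split and the two zone identifiers are example choices, and the option key is an assumption about your K6 version: recent releases read this from options.cloud, while older ones used options.ext.loadimpact. Check the load zone list for your account before relying on specific identifiers.

```javascript
const cloudOptions = {
  distribution: {
    usEast: { loadZone: 'amazon:us:ashburn', percent: 50 },
    euWest: { loadZone: 'amazon:ie:dublin', percent: 50 },
  },
};

// The percentages across all zones should add up to 100.
const totalPercent = Object.values(cloudOptions.distribution)
  .reduce((sum, zone) => sum + zone.percent, 0);
console.log(totalPercent); // 100

// Used in a script as: export const options = { cloud: cloudOptions };
```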

Automated regression detection — Commit a baseline JSON file (from a known-good load test) to the repository. Add a GitHub Actions step that runs after the load test and compares the current run's p95 against the baseline + tolerance. Fail the CI job if the comparison finds a regression. Post the result as a PR comment using the GitHub API.
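
The comparison step can be a few lines of JavaScript run under Node in CI. The sketch below takes the two p95 values as plain numbers because the exact shape of your baseline and summary JSON depends on how you export them; checkRegression and the 20% default tolerance are illustrative choices.

```javascript
// Compares the current p95 against baseline + tolerance.
// tolerance is a fraction: 0.2 allows the current value to be up to 20% slower.
function checkRegression(baselineP95, currentP95, tolerance = 0.2) {
  const limit = baselineP95 * (1 + tolerance);
  return {
    limit,
    regressed: currentP95 > limit,
    changePercent: ((currentP95 - baselineP95) / baselineP95) * 100,
  };
}

// A run that still passes a p(95)<500 threshold but has drifted far from baseline.
const result = checkRegression(200, 480);
console.log(result.regressed); // true: 480ms is above the 240ms limit

// In the CI step, a regression would fail the job:
//   if (result.regressed) process.exit(1);
```

The changePercent field is there so the PR comment can report how far the run moved from baseline, not only whether it crossed the limit.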

Where to go from here

Performance Engineering as a career track — Performance testing at the level you have built here — CI-integrated, threshold-driven, with business metrics — is a senior skill. It is the foundation of the Site Reliability Engineer and Software Development Engineer in Test specialisations. Performance engineers are responsible not just for running tests but for setting SLAs, advising on capacity planning, and owning the observability stack.

JMeter for enterprise contexts — Some organisations standardise on JMeter because of its long history, GUI-based test plan editing, and wide plugin ecosystem. The concepts you have learned — load patterns, thresholds, test data management, CI integration — transfer directly. JMeter uses different terminology (threads, samplers, listeners) but the mental model is the same.

Related courses in this programme:

CI/CD for QA Engineers — pipeline ownership, quality gates, deployment strategies
Microservices Testing — testing distributed systems, contract testing, service mesh observability
Non-Functional Testing — the broader discipline of performance, security, accessibility, and reliability testing

Performance testing is not a one-time activity before a release. The test suite you have built here is meant to run continuously — every PR, every night, every release. The value compounds over time: with 30 days of trend data, a 5% regression is obvious. With 365 days, gradual capacity growth is planned rather than scrambled for. The discipline is in running the tests, trusting the thresholds, and acting on what the data shows.