You have built a complete performance test suite: five test scripts covering every major load pattern, a data management layer using SharedArray, custom business metrics with thresholds, a CI/CD integration that blocks on failure, and shareable HTML reports. This lesson reviews what you have built, surfaces the most important concepts to carry forward, and points to where the discipline leads next.
Self-assessment checklist
Work through this checklist against your capstone implementation. Each item maps to a core concept from the course.
Test structure and lifecycle:
- setup() runs once and passes data to all scenario functions via the data parameter
- Scenario functions are exported with export function, not just function
- Cleanup does not depend on teardown() running after a failed setup(): K6 skips teardown() when setup() throws
- Init code (module-level constants, SharedArray loading) contains no HTTP requests
Data management:
- User and product data is loaded with SharedArray, not plain JSON.parse(open(...))
- VU-specific data selection uses (__VU - 1) % array.length for deterministic assignment
- No data mutation inside the default function or scenario functions
HTTP and assertions:
- Every request has a tags: { name: '...' } entry for metric filtering
- check() is used for per-request validation; thresholds for whole-test pass/fail
- http.del() is used, not http.delete()
- POST and PUT bodies use JSON.stringify() with Content-Type: application/json
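One way to keep the tagging and JSON rules consistent is to build the request arguments in one place. The sketch below is an assumption about how that might look: jsonRequest, the CreateOrder tag, and the payload fields are hypothetical, and the K6 call that would consume the result appears only as a comment.

```javascript
// Hypothetical helper: builds the body and params for a JSON POST or PUT.
function jsonRequest(name, payload) {
  return {
    body: JSON.stringify(payload), // serialised once, sent as the request body
    params: {
      headers: { 'Content-Type': 'application/json' },
      tags: { name }, // groups metrics for this endpoint under one name
    },
  };
}

const order = jsonRequest('CreateOrder', { sku: 'ABC-1', quantity: 2 });
console.log(order.body); // {"sku":"ABC-1","quantity":2}

// In a K6 script (with `import http from 'k6/http'`), this would be used as:
//   const res = http.post(url, order.body, order.params);
```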
Load patterns:
- Load test has a ramp-up, steady-state hold, and target: 0 ramp-down
- Stress test has hold stages between each increment (not a continuous ramp)
- Spike test has a recovery observation phase after the load drops
- Soak test runs at normal load, not peak stress load
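The stage shapes described above can be written down as plain arrays. The durations and targets below are placeholder assumptions rather than values from the capstone, and hasHoldAfterEachRamp is a hypothetical lint helper. In a K6 script each array would be assigned to a stages option.

```javascript
// Load test: ramp-up, steady-state hold, ramp-down to zero.
const loadStages = [
  { duration: '2m', target: 50 },
  { duration: '10m', target: 50 },
  { duration: '2m', target: 0 },
];

// Stress test: every increase is followed by a hold at the same target.
const stressStages = [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 200 },
  { duration: '5m', target: 200 },
  { duration: '2m', target: 300 },
  { duration: '5m', target: 300 },
  { duration: '3m', target: 0 },
];

// Spike test: sudden surge, then a low-load phase to observe recovery.
const spikeStages = [
  { duration: '1m', target: 10 },
  { duration: '10s', target: 500 },
  { duration: '1m', target: 500 },
  { duration: '10s', target: 10 },
  { duration: '3m', target: 10 }, // recovery observation
  { duration: '30s', target: 0 },
];

// Hypothetical check: returns true when each increase in target is
// immediately followed by a stage that holds that same target.
function hasHoldAfterEachRamp(stages) {
  for (let i = 0; i < stages.length; i++) {
    const previous = i === 0 ? 0 : stages[i - 1].target;
    if (stages[i].target > previous) {
      const next = stages[i + 1];
      if (!next || next.target !== stages[i].target) return false;
    }
  }
  return true;
}

console.log(hasHoldAfterEachRamp(stressStages)); // true
```

A continuous ramp such as 100, 200, 300 with no holds would fail this check, which is the mistake the stress-test checklist item guards against.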
Custom metrics and quality gates:
- Custom business metrics are instantiated in init code, not inside functions
- Every custom metric that measures a KPI has a corresponding threshold
- abortOnFail with delayAbortEval is used on the stress test's error rate threshold
- Tag-based thresholds separate latency requirements by endpoint category
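A thresholds block covering these checklist items might look like the following sketch. The metric name checkout_duration, the category tag values, and all numeric limits are assumptions for illustration; in a K6 script the object would be exported as part of options.

```javascript
// Sketch of a thresholds object; in a K6 script it would sit under
// `export const options = { thresholds }`.
const thresholds = {
  // Abort the stress test early if errors exceed 5%, but wait 30s before
  // evaluating so that startup noise does not trigger the abort.
  http_req_failed: [
    { threshold: 'rate<0.05', abortOnFail: true, delayAbortEval: '30s' },
  ],
  // Tag-based thresholds: separate latency budgets per endpoint category.
  // Assumes requests carry a hypothetical `category` tag.
  'http_req_duration{category:read}': ['p(95)<300'],
  'http_req_duration{category:write}': ['p(95)<800'],
  // Hypothetical custom Trend metric, created in init code with
  // `new Trend('checkout_duration', true)`.
  checkout_duration: ['p(95)<2000'],
};

console.log(Object.keys(thresholds).length); // 4
```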
CI/CD:
- Smoke test runs on every PR (fast, 1 VU, wide thresholds)
- Load test runs on merge to main (gated on smoke passing)
- HTML report upload uses if: always()
- Credentials are in GitHub secrets, referenced as __ENV.VARIABLE_NAME in scripts
- Baseline JSON is committed to version control
The most important concepts to retain
If you remember nothing else from this course, remember these five things:
1. Thresholds are the exit code. When a threshold such as http_req_duration p(95)<500 fails, K6 exits with code 99. That non-zero exit code is how CI knows the test failed. Without thresholds, K6 exits with 0 whenever the script itself runs to completion, so your pipeline passes regardless of performance.
2. setup() runs once, not per VU. Authentication, test data creation, and expensive one-time preparation belong in setup(). Login logic inside the default function runs on every iteration, so N VUs running M iterations each send N × M login requests.
3. SharedArray loads once regardless of VU count. JSON.parse(open('./users.json')) in init code runs once per VU — 500 VUs means 500 copies in memory. new SharedArray(...) loads once, shared across all VUs.
4. Arrival-rate executors model real traffic under load. VU-based executors naturally reduce RPS when the server is slow (VUs are waiting, not sending). constant-arrival-rate maintains the target rate, allocating more VUs (up to maxVUs) if needed, just like real users who do not stop clicking because your server is slow.
5. Trends catch what thresholds miss. A threshold set at p(95)<500 passes even if p95 climbs from 200ms to 480ms over 10 weeks. Grafana time-series dashboards and baseline comparison scripts catch this gradual degradation before it crosses the threshold.
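The difference described in point 4 comes down to simple arithmetic. The sketch below assumes one request per iteration and uses hypothetical helper names and round numbers. It is a back-of-the-envelope model, not a description of how K6 schedules VUs internally.

```javascript
// Closed model (VU-based): each VU starts a new iteration only after the
// previous one finishes, so throughput falls as iterations get slower.
function closedModelRate(vus, iterationSeconds) {
  return vus / iterationSeconds; // iterations per second
}

// Open model (arrival-rate): the rate is fixed, so the number of VUs busy
// at once grows with iteration time (Little's Law: L = rate * time).
function vusNeededForRate(ratePerSecond, iterationSeconds) {
  return Math.ceil(ratePerSecond * iterationSeconds);
}

// Healthy server: iterations take 1s. Slow server: iterations take 4s.
console.log(closedModelRate(100, 1)); // 100
console.log(closedModelRate(100, 4)); // 25
console.log(vusNeededForRate(100, 4)); // 400
```

With 100 VUs the closed model quietly drops to a quarter of the intended load when the server slows down, while an arrival-rate executor would need roughly 400 VUs available to keep starting 100 iterations per second.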
Stretch goals
These extend the capstone into more advanced territory. Each is independent — pick any one that interests you.
- Browser performance testing: use the k6/browser API (Chromium-based); measure Core Web Vitals (LCP, CLS, FID); compare browser vs API latency for the same flows
- Full observability stack: add Node Exporter for CPU and memory metrics; overlay backend metrics on the K6 Grafana dashboard; correlate p95 climbing with heap memory growth
- Multi-region testing: run load.js on Grafana Cloud K6; set load zones (50% EU-West, 50% US-East); compare geographic latency breakdown
- Automated regression detection: commit baseline JSON to the repo; CI step compares current p95 to baseline + 20%; GitHub PR comment shows regression if detected
Browser performance testing — K6's k6/browser module runs a real Chromium browser and measures Core Web Vitals (Largest Contentful Paint, Cumulative Layout Shift, First Input Delay). This is performance testing at the user experience layer, not just the API layer. Mix browser VUs with HTTP VUs in the same scenarios block to test how user-visible performance degrades under backend load.
Full observability stack — Add Prometheus Node Exporter and a database metrics exporter to your Docker Compose stack. Build a Grafana dashboard with K6 metrics on the left panel and server metrics (CPU, memory, DB connections) on the right panel. When p95 starts climbing in the K6 panel, look right: which server resource is the constraint?
Multi-region testing — Grafana Cloud K6 lets you specify load zones: geographic regions from which to generate traffic. Real users in EU and US experience different network latency to your servers. A test run from a single CI server (usually US-East) may miss geographic latency issues. Multi-region tests reveal them.
Automated regression detection — Commit a baseline JSON file (from a known-good load test) to the repository. Add a GitHub Actions step that runs after the load test and compares the current run's p95 against the baseline + tolerance. Fail the CI job if the comparison finds a regression. Post the result as a PR comment using the GitHub API.
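The comparison step could be as small as the function below. It is a sketch under assumptions: checkRegression is a hypothetical name, the default tolerance mirrors the "baseline + 20%" figure from the stretch-goal summary, and reading the two p95 values out of the baseline and current result files is left out because it depends on the report format you export.

```javascript
// Returns whether the current p95 exceeds the baseline by more than the
// allowed tolerance (0.2 means 20%).
function checkRegression(baselineP95, currentP95, tolerance = 0.2) {
  const limit = baselineP95 * (1 + tolerance);
  return {
    limit,
    regressed: currentP95 > limit,
  };
}

// A run at 480ms still passes a p(95)<500 threshold, but is a clear
// regression against a 200ms baseline.
console.log(checkRegression(200, 480).regressed); // true
console.log(checkRegression(200, 230).regressed); // false

// In a Node-based CI step (an assumption), a non-zero exit fails the job:
//   if (checkRegression(baseline, current).regressed) process.exit(1);
```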
Where to go from here
Performance Engineering as a career track — Performance testing at the level you have built here — CI-integrated, threshold-driven, with business metrics — is a senior skill. It is the foundation of the Site Reliability Engineer and Software Development Engineer in Test specialisations. Performance engineers are responsible not just for running tests but for setting SLAs, advising on capacity planning, and owning the observability stack.
JMeter for enterprise contexts — Some organisations standardise on JMeter because of its long history, GUI-based test plan editing, and wide plugin ecosystem. The concepts you have learned — load patterns, thresholds, test data management, CI integration — transfer directly. JMeter uses different terminology (threads, samplers, listeners) but the mental model is the same.
Related courses in this programme:
- CI/CD for QA Engineers — pipeline ownership, quality gates, deployment strategies
- Microservices Testing — testing distributed systems, contract testing, service mesh observability
- Non-Functional Testing — the broader discipline of performance, security, accessibility, and reliability testing
Performance testing is not a one-time activity before a release. The test suite you have built here is meant to run continuously — every PR, every night, every release. The value compounds over time: with 30 days of trend data, a 5% regression is obvious. With 365 days, gradual capacity growth is planned rather than scrambled for. The discipline is in running the tests, trusting the thresholds, and acting on what the data shows.