Q2 of 38 · Performance

Why do you report p95 instead of average response time?

Performance · Mid · performance · percentiles · metrics · slo

Short answer

Averages hide the long tail. An endpoint that's fast for 95% of requests but takes 8 seconds for the slowest 5% will look 'fine' on average while enraging a noticeable slice of customers. Percentiles describe the experience real users actually have.

Detail

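Imagine 100 requests: 95 take 100ms each, and 5 take 8 seconds each. The average is (95 × 100ms + 5 × 8,000ms) / 100 = 495ms, which looks acceptable, yet no request actually took anywhere near 495ms. The p50 and p95 are both 100ms (the 95th-fastest request is the last one in the fast group), and the p99 is 8 seconds. The average smoothed over a real user-impacting tail, and in this particular distribution it takes the p99 to expose it, which is why p95 and p99 are usually reported together.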
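The arithmetic above can be checked with a few lines of plain JavaScript. This is a minimal sketch, not a production implementation: `percentile` and `mean` are hypothetical helpers written for this example, and the percentile uses the nearest-rank definition. Libraries that interpolate between neighbouring samples may report a different value when a percentile falls exactly on the boundary between the fast and slow groups.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
// (Hypothetical helper for illustration, not a library API.)
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// 95 requests at 100 ms and 5 requests at 8000 ms, as in the text.
const latenciesMs = [...Array(95).fill(100), ...Array(5).fill(8000)];

console.log('mean:', mean(latenciesMs));           // 495
console.log('p50: ', percentile(latenciesMs, 50)); // 100
console.log('p95: ', percentile(latenciesMs, 95)); // 100
console.log('p99: ', percentile(latenciesMs, 99)); // 8000
```

The mean lands on a value that no request experienced, while the percentiles each report a latency that some request really had.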

Percentiles describe the slow tail, not just the typical user. p50 (median) is "what does a typical request look like?" p95 is "what does the slowest 1-in-20 request look like?" p99 is "what does the slowest 1-in-100 request look like?" For a busy service, even that slowest 1% can add up to thousands of requests per minute, and every one of them is a customer.

Service-level objectives are almost always written against percentiles for this reason: "p95 latency under 500ms" is a meaningful contract; "average latency under 500ms" can be satisfied by a system that takes 4 seconds on 10% of requests, as long as the other 90% finish in about 100ms (an average of 490ms).

Two follow-up points an interviewer will love:

  1. Don't aggregate percentiles by averaging. The average of yesterday's hourly p95s is not the p95 over the day. Compute percentiles from the raw distribution, not from summary statistics.
  2. Watch p99 and beyond for fan-out. If a page makes 10 backend calls in parallel, the user waits for the slowest of the 10. With a p99 of 1 second on each backend call, and assuming the calls are independent, 1 − 0.99^10 ≈ 9.6% of page loads (roughly 1 in 10) hit at least one call in the 1s tail, a much worse experience than the per-call p99 suggests.
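Both follow-up points can be illustrated with a short sketch in plain JavaScript. Everything here is an assumption made for the example: the two hourly buckets are made-up data, `percentile` is a hypothetical nearest-rank helper, and `tailHitProbability` assumes the backend calls are independent, which real services often are not.

```javascript
// Nearest-rank percentile (hypothetical helper, not a library API).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Point 1: averaging per-bucket p95s is not the p95 of the whole period.
// Made-up data: a quiet hour with no slow requests and a busy hour with some.
const quietHourMs = Array(20).fill(100);
const busyHourMs = [...Array(70).fill(100), ...Array(10).fill(8000)];

const averageOfP95s =
  (percentile(quietHourMs, 95) + percentile(busyHourMs, 95)) / 2;
const trueP95 = percentile([...quietHourMs, ...busyHourMs], 95);

console.log('average of hourly p95s:', averageOfP95s); // 4050
console.log('p95 over raw samples:  ', trueP95);       // 8000

// Point 2: fan-out amplification.
// Probability that at least one of `calls` independent backend calls
// lands beyond the per-call quantile (0.99 for p99).
function tailHitProbability(calls, quantile) {
  return 1 - Math.pow(quantile, calls);
}

console.log(
  'page loads hitting a p99 tail:',
  tailHitProbability(10, 0.99).toFixed(3) // "0.096"
);
```

The averaged figure of 4050ms is a latency no request ever had, and it ignores that the busy hour carried four times the traffic of the quiet one. The fan-out figure matches the roughly 1-in-10 estimate in the list above.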

// EXAMPLE

k6-thresholds.js

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 50,
  duration: '5m',
  thresholds: {
    // Hard SLOs — fail the run if these are missed
    'http_req_duration{expected_response:true}': [
      'p(50)<200',
      'p(95)<500',
      'p(99)<1500',
    ],
    'http_req_failed': ['rate<0.001'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/checkout');
  check(res, { 'status is 200': (r) => r.status === 200 });
}

// WHAT INTERVIEWERS LOOK FOR

Understanding that averages mask tails, ability to articulate what p95/p99 mean in user terms, and ideally awareness of fan-out amplification.

// COMMON PITFALL

Computing 'average p95' across time buckets — that's not a percentile any more. Or reporting only p50 and missing the long-tail customer experience.