How do you correlate response time with backend resource saturation?

Question

Accepted Answer

Run the load test while collecting host and service metrics (CPU, memory, IO, DB pool, GC, queue depth) on synchronised timestamps. Plot latency against each resource — the inflection point where latency rises and a resource hits its limit identifies the bottleneck. Methodology: Set up observability before the test. Datadog/Prometheus/New Relic must be capturing the system's metrics at 1-10s resolution: CPU, memory, disk IO, network, DB connection pool, slow query rate, GC pause time, queue depth, thread pool utilisation. Synchronise time. Load tool and observability must share NTP clock — a 30-second drift makes correlation impossible. Run a stepped ramp. Increase load in stages (50, 100, 200, 400 RPS) with a 5-minute hold each. Each plateau gives a stable measurement window. Plot latency vs. each resource. Either eye-ball overlapped charts in the dashboard, or pull the data into Python/notebooks and compute correlation. The signature you're looking for: latency stays flat through thr

How do you correlate response time with backend resource saturation?

// WHAT INTERVIEWERS LOOK FOR

// COMMON PITFALL

How do you correlate response time with backend resource saturation?

Short answer

Detail

// WHAT INTERVIEWERS LOOK FOR

// COMMON PITFALL