Observability

Automation

// Definition

The ability to understand the internal state of a system from the signals it emits externally, without needing to redeploy or modify the system. The three pillars are logs (timestamped records of discrete events), metrics (numeric measurements aggregated over time, such as request rate, error rate, and latency percentiles), and traces (end-to-end records of a request's path through distributed services, linked by a correlation ID). High observability means a QA engineer can diagnose a failure purely from existing output, without attaching a debugger or reproducing the issue locally. In test environments, observability enables post-run failure analysis: instead of re-running a flaky test with extra logging, query structured logs for the test's correlation ID and see exactly which service call failed and why. Contrast with monitoring, which alerts on known failure thresholds — observability enables exploration of previously unknown failure modes.
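The post-run analysis described above can be sketched in a few lines of Python. This is a minimal illustration, assuming logs are stored as JSON Lines with `ts`, `service`, `correlation_id`, `level`, and `msg` fields. Those field names, the sample data, and the helper names `entries_for` and `first_error` are all hypothetical, not part of any particular logging stack.

```python
import json

# Hypothetical structured log output (JSON Lines). Field names are assumptions.
SAMPLE_LOGS = """\
{"ts": "2024-05-01T10:00:02Z", "service": "payments", "correlation_id": "abc-123", "level": "ERROR", "msg": "card processor timeout"}
{"ts": "2024-05-01T10:00:00Z", "service": "gateway", "correlation_id": "abc-123", "level": "INFO", "msg": "request received"}
plain text line that is not JSON
{"ts": "2024-05-01T10:00:01Z", "service": "orders", "correlation_id": "abc-123", "level": "INFO", "msg": "order created"}
{"ts": "2024-05-01T10:00:01Z", "service": "gateway", "correlation_id": "zzz-999", "level": "INFO", "msg": "request received"}
"""


def entries_for(log_text, correlation_id):
    """Return parsed log entries carrying the given correlation ID, oldest first."""
    entries = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured JSON
        if isinstance(entry, dict) and entry.get("correlation_id") == correlation_id:
            entries.append(entry)
    # ISO 8601 timestamps in a uniform format sort correctly as strings.
    return sorted(entries, key=lambda e: e.get("ts", ""))


def first_error(entries):
    """Return the earliest ERROR-level entry, or None if the request succeeded."""
    for entry in entries:
        if entry.get("level") == "ERROR":
            return entry
    return None


if __name__ == "__main__":
    trace = entries_for(SAMPLE_LOGS, "abc-123")
    failure = first_error(trace)
    for entry in trace:
        print(entry["ts"], entry["service"], entry["level"], entry["msg"])
    if failure:
        print("first failing call:", failure["service"], "-", failure["msg"])
```

In a real test environment the same query would more likely run against a log aggregation backend than against a string in memory, but the shape is the same: filter by ID, order by time, find the first error.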

// Related terms

Correlation ID
A unique identifier, typically a UUID, attached to a request when it first enters a system and propagated through every downstream service call, log entry, and event. Correlation IDs make it possible to trace a single user action across multiple services by tying every related log entry back to the request that produced it. QA engineers use them to diagnose failures in distributed systems: locate the correlation ID in a failing test's response header, then grep every service log for that ID to reconstruct the exact call chain. If the ID is missing from a hop's logs, propagation broke at or before that hop. CI pipelines increasingly inject a known correlation ID into test requests so that post-run log analysis can be scoped to a single test run's traffic.
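A rough sketch of propagation and of the missing-hop check, in Python. The header name `X-Correlation-ID` is a common convention but an assumption here, as are the helper names `outgoing_headers` and `broken_hops` and the shape of the log entries. The header lookup is a plain dictionary lookup, so it is case-sensitive, unlike real HTTP headers.

```python
import uuid

# Assumed header name; systems differ in what they call this header.
HEADER = "X-Correlation-ID"


def outgoing_headers(incoming_headers):
    """Reuse the caller's correlation ID if present, otherwise mint a new UUID."""
    correlation_id = incoming_headers.get(HEADER) or str(uuid.uuid4())
    return {HEADER: correlation_id}


def broken_hops(expected_services, log_entries, correlation_id):
    """List expected services that logged nothing under the given correlation ID."""
    seen = {
        entry.get("service")
        for entry in log_entries
        if entry.get("correlation_id") == correlation_id
    }
    return [service for service in expected_services if service not in seen]


if __name__ == "__main__":
    # A CI job might inject a known ID so the run's traffic is easy to find later.
    headers = outgoing_headers({HEADER: "test-run-42"})
    print(headers)

    # Hypothetical parsed log entries: the payments service lost the ID.
    entries = [
        {"service": "gateway", "correlation_id": "test-run-42"},
        {"service": "orders", "correlation_id": "test-run-42"},
        {"service": "payments", "correlation_id": None},
    ]
    print(broken_hops(["gateway", "orders", "payments"], entries, "test-run-42"))
```

A service listed by `broken_hops` either never received the request or dropped the ID, so the check narrows the search without distinguishing the two cases.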