Non-determinism

AI & LLM Testing

// Definition

Behaviour where the same input doesn't always produce the same output. In classical testing this is a common cause of flaky tests (race conditions, time-of-day bugs, unstable networks), and the response is to hunt down the source and eliminate it. In AI-backed systems, non-determinism is intrinsic to the model itself: an LLM sampled at a non-zero temperature can give different answers to the same prompt, by design. The QA implication is that the classical tactic of eliminating variance doesn't work; you have to measure variance instead. Tolerance budgets, score distributions, and agreement-rate metrics replace pass/fail counts for the AI parts of a system, while the deterministic plumbing around them (auth, routing, database writes) keeps its classical test treatment.
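
As a rough illustration of measuring variance rather than eliminating it, the sketch below runs the same prompt several times and checks an agreement rate against a tolerance budget. Everything named here is an assumption made for the example: `agreement_rate`, `within_budget`, the `min_agreement` threshold of 0.8, and the stand-in `fake_model` are hypothetical, and exact string matching is only one of several possible ways to define "agreement".

```python
import random
from collections import Counter
from typing import Callable


def agreement_rate(outputs: list[str]) -> float:
    """Return the fraction of outputs equal to the most common output."""
    if not outputs:
        raise ValueError("need at least one output")
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)


def within_budget(
    model: Callable[[str], str],
    prompt: str,
    runs: int = 20,
    min_agreement: float = 0.8,  # hypothetical tolerance budget
) -> bool:
    """Call the model `runs` times with one prompt and compare the
    agreement rate against the budget, instead of asserting that a
    single output equals a fixed expected string."""
    outputs = [model(prompt) for _ in range(runs)]
    return agreement_rate(outputs) >= min_agreement


if __name__ == "__main__":
    # Hypothetical stand-in for an LLM call: mostly one answer, with an
    # occasional variant, to mimic sampling at a non-zero temperature.
    rng = random.Random(0)

    def fake_model(prompt: str) -> str:
        return rng.choice(["Paris", "Paris", "Paris", "Paris", "Lyon"])

    print(within_budget(fake_model, "What is the capital of France?"))
```

In a real suite the exact-match comparison would probably give way to a scoring function (semantic similarity, a rubric, or a judge model), with the budget applied to the resulting score distribution; the deterministic code around the model call would still use ordinary assertions.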

// Related terms