Prompt regression

AI & LLM Testing

// Definition

A silent degradation in the quality of outputs your product depends on, caused by a prompt change or by a model update underneath an unchanged prompt. Prompt regressions are particularly nasty because they don't throw errors and typically don't fail conventional integration tests; the system keeps responding, just worse. The defence is a regression eval suite: a versioned set of test inputs with known-good outputs, run on every prompt change and every model upgrade, with scores tracked over time. Without one, a model provider's quiet behind-the-scenes update can degrade your product's quality, and you won't notice until a user complains.
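
As a rough illustration, a regression eval suite can be sketched in a few lines of Python. Everything named here is hypothetical and chosen for the example: `EvalCase`, `exact_match`, `run_suite`, `is_regression`, the 0.02 tolerance, and the two stub functions standing in for real model calls. A real suite would call your model provider and would likely use a richer scorer than exact string matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    """One versioned test input paired with its known-good output."""
    case_id: str
    prompt_input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Toy scorer (an assumption): 1.0 on a case-insensitive match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_suite(
    generate: Callable[[str], str],
    cases: List[EvalCase],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Return the mean score of `generate` across all eval cases."""
    if not cases:
        raise ValueError("eval suite is empty")
    total = sum(scorer(generate(c.prompt_input), c.expected) for c in cases)
    return total / len(cases)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """Flag a regression when the candidate falls more than `tolerance` below baseline."""
    return candidate < baseline - tolerance


CASES = [
    EvalCase("capital-fr", "Capital of France?", "Paris"),
    EvalCase("capital-jp", "Capital of Japan?", "Tokyo"),
    EvalCase("capital-it", "Capital of Italy?", "Rome"),
    EvalCase("capital-es", "Capital of Spain?", "Madrid"),
]


def model_v1(text: str) -> str:
    """Stub for the current prompt/model pair: answers every case correctly."""
    answers = {
        "Capital of France?": "Paris",
        "Capital of Japan?": "Tokyo",
        "Capital of Italy?": "Rome",
        "Capital of Spain?": "Madrid",
    }
    return answers.get(text, "")


def model_v2(text: str) -> str:
    """Stub for a changed prompt or model: still responds, but one answer is worse."""
    answers = {
        "Capital of France?": "Paris",
        "Capital of Japan?": "Tokyo",
        "Capital of Italy?": "Milan",  # wrong, yet no exception is raised
        "Capital of Spain?": "Madrid",
    }
    return answers.get(text, "")


if __name__ == "__main__":
    # Scores keyed by version label, so they can be tracked over time.
    history = {
        "v1": run_suite(model_v1, CASES),
        "v2": run_suite(model_v2, CASES),
    }
    print(history)
    print("regression:", is_regression(history["v1"], history["v2"]))
```

The point of the sketch is that `model_v2` never raises an error; only the drop in the tracked score reveals the problem. The tolerance exists because real model outputs vary from run to run, so a small dip should not necessarily fail the build.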

// Related terms