What are output property checks and how do you use them to test LLM responses?

Testing AI systems · Mid · testing-ai-systems, property-checks, llm, evaluation, assertions, groundedness

Short answer

Property checks test invariants that must hold on every valid output regardless of phrasing: required JSON fields exist, response length is within bounds, banned content is absent, claims cite the source. They replace exact-match assertions for non-deterministic outputs.

Detail

Property checks are assertions about the constraints and content rules that every valid response must satisfy, rather than comparisons against one specific valid response.

Common categories:

Structural: does the response parse as valid JSON? Are required top-level fields present and of the right type?

Constraint: is the length within the documented range? Does the language match the requested locale?

Safety: does the response contain PII, profanity, or competitor brand names? Use a regex or a secondary classifier.

Groundedness: for RAG features, do all factual claims in the response appear in the retrieved source documents? A grounding check can be a secondary LLM call ("does claim X appear in context Y?") or an embedding similarity check.

Instruction following: if the prompt specified "respond in bullet points" or "respond only in French," does the output comply?
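
The structural, constraint, and safety categories can be sketched as plain assertions in Python. This is a minimal illustration, not a reference implementation: the response contract (`answer`, `sources`), the 600-character limit, the email pattern, and the banned term `acmecorp` are all hypothetical, and the function name `check_properties` is made up for this example.

```python
import json
import re

# Hypothetical contract: a JSON object with a string "answer" and a list "sources".
REQUIRED_FIELDS = {"answer": str, "sources": list}
MAX_ANSWER_CHARS = 600  # assumed documented length limit
# Deliberately simple pattern; a real PII check would likely be broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BANNED_TERMS = ("acmecorp",)  # hypothetical competitor brand name


def check_properties(raw: str) -> list[str]:
    """Return the violated properties; an empty list means the output passes."""
    # Structural: the response must parse as a JSON object.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["structural: not valid JSON"]
    if not isinstance(data, dict):
        return ["structural: top-level value is not an object"]

    failures = []
    # Structural: required fields exist and have the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            failures.append(f"structural: missing field '{field}'")
        elif not isinstance(data[field], expected_type):
            failures.append(f"structural: field '{field}' has wrong type")

    answer = data.get("answer")
    if isinstance(answer, str):
        # Constraint: length within the documented range.
        if not 1 <= len(answer) <= MAX_ANSWER_CHARS:
            failures.append("constraint: answer length out of range")
        # Safety: no email addresses or banned terms.
        if EMAIL_RE.search(answer):
            failures.append("safety: email address in answer")
        lowered = answer.lower()
        for term in BANNED_TERMS:
            if term in lowered:
                failures.append(f"safety: banned term '{term}'")
    return failures
```

Two differently phrased answers, such as "Refunds take 5 days." and "Refunds are processed in five days.", both pass these checks, which is what makes them usable where an exact-match assertion would fail on one of the two.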

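Most property checks run fast because they're plain assertions, not model calls; the exceptions are the classifier- or LLM-based checks, such as groundedness. The fast ones can run on every response in a CI pipeline, or in production monitoring on sampled live traffic. See Evaluation methods.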
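
Production monitoring on sampled traffic could look roughly like the sketch below. It assumes a `check` function that returns a list of violations for one response (empty when the response passes); the name `violation_rate`, the 10% default sample rate, and the idea of reporting a single failure share are illustrative choices, not a prescribed design.

```python
import random
from typing import Callable, Iterable, Optional


def violation_rate(
    responses: Iterable[str],
    check: Callable[[str], list[str]],
    sample_rate: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Run `check` on a random sample of responses and return the share that fail."""
    rng = rng or random.Random()
    sampled = 0
    failed = 0
    for raw in responses:
        if rng.random() >= sample_rate:
            continue  # this response was not sampled
        sampled += 1
        if check(raw):  # a non-empty list means at least one violation
            failed += 1
    # With nothing sampled there is no evidence of failures; report 0.0.
    return failed / sampled if sampled else 0.0
```

Tracking this rate over time (and alerting when it rises) gives a cheap signal that a prompt or model change has started producing invalid outputs.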

// WHAT INTERVIEWERS LOOK FOR

Naming the five categories: structural, constraint, safety, groundedness, instruction-following. Knowing that these checks replace exact-match assertions for non-deterministic outputs. Mentioning production monitoring as a deployment context.