Q8 of 21 · Testing AI systems
How do you detect and test for hallucination in an LLM feature?
Short answer
Provide inputs with known ground truth and check whether the model's output contradicts that truth or adds claims it does not support. For RAG features, groundedness checks verify that every factual claim in the response is supported by the retrieved context. For closed-domain tasks, a reference set of verified answers enables accuracy scoring.
Detail
Hallucination (the model generating plausible but false or unsupported information) is among the hardest AI failure modes to test, because detecting it requires a ground truth to compare the output against.
RAG / document Q&A: provide a context document and a question whose correct answer is in the document. Test whether the response reflects only what the document says. For each claim, ask: "is claim X supported, verbatim or paraphrased, by context Y?" A keyword-overlap check catches verbatim support cheaply but misses paraphrase and contradiction; a secondary LLM acting as judge is needed for those.
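A minimal sketch of the cheap keyword-style check, assuming one claim per sentence and a `threshold` on the share of a claim's content words that also appear in the context. The function names and the stopword list are hypothetical, not from any library. It flags sentences with little lexical support, but it cannot see negation or paraphrase: "Labour is covered" passes against a context saying "Labour is not covered", which is why the LLM judge is still needed.

```python
import re

# Hypothetical helpers for illustration; not from any library.
# Illustrative stopword list: words too common to count as evidence of support.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "in", "on",
             "to", "and", "or", "for", "with", "by", "at", "it", "this", "that"}


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_claims(response: str) -> list[str]:
    """Naive claim splitter: treats each sentence as one claim."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]


def ungrounded_claims(response: str, context: str, threshold: float = 0.6) -> list[str]:
    """Return claims whose content words are mostly absent from the context."""
    ctx_words = content_words(context)
    flagged = []
    for claim in split_claims(response):
        words = content_words(claim)
        if not words:
            continue  # nothing checkable in this sentence
        support = len(words & ctx_words) / len(words)
        if support < threshold:
            flagged.append(claim)
    return flagged


context = "The warranty covers parts for 24 months. Labour is not covered."
response = "The warranty covers parts for 24 months. It also includes free shipping."
print(ungrounded_claims(response, context))
# ['It also includes free shipping.']
```

In practice this runs first as a fast filter, with only the sentences it passes (or a sample of them) sent on to the LLM judge.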
Closed-domain knowledge (medical, legal, technical): maintain a golden set of factual questions with verified answers. Score the model's answers for factual accuracy. Flag responses that contradict the reference.
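A golden-set scorer can be sketched as below, assuming each item lists every phrasing of the verified answer you accept and that a response counts as correct if one of them appears as whole tokens. `GoldenItem`, `is_correct` and `score` are hypothetical names. Token containment is only a proxy: it flags answers that miss the reference, but a response containing both the right and a wrong value would pass, so genuine contradictions may still need a judge or human.

```python
import re
from dataclasses import dataclass

# Hypothetical golden-set helpers for illustration; not from any library.


@dataclass
class GoldenItem:
    question: str
    accepted_answers: list[str]  # every verified phrasing of the correct answer


def normalise(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def is_correct(answer: str, item: GoldenItem) -> bool:
    """True if any accepted answer appears as whole tokens in the model's answer."""
    padded = f" {normalise(answer)} "
    return any(f" {normalise(ref)} " in padded for ref in item.accepted_answers)


def score(golden: list[GoldenItem], answers: list[str]) -> tuple[float, list[str]]:
    """Return accuracy and the questions whose answers missed every reference."""
    if not golden or len(golden) != len(answers):
        raise ValueError("need one answer per golden item")
    misses = [item.question for item, ans in zip(golden, answers)
              if not is_correct(ans, item)]
    return 1 - len(misses) / len(golden), misses


golden = [
    GoldenItem("What is the boiling point of water at sea level in Celsius?", ["100"]),
    GoldenItem("How many bytes are in a kibibyte?", ["1024"]),
]
answers = ["Water boils at 100 °C at sea level.", "A kibibyte is 1000 bytes."]
accuracy, misses = score(golden, answers)
print(accuracy, misses)
# 0.5 ['How many bytes are in a kibibyte?']
```

Padding with spaces makes the match whole-token, so a reference of "1024" does not match "10245".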
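Open-domain (summarisation, drafting): hallucination is hardest to detect automatically here, because many different outputs are valid and there is no single reference answer to compare against. Use LLM-as-judge with a hallucination rubric, spot-checked by humans.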
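One way to wire up LLM-as-judge with human spot-checks is sketched below. `call_llm` is a placeholder for whatever prompt-to-text client you use (no particular provider API is assumed), and the rubric wording, JSON reply format and function names are all hypothetical.

```python
import json
import random
from typing import Callable

# Hypothetical judge helpers for illustration; the rubric wording is an assumption.
RUBRIC = """You are checking a summary for hallucination.
Source:
{source}

Summary:
{summary}

List every statement in the summary that the source does not support.
Reply with JSON only: {{"unsupported": [...], "verdict": "pass" or "fail"}}"""


def judge(source: str, summary: str, call_llm: Callable[[str], str]) -> dict:
    """Apply the rubric via call_llm, a placeholder for any prompt -> text client."""
    raw = call_llm(RUBRIC.format(source=source, summary=summary))
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        result = None
    if not isinstance(result, dict) or result.get("verdict") not in ("pass", "fail"):
        # Malformed judge output goes to a human rather than counting as a pass.
        return {"unsupported": [], "verdict": "needs_review"}
    return result


def pick_for_human_review(results: list[dict], rate: float = 0.1, seed: int = 0) -> list[int]:
    """Indices to spot-check: all non-passes plus a random sample of passes."""
    flagged = [i for i, r in enumerate(results) if r["verdict"] != "pass"]
    passes = [i for i, r in enumerate(results) if r["verdict"] == "pass"]
    k = max(1, round(rate * len(passes))) if passes else 0
    return sorted(flagged + random.Random(seed).sample(passes, k))
```

Sampling some passes matters because the judge's missed hallucinations only surface when a human reads outputs it approved; a fixed `seed` keeps the sample reproducible across runs.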
Temporal: ask about events after the model's training cutoff and test that it acknowledges uncertainty (e.g. "As of my last update…") rather than fabricating recent details.
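A minimal heuristic for the temporal check, assuming you curate questions about events after the model's documented cutoff and wrap the feature under test as a hypothetical `answer_fn`. The phrase patterns are an illustrative assumption, not an exhaustive list, so flagged answers are candidates for review rather than confirmed fabrications: a model may hedge in wording the patterns miss.

```python
import re
from typing import Callable

# Hypothetical helpers for illustration; phrases are illustrative, not exhaustive.
UNCERTAINTY_PATTERNS = [
    r"\bas of my last (update|training)\b",
    r"\b(training|knowledge) cut-?off\b",
    r"\bI (do not|don't|cannot|can't) (know|confirm|verify)\b",
    r"\bmay have changed\b",
]


def expresses_uncertainty(response: str) -> bool:
    """True if the response contains any recognised hedging phrase."""
    return any(re.search(p, response, re.IGNORECASE) for p in UNCERTAINTY_PATTERNS)


def unhedged_answers(post_cutoff_questions: list[str],
                     answer_fn: Callable[[str], str]) -> list[str]:
    """Questions about post-cutoff events whose answers show no hedging."""
    return [q for q in post_cutoff_questions if not expresses_uncertainty(answer_fn(q))]
```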
See AI failure modes for a full failure mode taxonomy.