How does test design change for an AI/ML model output vs deterministic code?

Question

Accepted Answer

Deterministic code: assert exact outputs, apply equivalence partitioning and boundary value analysis (EP/BVA) to the inputs, and measure branch coverage. ML models: assert *distributions* and *invariants* (the output stays in a valid range, moves monotonically in the expected direction, and is robust to small perturbations), monitor drift in production, and rely on property-based testing more than example-based testing.

Testing an ML model's output is fundamentally different because the model is a learned approximation rather than an exactly specified function, so the right answer for a given input is usually probabilistic.

What changes:

Assertions become invariants, not equalities. Deterministic: assert classify(image) == "cat". ML: assert classify(image).confidence > 0.5 for the obvious case; assert classify(rotated_image).top_class == classify(image).top_class for invariance under rotation.

Test data becomes the test suite. A deterministic suite might have 50 test cases. An ML test suite has hundreds or thousands of input-output pairs (a labelled dataset), and the metric is aggregate (accuracy, F1, precision/
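The invariant-style and aggregate-style checks described in this answer can be sketched in plain Python. This is a minimal illustration, not a real test harness: predict_proba is a hypothetical stand-in for a trained model (a fixed logistic curve), the labelled data is synthetic, and the thresholds (eps, tol, the 0.90 accuracy floor) are assumed values chosen for the example.

```python
import math
import random


def predict_proba(x: float) -> float:
    """Hypothetical stand-in for a trained model: a fixed logistic curve."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x - 1.0)))


def check_range(model, inputs):
    """Invariant: every score is a valid probability."""
    return all(0.0 <= model(x) <= 1.0 for x in inputs)


def check_monotonic(model, inputs):
    """Invariant: a larger input never produces a lower score."""
    scores = [model(x) for x in sorted(inputs)]
    return all(a <= b for a, b in zip(scores, scores[1:]))


def check_robust(model, inputs, eps=1e-3, tol=1e-2):
    """Invariant: a tiny perturbation moves the score by at most tol."""
    return all(abs(model(x) - model(x + eps)) <= tol for x in inputs)


def accuracy(model, labelled, threshold=0.5):
    """Aggregate metric over a labelled dataset of (input, bool label) pairs."""
    correct = sum((model(x) >= threshold) == y for x, y in labelled)
    return correct / len(labelled)


# Synthetic inputs and labels (assumed data, seeded for repeatability).
rng = random.Random(0)
inputs = [rng.uniform(-5.0, 5.0) for _ in range(1000)]
# True decision boundary at x = 0.5, with roughly 5% of labels flipped as noise.
labelled = [(x, (x >= 0.5) != (rng.random() < 0.05)) for x in inputs]

# Invariants are asserted over many generated inputs, not one example.
assert check_range(predict_proba, inputs)
assert check_monotonic(predict_proba, inputs)
assert check_robust(predict_proba, inputs)

# The pass/fail criterion is an aggregate threshold, not an exact equality.
assert accuracy(predict_proba, labelled) >= 0.90
```

None of these assertions pins a single input to a single exact output: the first three hold across a thousand generated inputs, and the last one tolerates individual misclassifications as long as the aggregate stays above the floor.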

// WHAT INTERVIEWERS LOOK FOR

// COMMON PITFALL
