AI procurement and supplier audits

11 min read · Reviewed May 2026

Most products in 2026 do not train their own foundation models. They embed someone else's — OpenAI's, Anthropic's, Google's, or a specialised vendor's. That makes supplier diligence a first-order QA question, not a procurement-team question. What evaluations did the supplier run? What is their incident response SLA? Can they produce a model card? Do they notify customers when the model retrains? This sub-page is the practitioner's checklist for that conversation — framework-anchored rather than law-anchored, because the regulatory landscape is in flux.

Difficulty: intermediate

You'll learn: the diligence questions QA should own when the AI in your product comes from a supplier, the four documents to request, and why framework alignment matters more than law alignment in May 2026.

The four documents to ask for

Model card, evaluation report, incident response SLA, and training data attestation — the minimum diligence set.

The four-document set is the minimum viable diligence pack for a production AI supplier. Each document answers a different question a regulator may ask of you: what does the model do and not do (model or system card); how was it tested and when (evaluation report); how will you know when something goes wrong (incident response SLA); and where did the training data come from (training data attestation).

A supplier who cannot produce all four is not automatically a no-go — there are legitimate reasons for partial disclosure under NDA or competitive sensitivity. But any gaps should be named explicitly in your risk register, not quietly accepted as normal. Document what you asked for and what you received.

Model card or system card covering the supplier's model: the Mitchell et al. format or the Hugging Face model card spec is acceptable. Anthropic and OpenAI publish system cards publicly, which serve as reference artefacts.
Most recent evaluation report: which benchmarks were run, on which model version, and on which date. Undated eval results are close to useless because model behaviour changes between versions.
Incident response and disclosure SLA: how they notify customers of safety issues, what response times they commit to, and whether they publish commitments equivalent to a Responsible Scaling Policy (if applicable). Anthropic's RSP v3.2 (April 2026) and OpenAI's Preparedness Framework are the public reference formats.
Training data attestation: high-level provenance, PII handling approach, and opt-out mechanisms. Especially relevant for IP and privacy compliance in regulated industries.
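One way to "document what you asked for and what you received" is a small record per supplier that turns gaps into risk-register lines. The sketch below is illustrative only: the identifiers (`REQUIRED_DOCUMENTS`, `ReceivedDocument`, `risk_register_gaps`) and the supplier name are hypothetical, and the rule that an evaluation report without a date or model version counts as a gap is this sketch's reading of the checklist above, not a formal standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# The four documents named above; the identifiers are this sketch's own.
REQUIRED_DOCUMENTS = (
    "model_card",
    "evaluation_report",
    "incident_response_sla",
    "training_data_attestation",
)


@dataclass
class ReceivedDocument:
    received_on: date                      # when the supplier sent it
    model_version: Optional[str] = None    # model version the document describes
    document_date: Optional[date] = None   # date stated on the document itself


def risk_register_gaps(supplier: str,
                       received: Dict[str, ReceivedDocument]) -> List[str]:
    """List every gap between what was requested and what was received."""
    gaps = []
    for doc in REQUIRED_DOCUMENTS:
        item = received.get(doc)
        if item is None:
            gaps.append(f"{supplier}: {doc} requested, not received")
        elif doc == "evaluation_report" and (
            item.document_date is None or item.model_version is None
        ):
            # Assumption: an undated or unversioned report is logged as a gap.
            gaps.append(f"{supplier}: {doc} lacks a date or model version")
    return gaps


# Hypothetical supplier that sent two of the four documents.
example = {
    "model_card": ReceivedDocument(date(2026, 5, 4), model_version="v2"),
    "evaluation_report": ReceivedDocument(date(2026, 5, 4)),
}
for gap in risk_register_gaps("ExampleVendor", example):
    print(gap)
```

Each returned string is meant to be pasted into, or mapped onto, whatever risk register your organisation already keeps.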

Framework alignment, not law alignment

The regulatory floor is too unstable in May 2026 to anchor supplier diligence to specific laws.

Asking 'is your supplier EU AI Act compliant?' is harder to answer than it appears. The EU AI Act's implementing acts, including those bearing on the Article 10 data-governance obligations that cover bias examination (deadline: 2 August 2026), are not all finalised at time of writing. The EU AI Act sub-page that would cover this in depth is deferred to Phase 3 (September 2026, after the August deadline has passed and the implementing acts have settled) for exactly this reason.

At the US federal level, Biden's Executive Order 14110 was rescinded on 20 January 2025 by Trump's EO 14148; EO 14179 ('Removing Barriers to American Leadership in Artificial Intelligence') followed on 23 January 2025 and ordered a review of actions taken under the rescinded order. A December 2025 EO asserted federal preemption of state AI regulation. Colorado's AI Act (SB 24-205) saw its effective date delayed multiple times through 2025–2026, and in May 2026 the Colorado Legislature passed a bill to repeal and replace it. California (AB 2013), Illinois, and other states are moving in different directions. The UK government announced a 'Regulating for Growth Bill' with AI sandbox provisions in April 2026 but has not introduced a comprehensive AI Bill to Parliament.

A better question for supplier diligence: 'Is your supplier aligned to NIST AI RMF, ISO/IEC 42001:2023, or a documented internal governance framework?' Framework alignment survives regulatory churn. Law compliance, in the current landscape, does not.

ISO/IEC 42001:2023 certifications are increasingly available from AI suppliers. It is an auditable management-system standard, so a certificate can be verified with the third-party certification body that issued it. NIST AI RMF alignment is harder to verify externally (the RMF is a voluntary framework, not a certification scheme), but suppliers should be able to produce documentation mapping their practices to the four RMF functions: Govern, Map, Measure, and Manage.
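Because RMF alignment rests on the supplier's own mapping document, a simple first check is whether that mapping cites any evidence at all for each of the four functions. The sketch below assumes you have transcribed the supplier's mapping into a dictionary keyed by function name; `unmapped_rmf_functions` and the evidence strings are hypothetical, and a non-empty list only shows that something was cited, not that the evidence is adequate.

```python
from typing import Dict, List

# The four core functions of NIST AI RMF 1.0.
RMF_FUNCTIONS = ("Govern", "Map", "Measure", "Manage")


def unmapped_rmf_functions(mapping: Dict[str, List[str]]) -> List[str]:
    """Return the RMF functions for which the supplier cited no evidence."""
    # Normalise keys so "measure" and "Measure " are treated alike.
    normalised = {name.strip().lower(): evidence
                  for name, evidence in mapping.items()}
    return [f for f in RMF_FUNCTIONS if not normalised.get(f.lower())]


# Hypothetical transcription of a supplier's alignment document.
supplier_mapping = {
    "Govern": ["AI policy v3", "named accountability owner"],
    "measure": ["quarterly evaluation report"],
    "Manage": [],  # listed, but no evidence cited
}
print(unmapped_rmf_functions(supplier_mapping))  # ['Map', 'Manage']
```

Functions returned here are candidates for a follow-up question to the supplier, or for a named gap in your risk register.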

Ask suppliers about framework alignment, not law compliance. The framework holds; the law shifts.

The AI Governance Platform category

Gartner and Forrester recognised the category in 2025 — useful for managing diligence at scale, not a substitute for the diligence itself.

Gartner formally recognised 'AI Governance Platforms' as a category in its 2025 Market Guide; Forrester's Wave: AI Governance Solutions (Q3 2025) named Credo AI as a Leader. Active vendors in May 2026 include Credo AI, Holistic AI, and Cisco AI Defense — the platform formerly known as Robust Intelligence, which Cisco acquired in October 2024 and absorbed into its security portfolio under the Cisco AI Defense brand.

These platforms help your organisation track supplier diligence systematically: dashboards, evidence management, and audit-trail tooling for managing governance across a portfolio of AI suppliers and AI-embedded features. They are not a substitute for the diligence conversation itself; they reduce the administrative cost of conducting that conversation across many suppliers and keeping the evidence current.

Honest framing: most teams with fewer than roughly 50 AI-embedded features will be better served by a well-maintained spreadsheet and the four-document checklist than by a platform purchase. The platforms earn their cost when the number of suppliers and audit events exceeds what a manual process can track cleanly.

Procurement red flags

Five signals that warrant a documented risk decision — not a silent acceptance.

Red flags in supplier diligence should be named and documented, not quietly accepted. The question each red flag raises is not "should we walk away?" but "are we making an informed risk decision, and is that decision on record?" The answer to the second question needs to be yes regardless of how the first is resolved.

# AI Supplier Diligence Questionnaire

## Model documentation
- [ ] Can you provide a model card or system card for the model we are integrating?
- [ ] Is the card version-controlled and updated on each retraining event?
- [ ] Which documentation standard does it follow (Hugging Face spec / Mitchell et al. / system card)?

## Evaluation and testing
- [ ] What evaluation benchmarks were run on the current production version?
- [ ] What date were those evaluations run, and which model version do they reflect?
- [ ] Are evaluation results available as a versioned artefact, or only as a marketing summary?
- [ ] Was adversarial or red-team evaluation conducted? Can findings be shared under NDA?

## Incident response
- [ ] What is your disclosure SLA for safety incidents affecting this model?
- [ ] Do you notify customers when the model retrains or when behaviour changes materially?
- [ ] Do you have an RSP-equivalent or Preparedness Framework-equivalent published commitment?

## Training data and compliance
- [ ] What is the high-level provenance of the training data?
- [ ] How is PII handled in the training dataset and in inference outputs?
- [ ] What opt-out or data-removal mechanisms exist for training data subjects?

## Governance framework alignment
- [ ] Are you aligned to NIST AI RMF 1.0? Can you provide alignment documentation?
- [ ] Do you hold ISO/IEC 42001:2023 certification or equivalent?
- [ ] Who is the named accountability owner for AI safety and compliance at your organisation?

AI supplier questionnaire — minimum viable diligence pack (adapt to your procurement process)
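If you keep one copy of the questionnaire per supplier as a markdown file, a short script can report which questions are still open. This is a sketch under two assumptions that are not part of the questionnaire itself: an answered item is marked `- [x]`, and sections start with `## `. The function name `open_items_by_section` is hypothetical.

```python
import re
from typing import Dict, List

# Matches "- [ ] question" (open) and "- [x] question" (answered).
ITEM = re.compile(r"^- \[( |x|X)\] (.+)$")


def open_items_by_section(markdown: str) -> Dict[str, List[str]]:
    """Group the still-unchecked questionnaire items under their section."""
    sections: Dict[str, List[str]] = {}
    current = "(no section)"
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
            continue
        match = ITEM.match(line)
        if match and match.group(1) == " ":
            sections.setdefault(current, []).append(match.group(2))
    return sections


# A partly completed excerpt of the questionnaire above.
sample = """## Incident response
- [x] What is your disclosure SLA for safety incidents affecting this model?
- [ ] Do you notify customers when the model retrains or when behaviour changes materially?
## Governance framework alignment
- [x] Do you hold ISO/IEC 42001:2023 certification or equivalent?
"""
for section, items in open_items_by_section(sample).items():
    print(section, len(items))
```

Sections with open items map naturally onto the gaps you record in the risk register.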

Warning: any one of the red flags below warrants a follow-up question. Two or more together mark a procurement decision point: not an automatic walk-away signal, but a moment to raise with your procurement team in writing, with a named risk decision attached.

No model card, or supplier refuses to provide one without offering an NDA path
Evaluation results exist only in marketing materials — not as a dated, version-linked artefact
No incident response SLA, or SLA measured in weeks rather than hours or days
"We cannot share training data details" without offering an NDA path
No commitment to notifying customers of retraining events that change model behaviour materially
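The triage rule for these five signals (one flag means a follow-up question, two or more mean a documented decision point) is simple enough to encode. The sketch below is a hypothetical illustration: the enum names, the returned labels, and the one-week threshold used to interpret "measured in weeks" are all assumptions of this example.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional, Set


class RedFlag(Enum):
    NO_MODEL_CARD = "no model card and no NDA path to one"
    MARKETING_ONLY_EVALS = "evaluation results exist only in marketing materials"
    WEAK_INCIDENT_SLA = "no incident response SLA, or one measured in weeks"
    NO_TRAINING_DATA_DETAIL = "training data details withheld with no NDA path"
    NO_RETRAINING_NOTICE = "no commitment to notify customers of material retraining"


def sla_is_red_flag(sla: Optional[timedelta]) -> bool:
    """Assumption: a missing SLA, or one of a week or longer, is a red flag."""
    return sla is None or sla >= timedelta(weeks=1)


def triage(flags: Set[RedFlag]) -> str:
    """Apply the rule: one flag is a follow-up, two or more a decision point."""
    if len(flags) >= 2:
        return "decision point: record a named risk decision in writing"
    if len(flags) == 1:
        return "follow-up question to supplier"
    return "no action"


# Hypothetical supplier: 14-day disclosure SLA and marketing-only evals.
observed = {RedFlag.MARKETING_ONLY_EVALS}
if sla_is_red_flag(timedelta(days=14)):
    observed.add(RedFlag.WEAK_INCIDENT_SLA)
print(triage(observed))
```

The output is a prompt for a human decision, not a verdict; neither outcome replaces the written risk decision itself.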