Safety-Review a Third-Party SKILL.md
Perform a defensive security review of a third-party or externally sourced SKILL.md — checking for prompt injection via referenced files, unsafe script execution, credential requests, overbroad permissions, and missing version pinning.
You are a defensive security reviewer specialising in AI agent skill safety. Review the SKILL.md below for safety risks before installation.

IMPORTANT SCOPE: This review is defensive only: it identifies risks so the reviewer can make an informed decision about whether to install, modify, or reject the skill. Do not generate exploit code, bypass instructions, or offensive content.

SKILL_MD_CONTENT: {{SKILL_MD_CONTENT}}

SKILL_SOURCE: {{SKILL_SOURCE}} (e.g. "public GitHub repository", "internal team member", "third-party skill marketplace")

Review the skill against each of the following risk categories. For each, provide: Risk present (Yes / No / Unclear) and a one-sentence justification. Where risk is Yes or Unclear, provide a specific mitigation.

**1. Prompt injection via referenced files**
Does the skill instruct the agent to read from references/, external URLs, or user-provided files?
If yes: could an attacker control the content of those files to inject instructions into the agent?
Risk: Could a malicious references/ file override the skill's instructions or cause the agent to exfiltrate data?

**2. Unsafe script execution**
Does the skill have a scripts/ directory or instruct the agent to run shell commands? If yes:
- Do the scripts require user confirmation before writing to disk or making network calls?
- Do the scripts read credentials or tokens from the environment?
- Do the scripts make outbound network requests? To where?
- Are inputs validated before being passed to scripts?
Risk: Could a crafted input cause a script to execute unintended commands?

**3. Credential and secret exposure**
Does the skill instruct the agent to read, log, output, or transmit credentials, API keys, tokens, environment variables, or PII?
Does any output format include fields that would capture sensitive data?
Risk: Could the skill inadvertently expose secrets through agent outputs or logs?

**4. Overbroad permissions**
Does the skill instruct the agent to read files outside the project directory?
Does it instruct the agent to write to arbitrary paths?
Does it instruct the agent to install packages or modify system configuration?
Risk: Could the skill cause changes outside the intended scope?

**5. Version pinning**
Is the skill pinned to a specific version or commit hash?
If it references external resources (URLs, Git repos), are those pinned?
Risk: Could an update to the skill or its dependencies change its behaviour without the reviewer's knowledge?

**6. Deceptive or misleading instructions**
Does the skill's description match what the instruction body actually does?
Are there instructions that claim to do one thing but instruct the agent to do another?
Risk: Is the skill attempting to activate under false pretences?

**7. Data exfiltration patterns**
Does the skill instruct the agent to send data to external endpoints?
Does it include instructions to write output to locations outside the project?
Risk: Could the skill cause sensitive project data to leave the local environment?

**Overall recommendation**
After reviewing all 7 categories:
- INSTALL AS-IS (no material risks found)
- INSTALL WITH MODIFICATIONS (list specific changes required before installing)
- DO NOT INSTALL (list the disqualifying risks)

Provide a summary of the top 3 risks (if any) and the specific mitigations required.

IMPORTANT: This review is a starting point. Complex skills with scripts/ content require a full code review of the scripts themselves. Do not install a skill solely on the basis of this automated review.
{{SKILL_MD_CONTENT}} (required): The full content of the SKILL.md to review
e.g. --- name: qa-helper description: | Helps with QA tasks... --- ## Instructions ...
{{SKILL_SOURCE}} (required): Where this skill came from (public repo, colleague, marketplace, etc.)
e.g. Public GitHub repository — github.com/example/qa-skills
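If the prompt is filled in programmatically rather than by hand, the two required variables above could be substituted along these lines. This is a minimal sketch, not part of the original prompt: the function name `fill_template` and the rule that a blank value counts as missing are assumptions made for illustration.

```python
import re

# Matches placeholders written as {{NAME}}, as used in the prompt template.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_template(template, values):
    """Substitute {{NAME}} placeholders (hypothetical helper).

    Raises ValueError if any placeholder in the template has no
    non-blank value, since both variables are marked as required.
    """
    names = set(PLACEHOLDER.findall(template))
    missing = sorted(n for n in names if not values.get(n, "").strip())
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    # A function replacement is inserted literally and the result is not
    # re-scanned, so braces inside the pasted SKILL.md are left untouched.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Refusing to run with a blank variable avoids sending the model a prompt that still contains a literal `{{SKILL_SOURCE}}`, which it might otherwise try to interpret.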
- All 7 risk categories are assessed with a rating and justification
- Any scripts/ content is flagged for separate manual code review (this prompt does not substitute for it)
- The overall recommendation is one of the three defined outcomes, not a vague 'it depends'
- Mitigations are specific (e.g. 'remove line 12 which reads from APPDATA') not generic
- The reviewer understands this is a starting point and performs their own due diligence before installing
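The first and third criteria above are mechanical enough to check with a script before a human reads the review. The sketch below assumes the model formats each category as a line starting with its number (for example `**1. Prompt injection via referenced files**`) followed by the text `Risk present: Yes`, `No`, or `Unclear`. That layout is an assumption, as is the helper name `check_review`. The prompt does not fix an exact output format, so the patterns may need adjusting.

```python
import re

RECOMMENDATIONS = ("INSTALL AS-IS", "INSTALL WITH MODIFICATIONS", "DO NOT INSTALL")

# Assumed layout: a category heading is a line that starts with "1." to "7."
# (optionally preceded by markdown emphasis such as "**").
HEADING = re.compile(r"^\W*([1-7])\.\s", re.MULTILINE)
RATING = re.compile(r"Risk present:\s*(Yes|No|Unclear)\b", re.IGNORECASE)


def check_review(review):
    """Return a list of problems found in a review; empty if none (hypothetical helper)."""
    problems = []
    heads = list(HEADING.finditer(review))
    sections = {}
    for i, match in enumerate(heads):
        end = heads[i + 1].start() if i + 1 < len(heads) else len(review)
        # Keep the first section seen for each number.
        sections.setdefault(int(match.group(1)), review[match.end():end])
    for n in range(1, 8):
        body = sections.get(n)
        if body is None:
            problems.append(f"category {n}: not found")
        elif not RATING.search(body):
            problems.append(f"category {n}: no Yes/No/Unclear rating")
    found = [r for r in RECOMMENDATIONS if r in review]
    if len(found) != 1:
        problems.append(f"recommendation: expected exactly one outcome, found {len(found)}")
    return problems
```

An empty result only means the review is complete in form. Whether the justifications and mitigations are sound still needs a human reader.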
AI output requires human review before use. These checks are your responsibility.
- This review covers the SKILL.md file only — scripts in scripts/ require separate manual review of the script code
- A skill may pass all 7 criteria and still be subtly unsafe — treat this as a checklist, not a guarantee
- Do not share sensitive project code or credentials when pasting the SKILL.md content for review
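The last caveat can be partly supported by scanning the text for secret-looking strings before pasting it. The sketch below is illustrative only: the three patterns and the helper name `find_possible_secrets` are assumptions, the list is far from exhaustive, and both false positives and misses are to be expected.

```python
import re

# Illustrative patterns only; real secrets take many other forms.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+"
    ),
}


def find_possible_secrets(text):
    """Return (line_number, label) pairs for lines that look like they hold a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

Any hit is a prompt to look at that line by hand and redact it if needed. A clean result does not show the text is free of sensitive data.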