Blog

OWASP AI Testing Guide v1.0: a practical checklist for testing AI/LLM features

04/04/2026

OWASP released AI Testing Guide v1.0 (11/2025). It’s an attempt to make AI testing as repeatable as traditional software testing—while addressing the risks that are specific to AI systems.

The core message is simple:

Security alone is not enough → the real objective is AI trustworthiness.

Trustworthiness includes security, but also hallucinations, bias/fairness, explainability, agentic overreach, and multiple forms of data leakage.

This post turns the guide into a practical outline + checklist your team can apply when shipping AI/LLM features.

What is the OWASP AI Testing Guide?

The guide provides:

a standardized methodology for testing AI and LLM-based systems
repeatable test cases organized across four layers:
- AI Application (prompts, agents, UI, integrations)
- AI Model (robustness, poisoning, inference attacks)
- AI Infrastructure (supply chain, resources, tool/plugin boundaries)
- AI Data (training/runtime exposure, minimization, consent)

In practice, it’s a map of what needs to be tested so an AI feature behaves safely and predictably in production.

Why AI testing is different from “normal” software testing

The guide highlights failure modes that won’t be covered by standard unit/integration tests:

prompt injection / jailbreaks
indirect prompt injection (when the model consumes external content)
sensitive data leakage
hallucinations and misinformation
bias / fairness failures
excessive or unsafe agency
supply chain compromise and data/model poisoning
drift and degradation over time

Practical checklist: what to test before releasing an AI/LLM feature

Below is a condensed checklist aligned with the guide’s structure. The point is to build repeatable tests, not a one-off review.

1) Application layer (AITG-APP)

Prompt injection: can users override system instructions and policies?
Indirect prompt injection: if the model reads webpages/tickets/messages, can external text hijack the behavior?
Sensitive data leak: does PII/customer data/secrets leak via responses, logs, or caches?
Unsafe outputs: does the model produce harmful instructions or disallowed content?
Agentic behavior limits: with tool access (APIs, Jira, email), does the agent stay within hard boundaries?
Prompt disclosure: can the model reveal system prompts or internal instructions?
Embedding manipulation: can retrieval be steered toward malicious content?
Model extraction: can an attacker infer too much about the model/system through queries?
Hallucinations: how often does it fabricate, and how is uncertainty communicated?
Over-reliance: does UX encourage blind trust (“AI said so”)?
Explainability & interpretability: can you explain/trace why an answer was produced (sources, rationale)?

2) Model layer (AITG-MOD)

Evasion attacks: does the model fail under adversarial-but-valid inputs?
Runtime poisoning: can production inputs poison behavior (RAG memory, caching, feedback loops)?
Poisoned training sets / fine-tuning: how do you validate tuning data and prevent hidden payloads?
Membership inference / inversion: can the model reveal training data or reconstruct sensitive examples?
Robustness to new data: how does quality change when input distribution shifts?
Goal alignment: is the model aligned with your policies and user intent?

3) Infrastructure layer (AITG-INF)

Supply chain tampering: models, dependencies, containers, CI/CD—what is verified and signed?
Resource exhaustion: can attackers drive cost/latency via token flooding or tool loops?
Plugin boundary violations: can tool/plugin usage leak data or exceed intended scope?
Capability misuse: can the agent be induced into unsafe automation (mass actions, destructive calls)?
Dev-time model theft: how are models and secrets protected during development?

4) Data layer (AITG-DAT)

Training data exposure: what’s in the model, and can it leak?
Runtime exfiltration: can users force data exfiltration via prompt injection + tools?
Dataset diversity & coverage: do you cover important variation, or create blind spots?
Data minimization & consent: do you store only what’s needed, with the right consent and controls?

Common failure modes we keep seeing

RAG retrieves a plausible-but-wrong source → the model states it as fact.
Agents get too much privilege → one prompt injection becomes real system changes.
Guardrails exist, but are not regression-tested.
Minimization is ignored → logs/caches contain sensitive data.
No drift monitoring → quality degrades silently over time.

How to operationalize this (so it becomes normal delivery)

A pragmatic approach:

threat model your AI architecture across app/model/infra/data
build repeatable test packs (injection, data leaks, unsafe outputs, tool boundaries)
add observability: logging, audit trails, feedback loops, drift metrics
define “stop the line” criteria for release readiness

CTA (Byte)

If you’re shipping (or planning) AI/LLM features, we can help you:

tailor an AI testing framework to your architecture
implement prompt injection / indirect injection / data leak test suites
design safe agent tool boundaries + audit trail
integrate the tests into CI/CD so this stays maintainable

Message us and we’ll review your setup in 60–90 minutes and turn it into a concrete testing plan.

Source: OWASP AI Testing Guide v1.0 (PDF)