Two surfaces, one workflow.
Promptfoo plugin catalog
60+ plugins: harmful, PII, BFLA/BOLA/RBAC, ASCII smuggling, debug access, divergent repetition, excessive agency, hallucination, hijacking, indirect prompt injection, intent, MCP exploitation, memory poisoning, off-topic, prompt extraction, RAG document exfiltration, RAG source attribution, reasoning DoS, shell/SQL/SSRF injection, tool discovery, vertical packs (financial, medical, pharmacy, ecommerce, insurance, telecom, real estate, teen safety).
Strategy mixers
base64 / hex / leet / rot13 / homoglyph / multilingual / audio / image / video, plus iterative attacks (crescendo, GOAT, hydra, simba, bestOfN, layer composite), `mischievousUser` multi-turn, `indirectWebPwn`, `authoritativeMarkupInjection`, `mathPrompt`, `likert`, `gcg`, `citation`.
Vouch-AI offensive agent (v0)
Goal-driven attacker that picks a skill, crafts a prompt, sends it to your agent's chat endpoint, and iterates based on the response. Multi-judge ensemble: deterministic (canary regex) + tool-oracle (high-blast tool fired with attacker target) + LLM judge.
Skill library
8 agent-attack patterns shipped: indirect prompt injection, RAG poisoning, memory poisoning, MCP exploitation, tool-call hijack, cross-tenant escape, approval bypass, confused deputy. Each skill is a markdown playbook.
Cost-bounded by default
Three budgets: turn count (default 8), USD cost (default $1.00), and 100k-token chat-history budget. Misconfigured runs can't drain credit.
Continuous learning
Every successful attack feeds the firewall classifier and the corpus the Mutual Defense Network shares (opt-in). The defender gets smarter as attackers find more.
Run a redteam pack on your agent.
Configure the target, pick a skill, hit go. Findings land in the inbox.