Your agent's seal of approval at AI speed
One Living Cert score from red-team pass-rate, firewall block-rate, and intent failure-rate. Signed, public, embeddable, and auto-revocable.
Agent autonomy gives your org leverage, but it also brings production risk.
The invisible blast radius
Agents inherit tools, prompts, data sources, tenants, and approval paths. Most teams cannot see what an agent can actually touch until something breaks.
The dashboard trap
Trace tools show thousands of runs, but the team still has to find repeated failures, reproduce them, and prove the fix worked.
Assemble, test, and govern agents with a risk layer that keeps up.
Production-grade inventory
Map agents, models, tools, prompts, data sources, approvals, environments, and tenant boundaries in one graph.
AI-native failure mining
Cluster production failures, explain root cause, generate repro tests, and turn evidence into prioritized findings.
Control without drag
Run targeted red-team packs, enforce policy checks, and publish living certs your customers can verify.
Begin with an agent permission graph
Vouch discovers what every agent can read, write, delete, message, deploy, or move money through.
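To make the idea concrete, here is a minimal sketch of a permission graph as a flat list of grants, with a query for what one agent can reach. The `Grant` shape, the action names, and the sample agents and tools are illustrative assumptions, not Vouch's actual schema.

```typescript
// Hypothetical types: these names are illustrative, not Vouch's actual schema.
type Action = "read" | "write" | "delete" | "message" | "deploy" | "move_money";

interface Grant {
  agent: string;    // agent identifier
  tool: string;     // tool the agent can invoke
  action: Action;   // what that tool lets the agent do
  resource: string; // data source or system the action touches
}

// List every grant that belongs to one agent.
function grantsFor(graph: Grant[], agent: string): Grant[] {
  return graph.filter((g) => g.agent === agent);
}

// Collect the distinct actions an agent can perform, in first-seen order.
function actionsFor(graph: Grant[], agent: string): Action[] {
  const seen: Action[] = [];
  for (const g of grantsFor(graph, agent)) {
    if (seen.indexOf(g.action) === -1) seen.push(g.action);
  }
  return seen;
}

// Sample data, invented for this example.
const graph: Grant[] = [
  { agent: "refund-agent", tool: "lookup_order", action: "read", resource: "orders_db" },
  { agent: "refund-agent", tool: "issue_refund", action: "move_money", resource: "payments" },
  { agent: "support-agent", tool: "send_email", action: "message", resource: "customer_inbox" },
];

const refundActions = actionsFor(graph, "refund-agent");
console.log(refundActions.join(", "));
```

A flat grant list keeps the example short. A real graph would likely also carry tenants, environments, and approval paths as separate node types.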
Observe real behavior in production
Trace prompts, retrieved context, model output, tool calls, policy decisions, feedback, escalation, latency, and cost.
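A trace record covering a few of these fields might be typed roughly as below, with a small roll-up over a batch of events. The field names, the cents-based cost unit, and the `summarize` helper are assumptions made for illustration.

```typescript
// Hypothetical trace record; field names are assumptions based on the list above.
interface TraceEvent {
  sessionId: string;
  tool: string;
  decision: "allow" | "block";
  reason?: string;   // present when a policy explains its decision
  latencyMs: number;
  costCents: number; // integer cents, to avoid floating-point drift
}

interface TraceSummary {
  total: number;
  blocked: number;
  totalLatencyMs: number;
  totalCostCents: number;
}

// Roll a batch of events up into counts and totals.
function summarize(events: TraceEvent[]): TraceSummary {
  const summary: TraceSummary = { total: 0, blocked: 0, totalLatencyMs: 0, totalCostCents: 0 };
  for (const e of events) {
    summary.total += 1;
    if (e.decision === "block") summary.blocked += 1;
    summary.totalLatencyMs += e.latencyMs;
    summary.totalCostCents += e.costCents;
  }
  return summary;
}
```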
{
"tool": "issue_refund",
"args": { "amount": 84.00 },
"policy": "approval-required",
"decision": "block",
"reason": "identity confidence 0.61"
}
Turn logs into tests and fixes
Repeated failures become findings, repro tests, remediation plans, GitHub issues, and retest evidence.
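The first step, spotting that a failure repeats, can be sketched as grouping failed sessions by a shared signature. The `signature` field and the `minSize` cutoff are hypothetical; this is exact-match grouping, a simple stand-in for whatever clustering Vouch actually uses.

```typescript
// Hypothetical failure record; "signature" stands in for whatever key groups similar failures.
interface Failure {
  sessionId: string;
  signature: string; // e.g. "refund_before_identity_check"
}

interface Cluster {
  signature: string;
  sessionIds: string[];
}

// Group failures by signature, keep groups with at least minSize sessions, largest first.
function clusterFailures(failures: Failure[], minSize: number): Cluster[] {
  const clusters: Cluster[] = [];
  for (const f of failures) {
    let match: Cluster | undefined = undefined;
    for (const c of clusters) {
      if (c.signature === f.signature) match = c;
    }
    if (match) {
      match.sessionIds.push(f.sessionId);
    } else {
      clusters.push({ signature: f.signature, sessionIds: [f.sessionId] });
    }
  }
  return clusters
    .filter((c) => c.sessionIds.length >= minSize)
    .sort((a, b) => b.sessionIds.length - a.sessionIds.length);
}
```

One-off failures fall below `minSize` and drop out, so only patterns worth a repro test surface as findings.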
Refund issued before identity verification
Cluster of 17 sessions · suspected schema gap · auto-repro generated.
- Cluster identified · 17 sessions
- Repro test written · 8/10 stable
- PR-ready remediation drafted
+ if (!ctx.identity.verified) {
+ return policy.block("identity-required");
+ }
await tools.issue_refund(input);
Stay in control with our open-source risk layer
Don't get locked into someone else's black box. Vouch works with local setup, real traces, live testing, and policy-as-code your team can inspect.
// policy.ts
import { ToolCall, PolicyGate, CertScore } from "vouch";

export default function RefundAgent() {
  return (
    <PolicyGate label="Verified identity">
      <ToolCall name="issue_refund" risk="money_movement" />
      <CertScore min={90} />
    </PolicyGate>
  );
}
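One way to read the declarative policy above is as a plain function that either allows or blocks the refund call. The sketch below is an interpretation, not the library's implementation: `GateContext`, `Decision`, `gateRefund`, and the reason strings are hypothetical, and it assumes `min={90}` means "block when the current cert score is below 90".

```typescript
// Hypothetical shapes; Vouch's real runtime types are not shown on this page.
interface GateContext {
  identityVerified: boolean;
  certScore: number; // current living-cert score, 0 to 100
}

interface Decision {
  decision: "allow" | "block";
  reason: string;
}

// Assumed threshold, taken from min={90} in the policy above.
const MIN_CERT_SCORE = 90;

// Check identity first, then the cert score; block on the first failed check.
function gateRefund(ctx: GateContext): Decision {
  if (!ctx.identityVerified) {
    return { decision: "block", reason: "identity-required" };
  }
  if (ctx.certScore < MIN_CERT_SCORE) {
    return { decision: "block", reason: "cert-score-below-minimum" };
  }
  return { decision: "allow", reason: "all-checks-passed" };
}
```

Because the checks are ordinary, deterministic code, the same input always yields the same decision, which is what makes the policy inspectable and testable.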
Make your security team confident and your product team faster.
Vouch turns repeated production failures into clusters, repro tests, remediation PRs, and a fresh living-cert score — without your team triaging trace tables by hand.
Risk evidence your security team trusts, in a workflow your product team will actually run.
Repeated failures become repro tests, remediation PRs, and a fresh cert score.
Running
- Cluster identified · 17 refund-agent sessions promised action without approval gate · F-1142 · CRITICAL
- Repro test written · 8 / 10 stable runs · committed to vouch-tests/refund-1142.spec.ts · auto-generated
- PR-ready remediation drafted · Add identity gate before tools.issue_refund · vouch-bot wants to merge · 1 commit
- Cert score updated · Living cert · 88.6 → 91.0 after policy gate ships · +2.4
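The living-cert score is described as a blend of red-team pass-rate, firewall block-rate, and intent failure-rate. The page does not give the formula, so the sketch below assumes a simple weighted average with made-up weights, and it is not meant to reproduce the 88.6 or 91.0 figures above. The field meanings and the revocation threshold are likewise assumptions.

```typescript
// All names, weights, and thresholds here are illustrative assumptions.
interface CertInputs {
  redTeamPassRate: number;   // share of red-team tests passed, 0 to 1
  firewallBlockRate: number; // share of unsafe actions blocked, 0 to 1
  intentFailureRate: number; // share of sessions that failed the user's intent, 0 to 1
}

const WEIGHTS = { redTeam: 0.4, firewall: 0.3, intent: 0.3 }; // assumed; sums to 1

function assertRate(name: string, value: number): void {
  if (!(value >= 0 && value <= 1)) {
    throw new RangeError(name + " must be between 0 and 1");
  }
}

// Weighted average on a 0 to 100 scale, rounded to one decimal place.
// The failure rate is inverted so that a higher score is always better.
function certScore(inputs: CertInputs): number {
  assertRate("redTeamPassRate", inputs.redTeamPassRate);
  assertRate("firewallBlockRate", inputs.firewallBlockRate);
  assertRate("intentFailureRate", inputs.intentFailureRate);
  const raw =
    WEIGHTS.redTeam * inputs.redTeamPassRate +
    WEIGHTS.firewall * inputs.firewallBlockRate +
    WEIGHTS.intent * (1 - inputs.intentFailureRate);
  return Math.round(raw * 1000) / 10;
}

// "Auto-revocable" sketched as a threshold check against a minimum score.
function shouldRevoke(score: number, minScore: number): boolean {
  return score < minScore;
}
```

Recomputing the score whenever any input changes is what would make the cert "living": a regression in red-team results or a spike in intent failures lowers the score and can trip revocation.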
Teams use Vouch to turn agent risk into engineering work.
Stop false promises before customers churn
Detect when an agent promises cancellation, refunds, or account changes without the required tool call or approval.
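As a rough illustration of this check, the sketch below flags a session when an agent's message mentions an action but the matching tool was never called. The `Turn` shape, the phrase list, and the tool name `cancel_subscription` are hypothetical, and keyword matching is a deliberately naive stand-in: it would also flag "I cannot refund this", so a real detector would need an intent classifier.

```typescript
// Hypothetical transcript shape for this example.
interface Turn {
  role: "agent" | "user";
  text: string;
  toolCalls: string[]; // names of tools invoked during this turn
}

// Assumed mapping from promise keywords to the tool that must back them.
const PROMISE_RULES = [
  { phrase: "refund", requiredTool: "issue_refund" },
  { phrase: "cancel", requiredTool: "cancel_subscription" },
];

// Return the tools that an agent message implied but the session never called.
function findUnbackedPromises(turns: Turn[]): string[] {
  const called: string[] = [];
  for (const t of turns) {
    for (const name of t.toolCalls) called.push(name);
  }
  const missing: string[] = [];
  for (const t of turns) {
    if (t.role !== "agent") continue;
    const text = t.text.toLowerCase();
    for (const rule of PROMISE_RULES) {
      const promised = text.indexOf(rule.phrase) !== -1;
      const backed = called.indexOf(rule.requiredTool) !== -1;
      if (promised && !backed && missing.indexOf(rule.requiredTool) === -1) {
        missing.push(rule.requiredTool);
      }
    }
  }
  return missing;
}
```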
Read the case
Protect privileged GTM workflows
Score risky automations by data sensitivity, blast radius, exposure, controls, and production frequency.
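A scoring scheme along these lines could average the four risk factors and then discount by how much of the automation is covered by controls. The 1-to-5 scales, the `controlCoverage` fraction, and the formula itself are assumptions for illustration; the page does not state how Vouch weighs these factors.

```typescript
// Hypothetical scales and field names; the real scoring model is not described here.
interface Automation {
  name: string;
  dataSensitivity: number;     // 1 (public) to 5 (regulated)
  blastRadius: number;         // 1 to 5
  exposure: number;            // 1 to 5
  productionFrequency: number; // 1 to 5
  controlCoverage: number;     // 0 (no controls) to 1 (fully gated)
}

// Average the inherent risk factors, then discount by control coverage.
function riskScore(a: Automation): number {
  const inherent =
    (a.dataSensitivity + a.blastRadius + a.exposure + a.productionFrequency) / 4;
  return inherent * (1 - a.controlCoverage);
}

// Highest residual risk first; the input array is left untouched.
function rankByRisk(items: Automation[]): Automation[] {
  return items.slice().sort((x, y) => riskScore(y) - riskScore(x));
}
```

Under these assumptions, a moderately risky automation with no controls can outrank a high-risk one that is half gated, which is the kind of ordering a flat severity label would miss.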
Read the case
Govern code execution and deployment
Red-team tool abuse, secret access, repo actions, CI/CD paths, and multi-turn social engineering.
Read the case
The valuable output is not another trace table. It is the cluster, the suspected cause, the reproduction test, and the suggested fix.
Stop guessing. Start proving your agents are safe.
Is Vouch just observability?
No. Observability is the evidence stream. Vouch turns traces into inventory, failures, tests, risk scores, policies, and living certificates.
Can Vouch run locally?
Yes. The current local stack runs the app, worker, Postgres, ClickHouse, Redis, MinIO, and firewall API from localhost for testing.
Does it block unsafe actions?
Vouch already has a firewall path. The next production layer is deterministic tool-call policy, approval gates, tenant checks, and audit trails.
What should we build next?
The next high-leverage feature is agent inventory and permission graph. It powers blast-radius scoring, smarter red-team tests, and runtime policy.
Can this become insurance-ready?
That is the direction. Start with risk evidence, cert history, incident history, runtime controls, and audit logs before underwriting integrations.