Your agent's seal of approval at AI speed

One Living Cert score from red-team pass-rate, firewall block-rate, and intent failure-rate. Signed, public, embeddable, and auto-revocable.

Get your Living Cert · Watch demo

Cert score: 88.6 (+2.4)
Open findings: 12 (-3)
Redteam pass-rate: 78.3% (+4.1%)
Findings by source: 142 (12 critical · 41 high · 89 other)
Cert score over time: 88.6 (+2.4 vs prior period)
Firewall block-rate: 92% (last 24h)
Tool calls / hour: 83.2k (median 14ms)
Prompt-injection blocks: 4.7k (↑ 28% WoW)

built for agent stacks

OpenAI · Anthropic · LangChain · LlamaIndex · MCP · OpenTelemetry

The problem.

Agent autonomy gives your org leverage, but it also expands production risk.

The invisible blast radius

Agents inherit tools, prompts, data sources, tenants, and approval paths. Most teams cannot see what an agent can actually touch until something breaks.

The dashboard trap

Trace tools show thousands of runs, but the team still has to find repeated failures, reproduce them, and prove the fix worked.

Stop settling for blind spots.

Assemble, test, and govern agents with a risk layer that keeps up.

Agent Registry · 18 production agents

Agent          Risk  Status
refund-agent   82    watch
support-agent  94    good
dev-agent      68    risk

Production-grade inventory

Map agents, models, tools, prompts, data sources, approvals, environments, and tenant boundaries in one graph.

Cluster · refund-agent · +17 sessions

tool_error_but_success_message · 17
promised_action_no_tool_call · 9
no_progress_loop · 5

AI-native failure mining

Cluster production failures, explain root cause, generate repro tests, and turn evidence into prioritized findings.
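As a rough sketch of the clustering step, assuming each failed session has already been tagged with a failure label like the ones in the panel above: group sessions by agent and label, then rank by size. The `FailedSession` and `FailureCluster` shapes and the `clusterFailures` function are hypothetical names for illustration, not Vouch's API, and exact-label grouping stands in for whatever similarity matching a real pipeline would use.

```typescript
// Hypothetical session record; field names are assumptions for illustration.
interface FailedSession {
  sessionId: string;
  agent: string;
  failureLabel: string; // e.g. "no_progress_loop"
}

interface FailureCluster {
  agent: string;
  failureLabel: string;
  sessionIds: string[];
}

// Group failed sessions by (agent, failureLabel) and rank by cluster size.
function clusterFailures(sessions: FailedSession[]): FailureCluster[] {
  const byKey = new Map<string, FailureCluster>();
  for (const s of sessions) {
    const key = `${s.agent}::${s.failureLabel}`;
    let cluster = byKey.get(key);
    if (!cluster) {
      cluster = { agent: s.agent, failureLabel: s.failureLabel, sessionIds: [] };
      byKey.set(key, cluster);
    }
    cluster.sessionIds.push(s.sessionId);
  }
  // Largest clusters first, so the most repeated failure is triaged first.
  return Array.from(byKey.values()).sort(
    (a, b) => b.sessionIds.length - a.sessionIds.length
  );
}
```

Ranking by size is only one possible priority; a severity weight could be folded into the sort if critical failures should outrank frequent ones.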

Living Cert · refund-agent

Firewall block-rate: 92
Redteam pass-rate: 78
Intent failure: 85

Control without drag

Run targeted red-team packs, enforce policy checks, and publish living certs your customers can verify.
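To make the idea of a single score from three signals concrete, here is a minimal sketch. It assumes each input is a 0 to 100 sub-score where higher is better, and it uses equal placeholder weights. The actual Living Cert formula is not described on this page, so `CertInputs`, `WEIGHTS`, and `certScore` are hypothetical, and the sketch does not try to reproduce the headline figures.

```typescript
// Sub-scores on a 0-100 scale where higher is better (an assumption).
interface CertInputs {
  firewallBlockRate: number;
  redteamPassRate: number;
  intentScore: number;
}

// Placeholder weights; the real weighting is not published here.
const WEIGHTS: Record<keyof CertInputs, number> = {
  firewallBlockRate: 1,
  redteamPassRate: 1,
  intentScore: 1,
};

// Weighted mean of the three sub-scores, rounded to one decimal place.
function certScore(inputs: CertInputs): number {
  const keys = Object.keys(WEIGHTS) as (keyof CertInputs)[];
  let weighted = 0;
  let totalWeight = 0;
  for (const key of keys) {
    const value = inputs[key];
    // Rejects out-of-range values and NaN.
    if (!(value >= 0 && value <= 100)) {
      throw new RangeError(`${key} must be between 0 and 100`);
    }
    weighted += WEIGHTS[key] * value;
    totalWeight += WEIGHTS[key];
  }
  return Math.round((weighted / totalWeight) * 10) / 10;
}
```

An auto-revocation check could then be as simple as comparing the result against a minimum, though where that threshold sits is a policy decision rather than something this sketch can settle.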

Begin with an agent permission graph

Vouch discovers what every agent can read, write, delete, message, deploy, or move money through.

tools · data sources · approval gates

refund-agent

issue_refund

customer_db

approval gate

send_email

billing_api
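One way to picture a permission graph is as a directed graph walked from an agent outward. The sketch below assumes that representation; the `PermissionGraph` class is hypothetical, and the edges between the nodes shown above are invented for illustration, since the page lists the nodes but not how they connect.

```typescript
type NodeKind = "agent" | "tool" | "data_source" | "approval_gate";

interface GraphNode {
  id: string;
  kind: NodeKind;
}

class PermissionGraph {
  private nodes = new Map<string, GraphNode>();
  private edges = new Map<string, Set<string>>();

  addNode(id: string, kind: NodeKind): void {
    this.nodes.set(id, { id, kind });
    if (!this.edges.has(id)) this.edges.set(id, new Set<string>());
  }

  addEdge(from: string, to: string): void {
    if (!this.nodes.has(from) || !this.nodes.has(to)) {
      throw new Error(`unknown node in edge ${from} -> ${to}`);
    }
    this.edges.get(from)!.add(to);
  }

  // Breadth-first walk: everything an agent can reach, directly or via tools.
  reachableFrom(start: string): GraphNode[] {
    const seen = new Set<string>([start]);
    const queue: string[] = [start];
    const reached: GraphNode[] = [];
    while (queue.length > 0) {
      const current = queue.shift()!;
      const targets = this.edges.get(current);
      if (!targets) continue;
      targets.forEach((next) => {
        if (!seen.has(next)) {
          seen.add(next);
          queue.push(next);
          reached.push(this.nodes.get(next)!);
        }
      });
    }
    return reached;
  }
}

// Illustrative wiring only; these edges are assumptions.
const graph = new PermissionGraph();
graph.addNode("refund-agent", "agent");
graph.addNode("issue_refund", "tool");
graph.addNode("send_email", "tool");
graph.addNode("customer_db", "data_source");
graph.addNode("billing_api", "data_source");
graph.addNode("approval gate", "approval_gate");
graph.addEdge("refund-agent", "issue_refund");
graph.addEdge("refund-agent", "send_email");
graph.addEdge("refund-agent", "customer_db");
graph.addEdge("issue_refund", "approval gate");
graph.addEdge("issue_refund", "billing_api");

const blastRadius = graph.reachableFrom("refund-agent").map((n) => n.id);
```

The reachable set is a crude stand-in for blast radius; weighting each node by what it can do (read, write, move money) would be the natural next refinement.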

Observe real behavior in production

Trace prompts, retrieved context, model output, tool calls, policy decisions, feedback, escalation, latency, and cost.

OpenTelemetry · redaction · sessions

Live trace · refund-agent recording

Tool          Risk class  Outcome       Latency
issue_refund  money       approved      180ms
send_email    tenant      blocked       227ms
lookup_user   pii         approved      274ms
issue_refund  money       missing gate  321ms
share_doc     exfil       approved      368ms

otel.span · policy decision

{
  "tool":   "issue_refund",
  "args":   { "amount": 84.00 },
  "policy": "approval-required",
  "decision": "block",
  "reason":  "identity confidence 0.61"
}
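A decision record like the one above could come from a small deterministic check. This sketch assumes the gate compares an identity-confidence value against the 0.85 threshold shown in the session recording on this page, and that meeting the threshold is enough to satisfy the policy. `ToolCallRequest`, `PolicyDecision`, and `decide` are hypothetical names, not Vouch's API.

```typescript
interface ToolCallRequest {
  tool: string;
  args: Record<string, unknown>;
  identityConfidence: number; // 0..1, assumed to come from an upstream verifier
}

interface PolicyDecision {
  tool: string;
  args: Record<string, unknown>;
  policy: string;
  decision: "allow" | "block";
  reason: string;
}

// Threshold taken from the "identity 0.61 < 0.85" line in the session recording.
const MIN_IDENTITY_CONFIDENCE = 0.85;

// Deterministic gate: block unless identity confidence meets the threshold.
function decide(req: ToolCallRequest): PolicyDecision {
  const confident = req.identityConfidence >= MIN_IDENTITY_CONFIDENCE;
  return {
    tool: req.tool,
    args: req.args,
    policy: "approval-required",
    decision: confident ? "allow" : "block",
    reason: `identity confidence ${req.identityConfidence.toFixed(2)}`,
  };
}
```

Because the check is a pure function of the request, the same input always yields the same decision, which is what makes the result usable as audit evidence.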

Turn logs into tests and fixes

Repeated failures become findings, repro tests, remediation plans, GitHub issues, and retest evidence.

repro tests · issues · PR-ready

finding · F-1142 · critical

Refund issued before identity verification

Cluster of 17 sessions · suspected schema gap · auto-repro generated.

Cluster identified · 17 sessions
Repro test written · 8/10 stable
PR-ready remediation drafted

vouch-bot wants to merge 1 commit into main

+ if (!ctx.identity.verified) {
+   return policy.block("identity-required");
+ }
  await tools.issue_refund(input);

tests pass · cert score +2.4

Stay in control with our open-source risk layer

Don't get locked into someone else's black box. Vouch works with local setup, real traces, live testing, and policy-as-code your team can inspect.

SESSION RECORDING · live

trace: trc_8ac9…
agent: refund-agent
tool: issue_refund · policy block
decision: identity 0.61 < 0.85

agent-policy.ts

// agent-policy.ts
import { ToolCall, PolicyGate, CertScore } from "vouch";

export default function RefundAgent() {
  return (
    <PolicyGate label="Verified identity">
      <ToolCall name="issue_refund" risk="money_movement" />
      <CertScore min={90} />
    </PolicyGate>
  );
}

RiskOps for agents.

Make your security team confident and your product team faster.

Vouch turns repeated production failures into clusters, repro tests, remediation PRs, and a fresh living-cert score, without your team triaging trace tables by hand.

Risk evidence your security team trusts, in a workflow your product team will actually run.

Finding pipeline

Repeated failures become repro tests, remediation PRs, and a fresh cert score.

Running

Cluster identified · 17 refund-agent sessions promised action without approval gate · F-1142 · CRITICAL
Repro test written · 8 / 10 stable runs · committed to vouch-tests/refund-1142.spec.ts · auto-generated
PR-ready remediation drafted · Add identity gate before tools.issue_refund · vouch-bot wants to merge 1 commit
Cert score updated · Living cert · 88.6 → 91.0 after policy gate ships · +2.4

In production.

Teams use Vouch to turn agent risk into engineering work.

Support agent

Stop false promises before customers churn

Detect when an agent promises cancellation, refunds, or account changes without the required tool call or approval.

Read the case
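A promise without a matching tool call can be caught with a simple rule check. The sketch below assumes each agent turn records its message and the names of the tools it called. The keyword patterns and the `cancel_subscription` tool name are hypothetical, and plain keyword matching is a deliberately crude stand-in for real intent detection.

```typescript
interface AgentTurn {
  message: string;
  toolCalls: string[]; // names of tools invoked during this turn
}

interface PromiseRule {
  pattern: RegExp;
  requiredTool: string;
}

// Hypothetical mapping from promise phrasing to the tool that must back it.
const PROMISE_RULES: PromiseRule[] = [
  { pattern: /\brefund/i, requiredTool: "issue_refund" },
  { pattern: /\bcancel/i, requiredTool: "cancel_subscription" },
];

// Returns the tools the message implies but the turn never called.
function unbackedPromises(turn: AgentTurn): string[] {
  return PROMISE_RULES
    .filter(
      (rule) =>
        rule.pattern.test(turn.message) &&
        turn.toolCalls.indexOf(rule.requiredTool) === -1
    )
    .map((rule) => rule.requiredTool);
}
```

A turn that says "I've refunded your order" with no `issue_refund` call would be flagged, while the same message backed by the call would pass.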

Revenue agent

Protect privileged GTM workflows

Score risky automations by data sensitivity, blast radius, exposure, controls, and production frequency.

Read the case
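For a feel of how those five factors might combine into one number, here is a minimal sketch. It assumes every factor is normalized to 0..1, averages the four risk drivers, and discounts the result by control strength. The formula and the `RiskFactors` and `riskScore` names are assumptions made for illustration; the page does not state how Vouch weighs these inputs.

```typescript
// All factors normalized to 0..1 (an assumption).
interface RiskFactors {
  dataSensitivity: number;
  blastRadius: number;
  exposure: number;
  productionFrequency: number;
  controls: number; // strength of mitigating controls; 1 = fully mitigated
}

// Residual risk on a 0-100 scale: inherent risk discounted by controls.
function riskScore(f: RiskFactors): number {
  const keys = Object.keys(f) as (keyof RiskFactors)[];
  for (const key of keys) {
    // Rejects out-of-range values and NaN.
    if (!(f[key] >= 0 && f[key] <= 1)) {
      throw new RangeError(`${key} must be between 0 and 1`);
    }
  }
  const inherent =
    (f.dataSensitivity + f.blastRadius + f.exposure + f.productionFrequency) / 4;
  return Math.round(inherent * (1 - f.controls) * 100);
}
```

A multiplicative discount means strong controls can pull even a sensitive, high-frequency workflow toward zero, which may be too generous; capping the discount is one alternative.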

Developer agent

Govern code execution and deployment

Red-team tool abuse, secret access, repo actions, CI/CD paths, and multi-turn social engineering.

Read the case


The valuable output is not another trace table. It is the cluster, the suspected cause, the reproduction test, and the suggested fix.

Agent RiskOps thesis · Vouch product direction

Any questions?

Stop guessing. Start proving your agents are safe.

Get started · Talk to us

Is Vouch just observability?

No. Observability is the evidence stream. Vouch turns traces into inventory, failures, tests, risk scores, policies, and living certificates.

Can Vouch run locally?

Yes. The current local stack runs the app, worker, Postgres, ClickHouse, Redis, MinIO, and firewall API on localhost for testing.

Does it block unsafe actions?

Vouch already has a firewall path. The next production layer is deterministic tool-call policy, approval gates, tenant checks, and audit trails.

What should we build next?

The next high-leverage feature is agent inventory and permission graph. It powers blast-radius scoring, smarter red-team tests, and runtime policy.

Can this become insurance-ready?

That is the direction. Start with risk evidence, cert history, incident history, runtime controls, and audit logs before underwriting integrations.
