YBacked by YCS26

Your agent's seal of
approval at AI speed

One Living Cert score from red-team pass-rate, firewall block-rate, and intent failure-rate. Signed, public, embeddable, and auto-revocable.

Analytics
A
Cert score
88.6+2.4
Open findings
12-3
Redteam pass-rate
78.3%+4.1%
Findings by source14212 critical · 41 high · 89 other
50382513012:0012:3013:0013:3014:0014:3015:0015:3016:00redteamintentfirewall
Cert score over time88.6+2.4 vs prior period
928578
Firewall block-rate92%last 24h
Tool calls / hour83.2kmedian 14ms
Prompt-injection blocks4.7k↑ 28% wow
built for agent stacks
OpenAIAnthropicLangChainLlamaIndexMCPOpenTelemetry

Agent autonomy gives your org leverage, but production risk comes with tradeoffs.

The invisible blast radius

Agents inherit tools, prompts, data sources, tenants, and approval paths. Most teams cannot see what an agent can actually touch until something breaks.

The dashboard trap

Trace tools show thousands of runs, but the team still has to find repeated failures, reproduce them, and prove the fix worked.

Assemble, test, and govern agents with a risk layer that keeps up.

Agent Registry · 18 production agents
AgentRiskStatus
refund-agent82watch
support-agent94good
dev-agent68risk

Production-grade inventory

Map agents, models, tools, prompts, data sources, approvals, environments, and tenant boundaries in one graph.

Cluster · refund-agent+17 sessions
tool_error_but_success_message17
promised_action_no_tool_call9
no_progress_loop5

AI-native failure mining

Cluster production failures, explain root cause, generate repro tests, and turn evidence into prioritized findings.

Living Cert · refund-agent
  • Firewall block-rate92
  • Redteam pass-rate78
  • Intent failure85

Control without drag

Run targeted red-team packs, enforce policy checks, and publish living certs your customers can verify.

01

Begin with an agent permission graph

Vouch discovers what every agent can read, write, delete, message, deploy, or move money through.

toolsdata sourcesapproval gates
refund-agent
issue_refund
customer_db
approval gate
send_email
billing_api
02

Observe real behavior in production

Trace prompts, retrieved context, model output, tool calls, policy decisions, feedback, escalation, latency, and cost.

OpenTelemetryredactionsessions
Live trace · refund-agent recording
ToolRisk classOutcomeLatency
issue_refundmoneyapproved180ms
send_emailtenantblocked227ms
lookup_userpiiapproved274ms
issue_refundmoneymissing gate321ms
share_docexfilapproved368ms
otel.spanpolicy decision
{
  "tool":   "issue_refund",
  "args":   { "amount": 84.00 },
  "policy": "approval-required",
  "decision": "block",
  "reason":  "identity confidence 0.61"
}
03

Turn logs into tests and fixes

Repeated failures become findings, repro tests, remediation plans, GitHub issues, and retest evidence.

repro testsissuesPR-ready
finding · F-1142critical

Refund issued before identity verification

Cluster of 17 sessions · suspected schema gap · auto-repro generated.

  1. Cluster identified · 17 sessions
  2. Repro test written · 8/10 stable
  3. PR-ready remediation drafted
vouch-bot wants to merge 1 commit intomain
+ if (!ctx.identity.verified) {
+   return policy.block("identity-required");
+ }
  await tools.issue_refund(input);
tests passcert score +2.4
03

Stay in control with our open-source risk layer

Don't get locked into someone else's black box. Vouch works with local setup, real traces, live testing, and policy-as-code your team can inspect.

SESSION RECORDING live
tracetrc_8ac9…
agentrefund-agent
toolissue_refundpolicy block
decisionidentity 0.61 < 0.85
agent-policy.ts
// policy.ts
import { ToolCall, PolicyGate, CertScore } from "vouch";

export default function RefundAgent() {
  return (
    <PolicyGate label="Verified identity">
      <ToolCall name="issue_refund" risk="money_movement" />
      <CertScore min={90} />
    </PolicyGate>
  );
}

Make your security team confident and your product team faster.

Vouch turns repeated production failures into clusters, repro tests, remediation PRs, and a fresh living-cert score — without your team triaging trace tables by hand.

Risk evidence your security team trusts, in a workflow your product team will actually run.

Finding pipeline

Repeated failures become repro tests, remediation PRs, and a fresh cert score.

Running
  1. Cluster identified17 refund-agent sessions promised action without approval gateF-1142 · CRITICAL
  2. Repro test written8 / 10 stable runs · committed to vouch-tests/refund-1142.spec.tsauto-generated
  3. PR-ready remediation draftedAdd identity gate before tools.issue_refund · vouch-bot wants to merge1 commit
  4. Cert score updatedLiving cert · 88.6 → 91.0 after policy gate ships+2.4

Teams use Vouch to turn agent risk into engineering work.

Support agent

Stop false promises before customers churn

Detect when an agent promises cancellation, refunds, or account changes without the required tool call or approval.

Read the case
Revenue agent

Protect privileged GTM workflows

Score risky automations by data sensitivity, blast radius, exposure, controls, and production frequency.

Read the case
Developer agent

Govern code execution and deployment

Red-team tool abuse, secret access, repo actions, CI/CD paths, and multi-turn social engineering.

Read the case
1 / 3
The valuable output is not another trace table. It is the cluster, the suspected cause, the reproduction test, and the suggested fix.
Agent RiskOps thesisVouch product direction

Stop guessing. Start proving your agents are safe.

Is Vouch just observability?

No. Observability is the evidence stream. Vouch turns traces into inventory, failures, tests, risk scores, policies, and living certificates.

Can Vouch run locally?

Yes. The current local stack runs the app, worker, Postgres, ClickHouse, Redis, MinIO, and firewall API from localhost for testing.

Does it block unsafe actions?

Vouch already has a firewall path. The next production layer is deterministic tool-call policy, approval gates, tenant checks, and audit trails.

What should we build next?

The next high-leverage feature is agent inventory and permission graph. It powers blast-radius scoring, smarter red-team tests, and runtime policy.

Can this become insurance-ready?

That is the direction. Start with risk evidence, cert history, incident history, runtime controls, and audit logs before underwriting integrations.