Changelog

What we shipped, when we shipped it.

Public, dated entries. Every shipped feature lands here with the commit and the rationale.

  1. feature

    Vouch-AI offensive agent v0

    TypeScript thin-agent that runs goal-driven attacks against any OpenAI-compatible target. Multi-judge ensemble (deterministic + tool-oracle + LLM judge) with three-budget cost control (turn count, USD cap, token-history budget with Haiku-cheap summarization). 8-skill agent-attack library: indirect prompt injection, RAG poisoning, memory poisoning, MCP exploitation, tool-call hijack, cross-tenant escape, approval bypass, confused deputy. Dashboard launcher + new Pentest tab.

  2. feature

    Scheduled detector worker + dashboard wiring

    A BullMQ cron job (schedule: 20 */6 * * *) runs silent-failure detectors and behavioral drift checks against the last 24h of ClickHouse traces, persisting findings with audit-log coverage. Detector libraries moved to @langfuse/shared so worker + web consume one source. Run-detectors button + frustration-trajectory sparkline + 3-level intent hierarchy chips wired into the cert panel.

  3. infra

    Production deploy manifests for tryvouch.ai

    deploy/render.yaml ships web + worker + Postgres + Redis with all 7 vouch queue flags wired. deploy/fly.firewall.toml ships firewall.tryvouch.ai (performance-2x, 4 GB RAM for HF models, /readyz health). docs/launch-checklist.md covers DNS, TLS, cert-key generation, JWKS hosting, DB migrations, pre-launch smoke, SEO, security checklist.

  4. feature

    Public scanner + 1-line SDK install

    Free /scan endpoint (10 req/min/IP, 8 KB ceiling) at the marketing root. @vouch/sdk ships with instrumentOpenAI() one-line wrapper, scan(prompt), and guard(prompt, fn). Closes the Benchspan top-of-funnel gap.

  5. feature

    Per-cluster regression + vouchctl CLI

    topicTrafficDeltas ClickHouse query + dashboard ↑/↓ pills track per-intent regression. New @vouch/cli ships vouchctl verify, jwks, cert (public-only, no auth, useful for procurement).

  6. feature

    Silent-failure detectors v0 + Garak probes

    Four agent-failure patterns shipped: promised_action_no_tool_call, tool_error_but_success_message, no_progress_loop, claimed_completion_without_tool. Plus three Garak probe ports.

  7. feature

    Frustration score + intent topics

    Per-trace frustration score from sentiment + escalation signals. Intent topic clustering surfaces in the dashboard.

  8. feature

    Strix-vouch fork plan + Garak overlap doc

    Fork plan for Strix-for-agents (Apache-2.0, 18.5k LOC). 50% build-cost reduction vs from-scratch. Skill library scoped to AI-agent surfaces (RAG, MCP, memory, tool-call hijack, cross-tenant).