Public evals
VerifiedX integrates into systems that already exist, so the public proof we lead with is evals against real workflow classes.
Current featured release: the Legal Action Boundary Eval. It measures whether legal AI systems execute unjustified high-impact actions and whether the workflow still completes the real job.
Open-source evidence
Featured now
Public proxy eval based on legal workflow classes Luminance publicly markets: negotiation, compliance, and orchestrated legal review flows. Same harness, same prompts, same playbooks. Baseline versus VerifiedX.
Baseline unjustified high-impact action points executed: 18
VerifiedX unjustified high-impact action points executed: 0
False blocks in the current suite: 0
Surviving-goal completion: 41.7% -> 100%
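The scorecard numbers above come down to simple tallies over raw run artifacts. A minimal sketch of that scoring, assuming hypothetical record fields (`high_impact`, `justified`, `blocked`, `goal_completed`) rather than the actual VerifiedX artifact schema:

```python
# Sketch of eval scoring. Field names are illustrative assumptions,
# not the real harness schema.

def score(runs):
    """Tally unjustified executed actions, false blocks, and goal completion."""
    executed_unjustified = sum(
        1 for r in runs
        if r["high_impact"] and not r["justified"] and not r["blocked"]
    )
    # A "false block" is a justified action that got blocked anyway.
    false_blocks = sum(1 for r in runs if r["blocked"] and r["justified"])
    completion = sum(1 for r in runs if r["goal_completed"]) / len(runs)
    return executed_unjustified, false_blocks, completion

baseline = [
    {"high_impact": True, "justified": False, "blocked": False, "goal_completed": False},
    {"high_impact": True, "justified": True,  "blocked": False, "goal_completed": True},
]
protected = [
    {"high_impact": True, "justified": False, "blocked": True,  "goal_completed": True},
    {"high_impact": True, "justified": True,  "blocked": False, "goal_completed": True},
]

print(score(baseline))   # (1, 0, 0.5)
print(score(protected))  # (0, 0, 1.0)
```

The same three counters, run over the published artifacts, reproduce the scorecard: unjustified actions drop to zero without introducing false blocks, while completion rises.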
Negotiation
Accepting counterparty positions, applying redrafts, marking issues resolved, and routing to signature only when the workflow is actually ready.
Compliance
Marking agreements compliant, applying remediation markup, escalating failed checks, and blocking false clearance.
Composed systems
Intake agent to execution agent to upstream legal or compliance review, with the wrong action blocked and the workflow kept alive through the correct lane.
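The composed lane above can be sketched as a gate between agents: a high-impact action without justification is escalated to upstream review instead of executed, and the run keeps going. All names here (`intake_agent`, `review_lane`, `preapproved`) are hypothetical stand-ins, not the real pipeline:

```python
# Hypothetical composed pipeline: intake -> execution, with high-impact
# actions routed through an upstream review lane. Names are illustrative.

def intake_agent(request):
    return {"action": request["action"], "high_impact": request["high_impact"]}

def review_lane(request):
    # Upstream legal/compliance review approves or rejects
    # instead of killing the run.
    return request.get("preapproved", False)

def execution_agent(task):
    return f"executed:{task['action']}"

def run_pipeline(request):
    task = intake_agent(request)
    if task["high_impact"] and not review_lane(request):
        # Wrong action blocked; workflow stays alive in the review lane.
        return "escalated-for-review"
    return execution_agent(task)

print(run_pipeline({"action": "route_to_signature", "high_impact": True}))
print(run_pipeline({"action": "apply_redraft", "high_impact": False}))
```

The design point is that the blocked path returns a live workflow state, not a dead end.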
Legal and governance
Clear methodology, concrete scenarios, and raw artifacts instead of hand-wavy claims about safer AI.
Founders and product
The point is not only to stop bad actions. It is to stop them while keeping the real workflow moving.
Builders
Same harness, same prompts, same playbooks, baseline versus protected. Then inspect the exact GitHub evidence and raw outputs.
The full eval lives on GitHub with the scorecard, scenario catalog, methodology, raw artifacts, and repro steps. The website stays intentionally short so you can get the signal fast and then inspect the proof in the place builders already live.
npx skills add bigkan8/verifiedx-agent-skills
"Install VerifiedX into this repo."