You ship faster with Claude Code, Cursor, and Copilot — but you're trusting one model. Before you merge, Nomos reviews the change with a different one and hands you a receipt. One command. One key. The AI that wrote it doesn't get to grade its own homework.
Point Nomos at the change you just made. A second, independent model reviews the diff — told to find what's wrong, not to agree — and gives you a verdict. Here it caught an auth bypass a "looks fine" review would wave through:
    $ nomos verify -m anthropic/claude-opus-4-8
    ▸ reviewing 11 diff lines…
    ✗ FAIL — if (user.role = "admin") assigns instead of compares. The function now returns true for any user, mutates the input, and bypasses every authorization check. Revert it.
    ── receipt 14c8680b · independently checked by anthropic/claude-opus-4-8
A FAIL exits non-zero, so running nomos verify in a pre-commit hook or CI blocks a bad change from landing. nomos verify --staged checks what you're about to commit; nomos verify --against main reviews a whole branch.
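As a rough sketch of the hook idea, the script below wraps nomos verify --staged in a small gate function and relies only on the exit code described above. The function name nomos_gate and the NOMOS_BIN override are hypothetical, added here so the gate can be tried with a stand-in command; they are not part of Nomos.

```shell
#!/bin/sh
# Sketch of a pre-commit gate. Save as .git/hooks/pre-commit and make it
# executable. NOMOS_BIN is a hypothetical override (not a Nomos feature);
# it defaults to the real `nomos` binary.

nomos_gate() {
  # `nomos verify --staged` reviews what is about to be committed.
  if "${NOMOS_BIN:-nomos}" verify --staged; then
    return 0
  fi
  # Any non-zero exit (a FAIL verdict included) blocks the commit.
  echo "nomos: review did not pass (non-zero exit), commit blocked" >&2
  return 1
}

# Run the gate only when this file is invoked as the pre-commit hook,
# so the function can also be sourced and tried out on its own.
case "$0" in
  *pre-commit) nomos_gate || exit 1 ;;
esac
```

Git aborts the commit whenever the pre-commit hook exits non-zero, which is why the gate only needs to pass the exit status through.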
The model that wrote the code is the worst judge of it — it's confident in exactly the places it's wrong. A different model, told to refute rather than agree, catches the bugs, edge cases, and security holes the first one was blind to. Nomos emits a content-hashed receipt of that review: tamper with the diff, the answer, or the verdict and the id changes.
    {
      "proposer": { "model": "Cursor", "provider": "external" },
      "verifier": { "model": "anthropic/claude-opus-4-8", "verdict": "FAIL" },
      "cross_provider": true,
      "hash": "sha256…"
    }
It carries no secrets — commit it, attach it to a PR, hand it to a reviewer who wasn't there. Stop merging AI code on faith.
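To illustrate why a content-hashed id is tamper-evident, here is a minimal sketch that derives a short id from the diff, the answer, and the verdict. The field order, the newline separator, the 8-character truncation, and the receipt_id helper are all assumptions made for illustration; Nomos's actual hashing scheme may differ. It assumes GNU sha256sum is available (on macOS, shasum -a 256 is the usual substitute).

```shell
#!/bin/sh
# Sketch of a content-hashed receipt id. Field order, separator, and the
# 8-character truncation are illustrative assumptions, not Nomos's scheme.

receipt_id() {
  # $1 = diff text, $2 = reviewer's answer, $3 = verdict
  printf '%s\n%s\n%s\n' "$1" "$2" "$3" | sha256sum | cut -c1-8
}

diff_text='if (user.role = "admin")'
answer='assigns instead of compares'

# Same inputs always give the same id.
receipt_id "$diff_text" "$answer" "FAIL"

# "PASS" is a hypothetical tampered verdict; changing any field changes the id.
receipt_id "$diff_text" "$answer" "PASS"
```

Because the id is computed from the content itself, anyone holding the same three fields can recompute it and compare, without needing any key.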
Verify with whatever you already pay for — Claude, GPT, Kimi, GLM, Qwen, Grok, DeepSeek, or a local Ollama. 13 providers. The key stays on your machine, mode 0600.
Run nomos verify --staged in a git hook and nomos verify --against main on a PR. A FAIL exits non-zero, so a bad change can't merge. Add --json for pipelines.
Need it to make the change as well? nomos run is a full coding agent that explores, makes surgical edits, and runs your tests, streaming as it goes, on the same one key.
One small, readable, sandboxed codebase. npm test runs 25 tests. MIT — fork it.
Node 18+. One key — whatever model you want as the reviewer.