🔥 Comin' in Hot! Shipping multiple unstable releases per day at the moment. If you want prove_it to actually work, email Justin for updates 🛬🔥
If you experience errors after an upgrade, reset your setup with prove_it reinstall && prove_it reinit.
By far the most frustrating thing about Claude Code is its penchant for prematurely declaring success. Out-of-the-box, Claude will happily announce a task is complete. But has it run the tests? No. Did it add any tests? No. Did it run the code? Also no.
prove_it hooks into Claude Code's lifecycle events and runs whatever tasks you configure—test suites, lint scripts, AI code reviewers—blocking Claude until they pass.
(And in case it's not obvious, prove_it currently only works with Claude Code.)
brew install searlsco/tap/prove_it
prove_it install
cd your-project && prove_it init
Restart Claude Code and you're live.
prove_it is a config-driven framework for enforcing quality in Claude Code sessions. You can easily configure script and subagent tasks in a few lines of JSON that:
- Block Claude from stopping until your tests pass
- Block git commits until a full test suite is green
- Run AI reviewers: independent subagents that audit Claude's work for coverage gaps, logic errors, or security issues
- Fire reviews asynchronously: expensive reviewers run in the background while Claude keeps working, then enforce their verdict on the next stop
- Gate tasks on signals: heavyweight checks fire only when Claude declares a unit of work complete (prove_it signal done), or when Claude gets caught in a doom loop (prove_it signal stuck --message "can't figure out Liquid Glass")
- Gate tasks on churn: reviews trigger after N lines changed (net git diff) or N lines written (gross, catches thrashing)
- Inject context on session start: briefs your agent on what prove_it will inspect for and when, along with instructions on how to use it
- Guard tool usage: block specific tool calls (config file edits, dangerous commands) before they execute
- Track runs: skip re-running tasks when code hasn't changed since the last pass (via when: { sourcesModifiedSinceLastRun: true })
Out of the box, prove_it init generates the Searls-stack of configured tasks:
- Session briefing on startup: Claude gets an orientation showing active tasks, signal instructions, and how the review process works
- Config lock on every edit: silently blocks Claude from modifying your prove_it config
- TDD enforcement on every edit: tracks the red-green cycle and nudges Claude to write a failing test before writing source code. Adapts behavior based on the current session phase (see Session phases).
- TDD guidance in plans: injects a red-green TDD development approach section into every plan Claude creates
- Fast tests on every stop: runs ./script/test_fast and blocks until it passes
- Full tests on signal: runs ./script/test when Claude signals done (and source files were edited)
- Async coverage review: a Haiku-powered prove-coverage subagent fires in the background after 541+ net lines of churn, enforced on the next stop
- Done review on signal: an Opus-powered prove-done subagent runs a thorough pre-ship review when Claude signals done
- Approach review on signal: a Sonnet-powered prove-approach subagent runs when Claude signals stuck, surfacing alternative approaches
- Full tests on git commit: pre-commit hook runs ./script/test (Claude commits only; human commits pass through)
Every one of these is a config entry you can change, disable, or replace. The framework supports any combination of lifecycle events, conditions, and task types — the default config is just a starting point.
# Install the CLI
brew install searlsco/tap/prove_it
# Register prove_it hooks in ~/.claude/settings.json
prove_it install
cd your-project
prove_it init
This interactively sets up .claude/prove_it/config.json, creates script/test and script/test_fast stubs if you don't have them, installs git hooks, and generates a starter .claude/rules/testing.md. Restart Claude Code and you're live.
Pass flags to skip prompts (useful for CI or scripting):
prove_it init --git-hooks --default-checks
| Flag | Default | Effect |
|---|---|---|
| --[no-]git-hooks | on | Install git pre-commit/pre-push hooks |
| --[no-]default-checks | on | Include AI coverage review, pre-ship review |
| --[no-]automatic-git-hook-merge | off | Merge with existing git hooks (fails if hooks exist) |
| --[no-]overwrite | n/a | Overwrite customized config with current defaults |
By default, prove_it looks for two test scripts by convention:
| Script | Purpose | When it runs |
|---|---|---|
| script/test | Full test suite (units, integration, linters, etc.) | Before every git commit |
| script/test_fast | Fast unit tests only | Every time Claude stops work |
For example, your script/test_fast script might run:
#!/usr/bin/env bash
set -e
trap 'rc=$?; command -v prove_it >/dev/null 2>&1 && prove_it record --name fast-tests --result $rc' EXIT
rake test
And your full script/test command will probably run that and more:
#!/usr/bin/env bash
set -e
trap 'rc=$?; command -v prove_it >/dev/null 2>&1 && prove_it record --name full-tests --result $rc' EXIT
rake test standard:fix test:system
The trap ... EXIT pattern ensures results are always recorded, even when set -e causes early exit. prove_it uses this to skip re-running tests when code hasn't changed.
prove_it record options:
- --result <N>: record pass (N=0) or fail (N!=0), exit with code N (best for traps)
- --pass / --fail: record explicitly (exit 0 / exit 1)
- --name <task>: must match the task name in your config
prove_it is configured with a hooks object in .claude/prove_it/config.json. Hooks are keyed by type (claude, git) then by event (Stop, PreToolUse, SessionStart, pre-commit, pre-push, etc.), with each event mapping to an ordered list of tasks:
{
"enabled": true,
"sources": ["src/**/*.js", "lib/**/*.js", "test/**/*.js"],
"tests": ["test/**/*.test.js"],
"hooks": {
"claude": {
"Stop": [
{ "name": "fast-tests", "type": "script", "command": "./script/test_fast" },
{ "name": "coverage-review", "type": "agent", "prompt": "Check coverage...\n\n{{session_diff}}" }
]
}
}
}
Config files merge (later overrides earlier):
- ~/.claude/prove_it/config.json: global defaults
- .claude/prove_it/config.json: project config (commit this)
- .claude/prove_it/config.local.json: local overrides (gitignored, per-developer)
Hooks merge by task name: a task in a descendant config with the same name as one in an ancestor fully replaces the ancestor's task; a task with a new name is appended. Global tasks run first (most general to most specific). Other array fields (like sources) replace rather than merge.
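The merge-by-name rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not prove_it's actual code: the `merge_tasks` helper, the `lint` task, and its `./script/lint` command are hypothetical, and keeping a replaced task at the ancestor's position is an assumption the text does not confirm.

```python
def merge_tasks(ancestor, descendant):
    """Merge two ordered task lists by task name (illustrative sketch)."""
    merged = list(ancestor)
    index_by_name = {task["name"]: i for i, task in enumerate(merged)}
    for task in descendant:
        if task["name"] in index_by_name:
            # Same name: the descendant's task fully replaces the ancestor's.
            # Assumption: it keeps the ancestor's position in the list.
            merged[index_by_name[task["name"]]] = task
        else:
            # New name: appended after the ancestor's tasks.
            index_by_name[task["name"]] = len(merged)
            merged.append(task)
    return merged


# Hypothetical global and project Stop lists.
global_stop = [
    {"name": "fast-tests", "type": "script", "command": "./script/test_fast"},
    {"name": "lint", "type": "script", "command": "./script/lint"},
]
project_stop = [
    {"name": "fast-tests", "type": "script", "command": "npm test"},
    {"name": "coverage-review", "type": "agent", "prompt": "prove-coverage"},
]

merged = merge_tasks(global_stop, project_stop)
print([task["name"] for task in merged])
```

Here the project's fast-tests entry replaces the global one wholesale (no field-level merge), and coverage-review is appended after the global tasks.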
sources defines which files prove_it considers "your code" — these globs drive conditions like sourcesModifiedSinceLastRun, sourceFilesEdited, and linesChanged. Test files should be included in sources so that edits to tests are tracked as source changes.
tests identifies which source files are test files. This drives the test-first check, which enforces red-green TDD by tracking whether Claude writes and runs failing tests before implementing source code. See Session phases for how enforcement varies by activity. tests is typically a subset of sources — it doesn't need to be disjoint.
Both sources and tests are preserved across prove_it init / reinit when customized, so you won't lose your globs on upgrade.
Claude events:
| Event | Purpose | Behavior |
|---|---|---|
| SessionStart | Environment setup, injecting context | Non-blocking. All tasks run. Output is injected into Claude's context. Use this to inject prompts, announce project state, set environment variables, or run setup scripts. |
| PreToolUse | Guarding tool usage | Blocking, fail-fast. Tasks run in order; the first failure denies the tool and stops. Use this for config protection, enforcing workflows, or vetting commands. |
| Stop | Verifying completed work | Blocking, fail-fast. Tasks run in order; the first failure sends Claude back to fix it. Put cheap tasks first (test suite), expensive ones last (AI reviewer). Async results are harvested before sync tasks run. |
| PostToolUse | Observing tool results | Non-blocking. Fires after a tool succeeds. Used by TDD enforcement to detect test passes. Matcher filters by tool name. |
| PostToolUseFailure | Observing tool failures | Non-blocking. Fires after a tool fails. Used by TDD enforcement to detect test failures. Matcher filters by tool name. |
Git events:
| Event | Purpose | Behavior |
|---|---|---|
| pre-commit | Validating before commit | Blocking, fail-fast. Runs only under Claude Code (CLAUDECODE env var); human commits pass through instantly. |
| pre-push | Validating before push | Blocking, fail-fast. Same as pre-commit but triggers on push. |
Task types:
- script: runs a shell command, fails on non-zero exit
- agent: sends a prompt to an AI reviewer, expects PASS/FAIL response (see Agent tasks)
- env: runs a command that outputs environment variables, injected into Claude's session (SessionStart only, see Env tasks)
Script tasks accept a params object that is passed to the script as input.params in the stdin JSON payload:
{
"name": "lock-config",
"type": "script",
"command": "$(prove_it prefix)/libexec/guard-config",
"quiet": true,
"params": {
"paths": [".claude/prove_it/config.json", ".claude/prove_it/config.local.json"]
}
}
Scripts read params from the parsed stdin JSON alongside tool_name, tool_input, etc. This is a generic mechanism: any script can use params to accept structured configuration without inventing CLI arg parsing.
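As a rough sketch of how a custom script might consume params, the Python below parses a stdin-style JSON payload and returns a non-zero exit code when the edited path matches a guarded pattern. It is not the built-in guard-config script. The `exit_code_for` helper is hypothetical, and the sketch assumes the payload carries the edited path at `tool_input.file_path` as a project-relative path.

```python
import fnmatch
import json


def exit_code_for(stdin_text):
    """Return 1 (block) if the edited path matches a guarded pattern, else 0."""
    payload = json.loads(stdin_text)
    # params comes straight from the task's "params" object in config.
    patterns = payload.get("params", {}).get("paths", [])
    # Assumption: the tool's target path is at tool_input.file_path.
    file_path = payload.get("tool_input", {}).get("file_path", "")
    for pattern in patterns:
        if fnmatch.fnmatch(file_path, pattern):
            return 1  # a non-zero exit fails the script task
    return 0


sample = {
    "tool_name": "Edit",
    "tool_input": {"file_path": "credentials/prod/key.pem"},
    "params": {"paths": [".env", "credentials/**"]},
}
print(exit_code_for(json.dumps(sample)))
```

A real script would finish with something like `sys.exit(exit_code_for(sys.stdin.read()))` so the exit status reaches prove_it.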
guard-config with custom paths: The built-in guard-config script uses params.paths to decide which file paths to block. Add your own paths to guard additional files:
{
"name": "lock-config",
"type": "script",
"command": "$(prove_it prefix)/libexec/guard-config",
"quiet": true,
"params": {
"paths": [".claude/prove_it/config.json", ".claude/prove_it/config.local.json", ".env", "credentials/**"]
}
}
When params.paths is omitted, guard-config falls back to blocking prove_it config files by default (backward compatible).
Tasks can include a briefing field — a string that's injected into every SessionStart orientation. This lets infrastructure tasks (like TDD enforcement) provide persistent guidance without being SessionStart tasks themselves.
{
"name": "my-task",
"type": "script",
"command": "./my-script",
"briefing": "Remember to frobnicate the widgets before shipping."
}
Set enabled: false on a task to skip it without removing it from config:
{ "name": "slow-review", "type": "agent", "prompt": "prove-coverage",
"promptType": "skill", "enabled": false }
Disabled tasks are logged as SKIP with reason "Disabled".
Set quiet: true on a task to suppress all log output except failures:
{ "name": "lock-config", "type": "script", "command": "$(prove_it prefix)/libexec/guard-config", "quiet": true }
Quiet tasks don't emit SKIP or PASS entries to the session log. FAIL and BOOM entries are always logged. This is useful for high-frequency guards (like lock-config on every PreToolUse) that would otherwise flood the monitor.
Set timeout (in milliseconds) to limit how long a task can run:
{ "name": "slow-tests", "type": "script", "command": "./script/test", "timeout": 300000 }
Tasks have no timeout by default: they run until completion. Set an explicit timeout if you need to guard against runaway processes.
PreToolUse tasks can filter by tool name and command patterns using matcher and triggers on individual tasks:
{
"hooks": {
"claude": {
"PreToolUse": [
{
"name": "guard-commits",
"type": "script",
"command": "./script/check",
"matcher": "Bash",
"triggers": ["(^|\\s)git\\s+commit\\b"]
}
]
}
}
}
matcher filters by Claude's tool name (Edit, Write, Bash, etc.). triggers are regex patterns matched against the tool's command argument. Both are optional; omit them to run on every PreToolUse.
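To see which commands the guard-commits pattern above would catch, here is a small Python sketch of the filtering. The `task_applies` helper is hypothetical, and two details are assumptions rather than documented behavior: matcher is compared to the tool name by exact equality, and each trigger is tested with search semantics (a match anywhere in the command).

```python
import re


def task_applies(task, tool_name, command=""):
    """Decide whether a PreToolUse task should run for this tool call."""
    matcher = task.get("matcher")
    # Assumption: matcher is an exact tool-name comparison.
    if matcher is not None and matcher != tool_name:
        return False
    triggers = task.get("triggers")
    if triggers is None:
        return True  # no triggers: run for every call that passed the matcher
    # Assumption: a trigger fires if it matches anywhere in the command.
    return any(re.search(pattern, command) for pattern in triggers)


guard_commits = {
    "name": "guard-commits",
    "matcher": "Bash",
    "triggers": [r"(^|\s)git\s+commit\b"],
}

print(task_applies(guard_commits, "Bash", "git commit -m 'wip'"))
print(task_applies(guard_commits, "Bash", "cd app && git  commit"))
print(task_applies(guard_commits, "Bash", "git log --grep=commit"))
print(task_applies(guard_commits, "Edit"))
```

The first two calls match the pattern; the git log command and the Edit call do not, so the task would be skipped for them.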
Tasks can declare conditions that must be met before they run. This is how you gate expensive reviews on churn thresholds, signal states, or environmental requirements.
{ "name": "my-check", "type": "script", "command": "./script/check",
"when": { "fileExists": ".config" } }
Object form: AND. When when is an object, every condition must pass:
{ "when": { "envSet": "CLAUDECODE", "linesChanged": 500 } }
Both envSet AND linesChanged must be true. If either fails, the task is skipped.
Array form — OR of ANDs. When when is an array, each element is AND'd internally and any element passing fires the task:
{
"name": "coverage-review",
"type": "agent",
"prompt": "prove-coverage",
"promptType": "skill",
"when": [
{ "envSet": "CLAUDECODE", "linesChanged": 500 },
{ "envSet": "CLAUDECODE", "linesWritten": 1000 }
]
}
The env var must be set in both clauses, but either churn threshold firing is enough to run the review. This is the MongoDB/CSS-selector pattern.
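The object and array forms reduce to "any clause where all conditions pass". A minimal Python sketch of that evaluation follows. The `when_passes` and `evaluate` helpers and the `state` values are hypothetical stand-ins, not prove_it internals.

```python
def when_passes(when, evaluate):
    """Evaluate a when value: an object is AND, an array is OR of ANDs."""
    if when is None:
        return True  # no conditions: the task always runs
    clauses = when if isinstance(when, list) else [when]
    return any(
        all(evaluate(name, value) for name, value in clause.items())
        for clause in clauses
    )


# Hypothetical session state used to answer each condition.
state = {"env": {"CLAUDECODE": "1"}, "linesChanged": 120, "linesWritten": 1400}


def evaluate(name, value):
    if name == "envSet":
        return value in state["env"]
    if name in ("linesChanged", "linesWritten"):
        return state[name] >= value
    raise ValueError(f"unknown condition: {name}")


object_form = {"envSet": "CLAUDECODE", "linesChanged": 500}
array_form = [
    {"envSet": "CLAUDECODE", "linesChanged": 500},
    {"envSet": "CLAUDECODE", "linesWritten": 1000},
]

print(when_passes(object_form, evaluate))
print(when_passes(array_form, evaluate))
```

With 120 net lines changed and 1400 gross lines written, the object form is skipped while the array form fires through its second clause.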
| Condition | Type | Description |
|---|---|---|
| fileExists | string | Passes when file exists relative to project root |
| envSet | string | Passes when environment variable is set |
| envNotSet | string | Passes when environment variable is not set |
| variablesPresent | string[] | Passes when all listed template variables resolve to non-empty values |
| signal | string | Passes when the named signal (done, stuck, idle) is active for the current session |
| linesChanged | number | Passes when at least N source lines have changed (additions + deletions) since the task last ran. Git-based; works in both Claude hooks and git hooks. |
| linesWritten | number | Passes when at least N gross lines have been written by the agent since the task last ran. Catches thrashing. Claude Code sessions only. |
| sourcesModifiedSinceLastRun | boolean | Passes when source file mtimes are newer than the last successful run. Works for any task type (script, agent, env). The dispatcher records run data on pass; failures are never cached so the task re-fires until it passes. Tasks without this condition always run (no implicit caching). |
| sourceFilesEdited | boolean | Passes when source files were edited this turn (session-scoped, tool-agnostic). Works on PreToolUse, PostToolUse, and Stop. |
| testFilesEdited | boolean | Passes when test files were edited this turn (session-scoped, tool-agnostic). Works on PreToolUse, PostToolUse, and Stop. |
| toolsUsed | string[] | Passes when any of the listed tools were used this turn |
Each task using linesChanged stores a git ref at refs/worktree/prove_it/<task-name>. When the condition is evaluated, prove_it diffs the ref against the working tree (not just HEAD), filtered to your configured sources globs, summing additions and deletions. This means committed, staged, unstaged, and newly-created file changes all count—so Write/Edit tool calls trigger churn immediately without needing a commit. On first run the ref is created at HEAD (bootstrap—returns 0 if the working tree is clean). This is session-independent and worktree-safe. Refs are cleaned up by prove_it deinit.
When a task passes or resets, the ref advances to a snapshot of the current working tree state (including untracked source files). This ensures all pending changes are captured—advancing to HEAD alone would be a no-op when churn comes from uncommitted Write/Edit operations.
resetOnFail behavior: When a task fails, the ref advancement depends on the hook event:
- PreToolUse (default resetOnFail: true): The ref advances on failure. Without this, the task deadlocks: it blocks every Write/Edit, including writes to test files that would fix the issue.
- Stop / git hooks (default resetOnFail: false): The ref does NOT advance. The agent gets sent back to fix the issue, and the same accumulated churn keeps triggering the review.
- You can override the default with an explicit resetOnFail: true or resetOnFail: false on the task.
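The ref-advancement rules above fit in one small decision function. This Python sketch uses a hypothetical `ref_advances` helper; treating every event other than PreToolUse as defaulting to resetOnFail: false is an assumption, since the text only names Stop and git hooks.

```python
def ref_advances(event, passed, reset_on_fail=None):
    """Decide whether a task's churn ref moves forward after a run."""
    if passed:
        return True  # a pass always advances the ref
    if reset_on_fail is not None:
        return reset_on_fail  # explicit setting on the task wins
    # Defaults on failure: PreToolUse resets, Stop and git hooks do not.
    return event == "PreToolUse"


print(ref_advances("PreToolUse", passed=False))
print(ref_advances("Stop", passed=False))
print(ref_advances("Stop", passed=False, reset_on_fail=True))
print(ref_advances("pre-commit", passed=True))
```

Only the failed Stop with no override leaves the ref in place, which is what keeps the same accumulated churn triggering the review.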
While linesChanged measures net drift (git diff: what changed on disk), linesWritten measures gross activity (total lines the agent has written). This catches a different failure mode: thrashing. An agent that writes 500 lines, deletes them, rewrites them differently, and deletes again has written 2000 gross lines but may show 0 net churn. The gross counter catches this.
Gross churn accumulates on every successful PreToolUse for Write/Edit/NotebookEdit to source files. Lines are counted from the tool input (no file I/O needed). The counter is stored as a git blob under refs/worktree/prove_it/__gross_lines, with per-task snapshots under <task>.__gross_lines. Increment uses compare-and-swap for multi-agent safety—concurrent agents can't lose each other's counts.
resetOnFail follows the same rules as linesChanged.
sourceFilesEdited, testFilesEdited, and toolsUsed are session-scoped: they track which tools and files each Claude Code session uses, per-turn. After a successful Stop, the tracking resets so the next Stop only fires if new edits occur.
These conditions solve cross-session bleed—unlike sourcesModifiedSinceLastRun (which uses global file timestamps), session-scoped conditions ensure Session A's edits don't trigger Session B's reviewers.
sourceFilesEdited: true—gates a task on source file edits in the current turn:
{
"name": "my-review",
"type": "agent",
"prompt": "Review the changes...",
"when": { "sourceFilesEdited": true }
}
testFilesEdited: true gates a task on test file edits (matched against tests globs):
{
"name": "test-integrity",
"type": "agent",
"prompt": "Review test changes...",
"when": { "testFilesEdited": true }
}
toolsUsed: ["XcodeEdit", "Edit"] gates a task on specific tools being used:
{
"name": "xcode-review",
"type": "agent",
"prompt": "Review Xcode changes...",
"when": { "toolsUsed": ["XcodeEdit"] }
}
Signals let the agent declare where it is in a work cycle. The agent runs prove_it signal done (or stuck, idle) and tasks gated with when: { signal: "done" } fire on the next Stop. This is useful for heavyweight checks you only want at the end of a coherent unit of work rather than every Stop.
PreToolUse intercepts the prove_it signal command automatically—no extra config needed.
Clear-on-pass / preserve-on-fail: After a successful Stop (all tasks pass), the active signal is cleared automatically. After a failed Stop, the signal is preserved so the gated tasks re-fire until they pass. This means you signal once, and the heavy checks keep running until everything is clean.
{
"name": "full-tests",
"type": "script",
"command": "./script/test",
"when": { "signal": "done" }
}
Signal commands:
prove_it signal done Declare coherent work complete
prove_it signal stuck Declare stuck / cycling
prove_it signal idle Declare idle / between tasks
prove_it signal done -m "Ready for review" Include a message
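The clear-on-pass / preserve-on-fail behavior can be traced with a short simulation. This Python sketch is illustrative only: `run_stop` is a hypothetical helper, it handles only the object form of when with a single signal condition, and the pass/fail outcomes are made up.

```python
def run_stop(active_signal, tasks, run):
    """Simulate one Stop. Returns (all_passed, signal_after_stop)."""
    for task in tasks:
        gate = task.get("when", {}).get("signal")
        if gate is not None and gate != active_signal:
            continue  # gated on a signal that is not active: skipped
        if not run(task):
            return False, active_signal  # fail-fast, signal preserved
    return True, None  # every task passed, signal cleared


tasks = [
    {"name": "fast-tests"},
    {"name": "full-tests", "when": {"signal": "done"}},
]
outcomes = {"fast-tests": True, "full-tests": False}  # made-up results
ran = []


def run(task):
    ran.append(task["name"])
    return outcomes[task["name"]]


first = run_stop("done", tasks, run)      # full-tests fails, signal kept
outcomes["full-tests"] = True             # the agent fixes the problem
second = run_stop(first[1], tasks, run)   # gated task re-fires and passes
third = run_stop(second[1], tasks, run)   # signal cleared, gate is skipped
print(first, second, third)
print(ran)
```

The agent signals once; full-tests runs on the first two Stops and is skipped on the third because the signal was cleared after the clean pass.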
Agent tasks spawn a separate AI process to review Claude's work with an independent PASS/FAIL verdict. This is useful because the reviewing agent has no stake in the code it's judging.
By default, agent tasks use claude -p (Claude Code in pipe mode). The reviewer receives a wrapped prompt and must respond with PASS, FAIL, or SKIP.
{
"name": "my-review",
"type": "agent",
"prompt": "Review recent changes for:\n1. Test coverage gaps\n2. Logic errors or edge cases\n3. Dead code\n\n{{files_changed_since_last_run}}\n\n{{recent_commits}}\n\n{{git_status}}"
}
These expand in agent prompts:
| Variable | Contents |
|---|---|
| {{staged_diff}} | git diff --cached (staged changes) |
| {{staged_files}} | git diff --cached --name-only |
| {{working_diff}} | git diff (unstaged changes) |
| {{changed_files}} | git diff --name-only HEAD |
| {{session_diff}} | All changes since session baseline (uses Claude Code file-history, falls back to git diff scoped to tracked files) |
| {{test_output}} | Output from the most recent script check |
| {{tool_command}} | The command Claude is trying to run |
| {{file_path}} | The file Claude is trying to edit |
| {{project_dir}} | Project directory |
| {{root_dir}} | Git root directory (may differ from project_dir in monorepos) |
| {{session_id}} | Current Claude Code session ID |
| {{git_head}} | Current HEAD commit SHA |
| {{git_status}} | git status --short (staged/modified/untracked files) |
| {{recent_commits}} | git log --oneline --stat -5 (last 5 commits with file stats) |
| {{files_changed_since_last_run}} | Source files changed since this task's last run (sorted by recency; uses task ref → session baseline → HEAD cascade) |
| {{sources}} | Configured source globs (one per line) |
| {{signal_message}} | Message from the active signal (e.g., from prove_it signal done -m "message") |
| {{changes_since_last_run}} | git diff --stat since this task's last run (uses task ref → session baseline → HEAD cascade) |
Conditional blocks are supported: {{#var}}content{{/var}} renders only when the variable is non-empty.
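For intuition about how variables and conditional blocks expand, here is a minimal Python renderer. It is a sketch of the described behavior, not prove_it's template engine: the `render` function is hypothetical, unknown variables are assumed to expand to an empty string, and nested blocks with the same name are not handled.

```python
import re

# {{#var}}content{{/var}}: content renders only when var is non-empty.
BLOCK = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
# {{var}}: plain variable expansion.
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render(template, variables):
    def block(match):
        name, content = match.group(1), match.group(2)
        return content if variables.get(name) else ""

    def variable(match):
        # Assumption: unknown variables expand to an empty string.
        return variables.get(match.group(1), "")

    # Resolve conditional blocks first, then expand remaining variables.
    return VARIABLE.sub(variable, BLOCK.sub(block, template))


template = (
    "Review the changes.\n"
    "{{#signal_message}}Agent says: {{signal_message}}\n{{/signal_message}}"
    "{{git_status}}"
)

print(render(template, {"signal_message": "", "git_status": " M src/app.js"}))
print(render(template, {"signal_message": "Ready for review",
                        "git_status": " M src/app.js"}))
```

With an empty signal_message the whole "Agent says" line disappears; with a message set, the block and the variable inside it both render.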
prove_it ships curated reviewer prompts as Claude Code skills. Reference them in your config with promptType: "skill":
{ "type": "agent", "promptType": "skill", "prompt": "prove-coverage" }
| Skill | What it reviews |
|---|---|
| prove-approach | Approach viability: detects cognitive fixation, performs root-cause analysis, and surfaces structurally different alternatives. Designed for Sonnet. |
| prove-coverage | Session diffs for test coverage adequacy |
| prove-done | Thorough pre-ship review: correctness, integration, security, tests, omissions. Uses {{changes_since_last_run}} for scope. Designed for Opus. |
| prove-dry | Codebase-wide duplication review: finds same-behavior implementations and prescribes EXTRACT refactors. Default PASS. |
| prove-test-validity | Test quality review: catches tests that give false confidence (tautological assertions, closed-loop validation, excessive mocking, etc.). Designed for Opus. |
Skills are installed to ~/.claude/skills/<name>/SKILL.md by prove_it install. The prompt body is the skill file with its YAML frontmatter stripped.
Agent tasks accept a ruleFile field that injects the contents of a project-specific rule file into the reviewer prompt. This lets you define testing standards once and apply them to every reviewer:
{
"name": "coverage-review",
"type": "agent",
"prompt": "prove-coverage",
"promptType": "skill",
"ruleFile": ".claude/rules/testing.md"
}
The path is resolved relative to the project directory. If the file is missing, the task fails with a clear error. This is intentional, so you don't silently run reviews without your rules.
prove_it init generates a default .claude/rules/testing.md with starter rules and a TODO for you to customize. The default agent tasks (coverage-review, done-review) both point to this file.
Agent tasks accept a model field to control which model the reviewer uses:
{ "name": "coverage-review", "type": "agent",
"prompt": "Check test coverage...\n\n{{session_diff}}", "model": "haiku" }
For OpenAI/codex models (names starting with gpt-), prove_it auto-switches to codex exec -:
{ "name": "adversarial-review", "type": "agent",
"prompt": "Review this code for bugs...\n\n{{staged_diff}}", "model": "gpt-5.3-codex" }
When no model is set and no custom command is provided, prove_it applies defaults:
| Event | Default model | Rationale |
|---|---|---|
| PreToolUse | haiku | Latency-sensitive gate check |
| Stop | haiku | Latency-sensitive review |
| pre-commit | sonnet | Thoroughness matters more |
| pre-push | sonnet | Thoroughness matters more |
You can also set a top-level model in config to apply a default across all agent tasks. An explicit model on a task always wins. Setting a custom command disables default model selection entirely.
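The precedence rules can be summarized in a short Python sketch. The `resolve_model` and `resolve_command` helpers are hypothetical, and one ordering detail is an assumption: the text does not say whether a top-level model still applies when a task sets a custom command, so the sketch lets the custom command disable only the per-event defaults.

```python
EVENT_DEFAULTS = {
    "PreToolUse": "haiku",
    "Stop": "haiku",
    "pre-commit": "sonnet",
    "pre-push": "sonnet",
}


def resolve_model(task, event, config):
    """Pick the reviewer model for an agent task (illustrative sketch)."""
    if task.get("model"):
        return task["model"]  # an explicit model on the task always wins
    if config.get("model"):
        return config["model"]  # top-level default across agent tasks
    if task.get("command"):
        return None  # assumption: custom command skips per-event defaults
    return EVENT_DEFAULTS.get(event)


def resolve_command(task, model):
    """Pick the CLI that receives the prompt on stdin."""
    if task.get("command"):
        return task["command"]
    if model and model.startswith("gpt-"):
        return "codex exec -"  # OpenAI/codex models switch CLIs
    return "claude -p"


print(resolve_model({"name": "review"}, "Stop", {}))
print(resolve_model({"model": "gpt-5.3-codex"}, "Stop", {"model": "sonnet"}))
print(resolve_command({}, "gpt-5.3-codex"))
print(resolve_command({}, "haiku"))
```

How the chosen model is passed to the reviewer CLI is left out here because the text does not describe it.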
You can use a different AI for each reviewer, so the agent doing the work is checked by a competing model:
{
"name": "commit-review",
"type": "agent",
"prompt": "Review staged changes for bugs and missing tests.\n\n{{staged_diff}}"
},
{
"name": "adversarial-review",
"type": "agent",
"command": "codex exec -",
"prompt": "Second opinion: look for issues the primary reviewer might miss.\n\n{{staged_diff}}"
}
The command field accepts any CLI that reads a prompt from stdin and writes its response to stdout. Defaults to claude -p.
Set async: true on an agent task to run it in the background:
{
"name": "coverage-review",
"type": "agent",
"async": true,
"promptType": "skill",
"prompt": "prove-coverage",
"model": "haiku",
"when": { "linesChanged": 541 }
}
Async tasks spawn a detached child process and return immediately, so they don't block Claude from continuing work. The lifecycle is:
- Spawn—prove_it forks a worker and lets the Stop pass
- Run—the worker runs the reviewer in the background (RUNNING → PASS/FAIL/SKIP)
- Done—the worker writes its result and logs DONE
- Harvest—on the next Stop, prove_it reads all pending results before running sync tasks
- Enforce—results are settled: ENFORCED:PASS lets the stop continue, a FAIL blocks just like a sync failure
This means an async FAIL blocks Claude on the next stop, not the current one. The default config uses async: true for the coverage reviewer.
async has no effect on SessionStart (which never blocks). PreToolUse tasks can technically be async, but the usefulness is limited since they run on every tool call.
Set parallel: true on a task to fork it immediately and await it at the end of the current hook invocation:
{
"name": "full-tests",
"type": "script",
"command": "./script/test",
"parallel": true,
"when": { "signal": "done" }
}
Parallel tasks run concurrently with each other and with subsequent serial tasks in the same Stop invocation. The dispatcher forks each parallel task as a child process, continues walking the task list, and awaits all parallel children after the loop completes. This can cut wall-clock time roughly in half when you have two independent heavyweight tasks of similar duration (e.g., a full test suite and an AI reviewer).
Parallel vs async:
| | parallel: true | async: true |
|---|---|---|
| When | Fork now, await this invocation | Fork now, fire-and-forget |
| Enforcement | Blocks this Stop if task fails | Blocks the next Stop |
| Use case | Independent heavyweight tasks that must pass before Claude continues | Background reviews that can enforce later |
parallel and async are mutually exclusive—setting both is a validation error. parallel has no effect on SessionStart (which never blocks). On serial task failure mid-loop, all parallel children are killed immediately.
The default config uses parallel: true for full-tests and done-review.
When an agent reviewer FAILs, prove_it creates a backchannel directory where Claude can appeal the decision:
.claude/prove_it/sessions/<session-id>/backchannel/<task-name>/README.md
The README is pre-populated with the failure reason and instructions. Claude can write a response explaining why the failure doesn't apply (planning work, code isn't theirs, changes are unrelated). On the next review cycle, the reviewer reads the backchannel content before rendering its verdict.
When a reviewer PASSes or SKIPs, the backchannel is cleaned up automatically.
Env tasks run a command during SessionStart and inject the output as environment variables into Claude Code's session. They only run on startup and resume (not after /clear or compaction, where the environment is already set).
{
"hooks": {
"claude": {
"SessionStart": [
{ "name": "load-env", "type": "env", "command": "./script/load_env.sh" }
]
}
}
}
The command's stdout is parsed as environment variables. Three output formats work:
# .env format
API_KEY=abc123
DEBUG=true
# export format
export API_KEY=abc123
export DEBUG="true"
{"API_KEY": "abc123", "DEBUG": "true"}
Multiple env tasks merge in order—later tasks override earlier ones for the same key. If the command fails or output can't be parsed, the error is reported and execution continues.
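To make the three formats and the merge order concrete, here is a Python sketch of a parser. It is a guess at reasonable behavior, not prove_it's parser: `parse_env_output` and `merge_env` are hypothetical, quote stripping and comment skipping are assumptions, and the REGION key is made up. Unlike prove_it, which reports a parse error and continues, this sketch simply raises.

```python
import json


def parse_env_output(stdout):
    """Parse .env, export, or JSON output into a dict of strings."""
    text = stdout.strip()
    if text.startswith("{"):
        return {str(k): str(v) for k, v in json.loads(text).items()}
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"cannot parse line: {line!r}")
        value = value.strip()
        # Assumption: one pair of matching surrounding quotes is dropped.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def merge_env(outputs):
    """Merge env task outputs in order; later tasks override earlier ones."""
    merged = {}
    for stdout in outputs:
        merged.update(parse_env_output(stdout))
    return merged


result = merge_env([
    "# .env format\nAPI_KEY=abc123\nDEBUG=true\n",
    'export DEBUG="false"\n',
    '{"REGION": "us-east-1"}',
])
print(result)
```

The second task's DEBUG value overrides the first, and the JSON output from the third task adds a new key.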
prove_it ships standalone scripts in libexec/ for common infrastructure tasks:
| Script | What it does |
|---|---|
| libexec/guard-config | Blocks writes to guarded file paths. Uses params.paths (glob patterns) from stdin to determine which paths to block. Falls back to hardcoded prove_it config patterns when params.paths is absent. |
| libexec/briefing | Renders a session orientation on SessionStart: active tasks, signal instructions, review process overview. |
Configure them as type: "script" tasks with command: "$(prove_it prefix)/libexec/<name>". The $(prove_it prefix) subshell resolves to prove_it's install directory, so the scripts work regardless of where prove_it is installed. Reviewer prompts are distributed as skills (see Skill-based prompts).
On every SessionStart, the libexec/briefing script renders an orientation that's injected into Claude's context. It shows:
- Active tasks by event: what runs on Stop, PreToolUse, git commit, etc.
- Signal instructions: if any tasks are gated on signals, Claude gets explicit instructions to run prove_it signal done when a unit of work is complete
- Review process: how FAIL verdicts work, how to use the backchannel to appeal, and that a supervisory process audits appeals
The briefing is generated from your effective config, so it always reflects your actual setup. It filters out the briefing task itself to avoid recursion. If rendering fails, the session continues (briefing failure never blocks).
prove_it adapts its TDD enforcement based on what Claude is doing. Four phases control the behavior:
| Phase | What Claude is doing | TDD enforcement |
|---|---|---|
| unknown | Default (no phase declared) | Full red-green TDD (same as implement) |
| plan | Designing an approach, not writing code | No enforcement: planning doesn't need tests |
| implement | Writing new features or fixing bugs | Full red-green TDD: write a failing test → confirm failure → write code → confirm pass |
| refactor | Restructuring existing code | Run the test suite regularly: existing tests are the safety net |
Claude switches phases by running:
prove_it phase implement
prove_it phase refactor
prove_it phase plan
In implement mode (and unknown), prove_it tracks a red-green cycle:
- Write a test — prove_it expects a test file edit before source code edits
- Run the test, confirm it fails — proves the test actually tests something
- Write the code — make the test pass
- Run the test, confirm it passes — proves the implementation works
If Claude edits source files without writing tests first, prove_it nudges after a configurable number of edits (default: 3). If Claude writes a test that passes without any source changes, prove_it warns that the test may be vacuous.
In refactor mode, the expectation is simpler: run the existing test suite regularly. If tests fail during a refactor, prove_it warns that behavior may have changed unintentionally.
In plan mode, there's no enforcement — Claude is designing, not coding.
When any task has a briefing field, its text is included in every SessionStart
orientation. The default inject-tdd-plan task uses this to remind Claude of the
TDD workflow regardless of which phase is active.
Run in a separate terminal to watch hook results in real time:
prove_it monitor
Session: ea0da8e4 | /Users/justin/code/searls/sugoi_tv | started 02/13/2026, 08:53
09:00:48 BOOM coverage-review Unexpected reviewer output: Based on my investigation…
09:00:52 PASS fast-tests ./script/test_fast passed (2.3s)
09:01:12 SKIP fast-tests cached pass (no code changes)
09:14:33 PASS commit-review All changes look correct and well-tested.
watching for new entries… (ctrl-c to stop)
prove_it monitor # tail most recent session
prove_it monitor --all # tail all sessions and project logs
prove_it monitor <id> # tail a specific session (prefix match OK)
| Flag | Effect |
|---|---|
| --project | Scope to current project directory. Finds all sessions and project logs for this repo. |
| --project=/path/to/repo | Scope to a specific project directory |
| --verbose | Show full reviewer prompts, responses, and script output in box-drawn blocks |
| --sessions | Show session ID prefix on each line (useful with --all) |
| --status=FAIL,BOOM | Filter to specific status codes (comma-separated) |
| --list | List all sessions with summary info instead of tailing |
| Code | Meaning |
|---|---|
| PASS | Task passed |
| FAIL | Task failed (blocks the action) |
| SKIP | Task skipped (condition not met, disabled, cached, or reviewer said SKIP) |
| BOOM | Task crashed (unexpected error; treated as a soft skip unless model is explicitly set) |
| EXEC | Task is executing |
| DONE | Async review complete, waiting for Stop hook to enforce |
| ENFORCED:PASS | Async result was harvested and settled as pass |
| ENFORCED:SKIP | Async result was harvested and settled as skip |
| PLEA | Developer wrote a backchannel appeal before this review cycle |
| SET | Signal was set (prove_it signal done/stuck/idle) |
| CLEAR | Signal was auto-cleared after successful Stop |
prove_it installs a Claude Code skill
called /prove—evidence-based verification that forces Claude to actually
run the thing and show you the results.
Invoke it with /prove <claim> (e.g., /prove the search API handles pagination). If you just type /prove with uncommitted changes, it'll prove
those changes work. Claude will:
- State what it's trying to prove and what "working" looks like
- Show evidence it works—commands, output, artifacts
- Show evidence it might not work—edge cases, error paths, things it tried to break
- Give its honest judgment—ready to ship, or what needs to change
The skill is installed to ~/.claude/skills/prove/SKILL.md and updated on
every prove_it install.
prove_it ships review prompts that can be run manually or automatically:
| Skill | What it reviews | Designed for |
|---|---|---|
| /prove-approach | Approach viability: detects cognitive fixation, surfaces structurally different alternatives | Sonnet (balanced) |
| /prove-coverage | Test coverage adequacy for changed code | Haiku (fast, cheap) |
| /prove-done | Pre-ship review: correctness, integration, security, tests, omissions | Opus (thorough) |
| /prove-dry | Codebase-wide duplication: finds same-behavior implementations, prescribes extractions | Opus (thorough) |
| /prove-test-validity | Test quality: catches tests that give false confidence (tautological assertions, closed-loop validation, excessive mocking) | Opus (thorough) |
Run manually — invoke any skill as a slash command whenever you want a review. All run as subagents (context: fork), so they don't consume your conversation context.
Run automatically — configure the same prompts as prove_it agent tasks and they'll fire on lifecycle events. The default config does this: prove-coverage runs async after churn thresholds are hit, prove-done runs on prove_it signal done, and prove-approach runs on prove_it signal stuck. prove-test-validity and prove-dry are not in the default config — add them when you want test quality or duplication gating. See Skill-based prompts for config details.
The manual and automatic paths use the same prompt — the difference is who triggers it (you vs. prove_it) and where it runs (Claude Code subagent vs. claude -p subprocess). Both produce an independent review outside the working agent's context.
When prove_it spawns reviewer subagents or runs script tasks, other hooks installed in your environment (like turbocommit) may fire inside those subprocesses. Use the top-level taskEnv field to set environment variables across all prove_it subprocesses:
{
"taskEnv": {
"TURBOCOMMIT_DISABLED": "1"
},
"hooks": { "claude": { "Stop": ["..."] } }
}
These variables are merged into the environment of both script tasks and agent reviewer subprocesses. prove_it forces PROVE_IT_DISABLED and PROVE_IT_SKIP_NOTIFY in all subprocesses to prevent recursion; these cannot be overridden by taskEnv. Reviewer subprocesses additionally force CLAUDECODE and LC_ALL.
Merge order (last wins):
1. process.env: inherited base environment
2. taskEnv: your config values
3. prove_it forced vars: recursion prevention, always win
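The "last wins" merge can be pictured with plain `env`, which applies assignments left to right on top of the inherited environment. This is only a sketch of the ordering, not prove_it's code: `run_with_task_env` and the `TASK_ENV` array are hypothetical names, and the value `1` for PROVE_IT_SKIP_NOTIFY is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of the three-layer merge order; illustrative, not prove_it's source.
# run_with_task_env and TASK_ENV are hypothetical names.
run_with_task_env() {
  # Layer 1: the inherited environment (env starts from it automatically).
  # Layer 2: TASK_ENV, NAME=VALUE pairs standing in for the taskEnv config field.
  # Layer 3: forced vars, listed last so they always win.
  #          (PROVE_IT_SKIP_NOTIFY=1 is an assumed value.)
  env \
    "${TASK_ENV[@]}" \
    PROVE_IT_DISABLED=1 \
    PROVE_IT_SKIP_NOTIFY=1 \
    "$@"
}

# taskEnv tries to override a forced var; the forced value still wins.
TASK_ENV=(TURBOCOMMIT_DISABLED=1 PROVE_IT_DISABLED=0)
run_with_task_env sh -c 'echo "TURBOCOMMIT_DISABLED=$TURBOCOMMIT_DISABLED PROVE_IT_DISABLED=$PROVE_IT_DISABLED"'
# prints: TURBOCOMMIT_DISABLED=1 PROVE_IT_DISABLED=1
```

Putting the forced assignments last is what makes them impossible to override from config.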
Agent reviewer tasks run claude -p in non-interactive mode. In this mode, Claude Code requires explicit permission to use tools—there's nobody at the terminal to approve prompts. By default, prove_it passes --allowedTools with a list of common built-in tools (DEFAULT_ALLOWED_TOOLS). This covers most use cases.
If your custom agent tasks need tools outside the default list (e.g., MCP tools), you have two options:
Expand the allowed list with taskAllowedTools in your config:
{
"taskAllowedTools": ["Read", "Write", "Edit", "Glob", "Grep", "Bash", "WebFetch", "WebSearch", "Task", "NotebookEdit", "mcp__xcode__XcodeBuild"],
"hooks": { "claude": { "Stop": ["..."] } }
}
Skip permissions entirely with taskBypassPermissions:
{
"taskBypassPermissions": true,
"hooks": { "claude": { "Stop": ["..."] } }
}
This passes --dangerously-skip-permissions to reviewer subprocesses, giving them access to all tools with no restrictions. prove_it already isolates reviewer subprocesses (PROVE_IT_DISABLED=1, no recursion), but the subprocess has full tool access.
When neither field is set, prove_it auto-detects: if your Claude Code settings use bypassPermissions mode, reviewers inherit that; otherwise they use the default allowed list.
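Read as a decision tree, the permission rules above might look like the sketch below. The function name and its positional arguments are hypothetical, and the fourth argument is a stand-in for DEFAULT_ALLOWED_TOOLS. The text doesn't say which field wins when both are set, so letting taskBypassPermissions take precedence is an assumption, as is passing the tool list comma-separated.

```shell
#!/usr/bin/env bash
# Sketch of the permission decision; not prove_it's source.
# reviewer_permission_flags and its arguments are hypothetical.
reviewer_permission_flags() {
  local bypass="$1"        # taskBypassPermissions: "true", "false", or "" if unset
  local allowed="$2"       # taskAllowedTools as a comma-separated list, or "" if unset
  local settings_mode="$3" # permission mode from your Claude Code settings
  local defaults="$4"      # stand-in for DEFAULT_ALLOWED_TOOLS

  if [ "$bypass" = "true" ]; then
    # Assumption: bypass wins if both fields are set.
    echo "--dangerously-skip-permissions"
  elif [ -n "$allowed" ]; then
    echo "--allowedTools $allowed"
  elif [ -z "$bypass" ] && [ "$settings_mode" = "bypassPermissions" ]; then
    # Neither field set: inherit bypass mode from Claude Code settings.
    echo "--dangerously-skip-permissions"
  else
    echo "--allowedTools $defaults"
  fi
}

# Neither field set, settings not in bypass mode: fall back to the default list.
reviewer_permission_flags "" "" "default" "Read,Grep"
# prints: --allowedTools Read,Grep
```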
By default, prove_it tracks Claude's built-in editing tools (Edit, Write, NotebookEdit). If Claude edits files through MCP tools (e.g. Xcode MCP's XcodeEdit), add them to fileEditingTools so prove_it can track them:
{
"fileEditingTools": ["XcodeEdit"],
"sources": ["**/*.swift", "**/*.m"],
"hooks": { "claude": { "Stop": ["..."] } }
}
Tools listed in fileEditingTools are tracked alongside the builtins: they participate in sourceFilesEdited, testFilesEdited, toolsUsed, gross churn (linesWritten), and the session_diff git fallback. For gross churn, line counts are estimated from the longest string value in the tool input.
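To get a feel for the "longest string value" heuristic, here is a rough approximation with jq. It is a guess at the idea, not prove_it's implementation: `estimate_lines_written` is a hypothetical name, the `content` field in the sample payload is made up, and counting newline-separated lines is an assumption about how the estimate is derived.

```shell
#!/usr/bin/env bash
# Sketch: estimate lines written from a tool call's JSON payload (requires jq).
# estimate_lines_written is a hypothetical name, not part of prove_it.
estimate_lines_written() {
  jq '
    [.tool_input | .. | strings]      # every string value in the tool input
    | (max_by(length) // "")          # keep the longest one ("" if there are none)
    | if . == "" then 0 else (split("\n") | length) end
  '
}

# Hypothetical payload for an MCP editing tool; "content" is an assumed field name.
printf '%s' '{"tool_name":"XcodeEdit","tool_input":{"path":"a.swift","content":"one\ntwo\nthree"}}' \
  | estimate_lines_written
# prints: 3
```

Picking the longest string works without knowing each tool's schema, which is presumably why a heuristic like this suits arbitrary MCP tools.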
prove_it stores session data in ~/.claude/prove_it/sessions/—log files (.jsonl), state files (.json), and async task directories.
Lazy cleanup: On every fresh session start (startup source), prove_it prunes session files older than 7 days. Pruning is rate-limited to once per 24 hours (tracked via a .last_prune marker file), so it adds no overhead to normal operation.
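The rate-limited pruning described above can be sketched with `find` and a marker file. This is an illustration of the idea under stated assumptions, not prove_it's actual cleanup code: `prune_sessions` is a hypothetical name, and the sketch only looks at top-level entries in the directory you pass it.

```shell
#!/usr/bin/env bash
# Sketch of rate-limited pruning; illustrative only, not prove_it's implementation.
# prune_sessions is a hypothetical name; the directory is passed in explicitly.
prune_sessions() {
  local dir="$1"
  local marker="$dir/.last_prune"
  [ -d "$dir" ] || return 0

  # Rate limit: skip if the marker was touched within the last 24 hours (1440 minutes).
  if [ -n "$(find "$marker" -mmin -1440 2>/dev/null)" ]; then
    return 0
  fi

  # Remove top-level entries (logs, state files, async dirs) older than 7 days.
  find "$dir" -mindepth 1 -maxdepth 1 -mtime +7 ! -name .last_prune -exec rm -rf {} +

  # Record this run so the next 24 hours of session starts skip the scan.
  touch "$marker"
}

# Example (destructive, so left commented out):
# prune_sessions "$HOME/.claude/prove_it/sessions"
```

Checking the marker first means most session starts do a single cheap `find` on one file and return.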
format.maxOutputChars: Controls the maximum character count for output passed back to Claude Code hooks. Defaults to 12000. Increase if you need longer test output or decrease to save context:
{
"format": { "maxOutputChars": 20000 },
"hooks": { "claude": { "Stop": ["..."] } }
}
prove_it install Register global hooks (~/.claude/settings.json)
prove_it uninstall Remove global hooks
prove_it reinstall Uninstall and reinstall global hooks
prove_it init Set up current project (interactive or with flags)
prove_it deinit Remove prove_it from current project
prove_it reinit Deinit and re-init current repository
prove_it doctor Check installation and show effective config
prove_it monitor Tail hook results in real time
prove_it signal <type> Declare a lifecycle signal (done, stuck, idle)
prove_it cancel Cancel running hook tasks for the current session
prove_it disable Silence prove_it hooks for the current session (run via `!`)
prove_it enable Re-enable prove_it hooks for the current session
prove_it catchup Fast-forward reviewer baselines past stale repo state
prove_it phase <mode> Set session phase (unknown, plan, implement, refactor)
prove_it hook <spec> Run a dispatcher directly (claude:Stop, git:pre-commit)
prove_it prefix Print install directory (for resolving libexec scripts)
prove_it record Record a test run result (--name <task> --pass|--fail|--result <N>)
prove_it help Show help
prove_it --version Show version
prove_it defaults to enabled: false—it only runs when explicitly opted in via
prove_it install (global) or prove_it init (project). Both write enabled: true
to their respective config files.
When you need to disable it after installation:
Edit ~/.claude/prove_it/config.json:
{
"ignoredPaths": ["~/bin", "~/dotfiles"]
}
For all contributors, edit .claude/prove_it/config.json:
{ "enabled": false }
For just you, edit .claude/prove_it/config.local.json:
{ "enabled": false }
export PROVE_IT_DISABLED=1
When a running Claude session is generating too much noise and you just want prove_it out of the way for the rest of the session:
! prove_it disable # silences PreToolUse / Stop / PostToolUse hooks for this session
! prove_it enable # restore them
This works because prove_it injects PROVE_IT_SESSION_ID into the shell on
SessionStart. The disabled state is keyed to that session id — other sessions
(including new Claude windows) are unaffected. On resume of a disabled session,
you'll see a one-line reminder in your terminal telling you to run
! prove_it enable to restore hooks.
Git hooks (pre-commit, pre-push) are not session-scoped and continue to run.
Use git commit --no-verify if you need to bypass those.
If you git pull (or rebase / reset) mid-session and pull in commits the
session didn't actually produce, reviewers will keep diffing against the
old baseline and flag work that isn't yours. Run:
! prove_it catchup # advance baselines for every task in this session
! prove_it catchup done-review # only advance one task
catchup advances task refs (refs/worktree/prove_it/<task>) and the
session baseline to the current HEAD, clears successive failure counts,
removes tasks from the suspended list, and deletes any open backchannel
appeal directories. Uncommitted edits stay visible to subsequent reviewers
— catchup zips past committed history, not your in-progress work.
catchup is scoped to the current git checkout (or worktree). The per-task form leaves session-wide state untouched.
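If you're not sure whether a catchup is needed, you can peek at how far a task's baseline ref trails HEAD using ordinary git commands. A minimal sketch: `baseline_lag` is a hypothetical helper, and only the refs/worktree/prove_it/<task> namespace comes from the description above.

```shell
#!/usr/bin/env bash
# Sketch: report how many commits a task's baseline ref trails HEAD.
# baseline_lag is a hypothetical helper, not a prove_it command.
baseline_lag() {
  local ref="refs/worktree/prove_it/$1"
  # No ref yet means the task has no recorded baseline in this checkout.
  if ! git rev-parse --verify --quiet "$ref" >/dev/null; then
    echo "no baseline"
    return 0
  fi
  # Count commits reachable from HEAD but not from the baseline.
  git rev-list --count "$ref..HEAD"
}

# Example, run from inside your repo:
# baseline_lag done-review
```

A large count right after a pull is a hint that reviewers would otherwise diff against commits the session didn't write.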
prove_it doctor
- Hooks not firing: Restart Claude Code after prove_it install
- Tests not running: Check ./script/test exists and is executable (chmod +x)
- Hooks running in wrong directories: prove_it only activates in git repos
- Reviews never fire: The default when conditions use churn thresholds (linesChanged, linesWritten). Reviews only trigger after enough code has been written. Check prove_it monitor to see skip reasons with current/threshold counts. If you use MCP tools that edit files (e.g. Xcode MCP's XcodeEdit), add them to fileEditingTools so all churn tracking works for those tools: { "fileEditingTools": ["XcodeEdit"], "hooks": { "claude": { "Stop": ["..."] } } }
- Async reviews not enforcing: Async results are harvested on the next Stop. If Claude stops work before the async review completes, the result will be enforced on the stop after that. Check prove_it monitor --verbose to see RUNNING/DONE status progression.
- Hooks hanging or taking too long: Press escape in Claude Code to dismiss the hook UI, then run ! prove_it cancel to kill all running tasks for the current session. The hook exits with approve so Claude can continue. This works because prove_it injects PROVE_IT_SESSION_ID into your shell environment on session start.
- Config errors after upgrade: Run prove_it reinstall && prove_it reinit to reset to current defaults
Claude sometimes uses WebFetch for GitHub URLs when the gh CLI is faster and handles authentication. This guard script denies WebFetch for any github.com URL and tells Claude to use gh instead.
1. Create the guard script (requires jq):
mkdir -p ~/bin/prove_it_tasks
cat > ~/bin/prove_it_tasks/prefer_gh_cli_over_fetch << 'SCRIPT'
#!/usr/bin/env bash
# Guard: deny WebFetch for GitHub URLs, redirect to gh CLI.
# Reads hook input from stdin (prove_it pipes tool_name + tool_input).
input=$(cat)
tool=$(echo "$input" | jq -r '.tool_name // empty')
[ "$tool" = "WebFetch" ] || exit 0
url=$(echo "$input" | jq -r '.tool_input.url // empty')
if echo "$url" | grep -qi 'github\.com'; then
echo "Do not use WebFetch for GitHub URLs. Use the gh CLI instead (e.g., gh pr view, gh issue view, gh api)."
exit 1
fi
SCRIPT
chmod +x ~/bin/prove_it_tasks/prefer_gh_cli_over_fetch
2. Add to your global config (~/.claude/prove_it/config.json):
{
"hooks": {
"claude": {
"PreToolUse": [
{
"name": "prefer-gh-cli-over-fetch",
"type": "script",
"command": "~/bin/prove_it_tasks/prefer_gh_cli_over_fetch",
"quiet": true
}
]
}
}
}
quiet: true suppresses log noise on every pass (most tool calls aren't WebFetch).
How it works: prove_it pipes hook context (tool name, tool input, session ID) as JSON to script tasks on stdin. The script reads stdin, checks whether the tool is WebFetch with a GitHub URL, and exits 1 to deny it. Non-WebFetch tools exit 0 immediately. Because the task has no matcher, prove_it sees all tool calls—individual scripts bail early for irrelevant tools.
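Because a guard is just a script that reads JSON on stdin and returns an exit code, you can smoke-test one without starting Claude. The harness below is a sketch: `check_guard` is a hypothetical helper, and the payloads are hand-written approximations that carry only the fields the guard reads (tool_name, tool_input). Real hook input likely has more.

```shell
#!/usr/bin/env bash
# Sketch: smoke-test a guard by piping a sample hook payload to it on stdin.
# check_guard is a hypothetical helper; the payloads are hand-written assumptions.
check_guard() {
  # $1: guard command or script path, $2: JSON payload.
  local out status
  out=$(printf '%s' "$2" | "$1" 2>&1) && status=0 || status=$?
  if [ "$status" -eq 0 ]; then
    echo "allow"
  else
    echo "deny (exit $status): $out"
  fi
}

# Only runs if the guard from step 1 is installed (it needs jq).
GUARD="$HOME/bin/prove_it_tasks/prefer_gh_cli_over_fetch"
if [ -x "$GUARD" ]; then
  # Expect a deny with the "use the gh CLI" message.
  check_guard "$GUARD" '{"tool_name":"WebFetch","tool_input":{"url":"https://github.com/OWNER/REPO/pull/1"}}'
  # Expect "allow": the guard bails early for non-WebFetch tools.
  check_guard "$GUARD" '{"tool_name":"Read","tool_input":{"file_path":"README.md"}}'
fi
```

Trying a payload that should pass and one that should fail catches the common mistake of a guard that denies everything, or nothing.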
See example/basic/ and example/advanced/ for working projects with configs, test suites, and reviewer prompts.
- Node.js >= 18
- Claude Code with hooks support
MIT