Safety & Guard Rails
Guard rails: the brake layer between model and action
A guard rail doesn't make the model smarter — it checks the action before the tool actually runs. This article breaks down the 4 guard layers, 7 guard-rail groups, and the gaps that remain in ClaudeKit.
Case: small task, too-broad permissions
You hand the agent a small job: fix validation in one module. To be safe it scans a few more directories, opens a sensitive config file, then sweeps a couple of unrelated cleanups into the same PR. Tests still pass — but the context now has junk files, a secret may have hit the transcript, and the review is diluted.
Without guard rails
- The glob is too broad and the context fills with irrelevant files
- cat .env puts a secret in the transcript
- Out-of-scope cleanup dilutes the review
- Tests pass, so it ships anyway
With guard rails
- scout-block exits 2 and suggests narrowing the pattern
- privacy-block exits 2 and asks for user approval
- A rule or hard-gate pulls the agent back to the main task
- The simplify / review gate re-checks the diff before ship
Guard rail vs prompt instruction
| Aspect | Guard rail | Prompt instruction |
|---|---|---|
| Where it runs | Harness, outside the model | In context, model reads it |
| Enforcement | Enforced by code (the tool call is blocked) | Model complies voluntarily |
| Long context | Still runs | Easily forgotten, drifts |
| Model misreads the request | Still blocks (unless the hook crashes) | Drifts with the model |
| Changing it | Needs code/config edits | Just edit the text |
The two are not mutually exclusive. Use guard rails for high-risk actions; for stylistic things (commit format, comment convention) an instruction is enough.
The 4 guard layers
- Hook (code), a real block. A script the harness invokes at lifecycle moments. If it returns exit 2, the tool does NOT run. This is the only layer that truly blocks. Caveat: it fails open. If the hook crashes, Claude Code lets the call through and the guard turns off silently.
- Rule (CLAUDE.md), an instruction. Text injected into context on each prompt. It steers behavior and is easy to add or edit, with no redeploy needed. Caveat: no exit 2 enforces it. Stronger models comply better, and a user prompt can override it.
- Hard-gate (XML), a strong instruction. <HARD-GATE> lives in skill markdown and preserves sequence (plan before code, review before ship). It is a stronger signal than a plain rule. Caveat: it is still an instruction, not a hook, and it carries a 'User override' line.
- Guard skill (user-invoked), user-initiated. Runs when the user types a slash command: /ck:security-scan, /ck:predict, /ck:scenario, /ck:ship. Caveat: if not invoked, it doesn't run. It is an active layer, not an automatic one.
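The fail-open caveat on the hook layer can be narrowed inside the hook itself. The sketch below is a hypothetical Python wrapper, not ClaudeKit code (its hooks are .cjs scripts). The names `run_fail_closed` and `buggy_check`, and the `tool_input` / `file_path` event fields, are assumptions made for illustration. It turns an unexpected crash into exit code 2 instead of letting the exception escape.

```python
from typing import Callable, Tuple

ALLOW = 0  # tool runs
BLOCK = 2  # tool does not run; stderr goes back to the model


def run_fail_closed(check: Callable[[dict], Tuple[int, str]], event: dict) -> Tuple[int, str]:
    """Run a guard check and treat any crash as a block, not a silent pass."""
    try:
        return check(event)
    except Exception as exc:  # a crash would otherwise leave the guard open
        return BLOCK, f"guard crashed ({type(exc).__name__}); blocking to be safe"


def buggy_check(event: dict) -> Tuple[int, str]:
    # Hypothetical guard with a bug: it assumes keys that may be missing.
    path = event["tool_input"]["file_path"]
    if path.endswith(".env"):
        return BLOCK, f"blocked read of {path}"
    return ALLOW, ""


print(run_fail_closed(buggy_check, {"tool_input": {"file_path": "src/app.py"}}))
print(run_fail_closed(buggy_check, {}))  # missing key: the crash becomes a block
```

Whether failing closed is the right default is a judgment call: it trades a silently open guard for the risk of blocking legitimate work when the hook itself has a bug.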
A tool call passing through guards
Exit codes decide whether a call is blocked
| Exit code | Result | What happens |
|---|---|---|
| exit 0 | Allow | Tool runs normally. stdout is read as JSON output. |
| exit 2 | Real block | Tool does not run. stdout is dropped and stderr is returned to the model as an error. |
| exit 1 / other | Error, does NOT block | The error is reported, but the tool STILL runs. Misusing exit 1 leaves a guard that looks on but is actually open. |
Only exit 2 truly blocks. A hook that wants to enforce policy must use exit 2.
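The exit-code contract can be restated as a small lookup. This is a minimal sketch of how a harness would interpret a hook's exit status as described in this section. `HookOutcome` and `interpret_exit` are hypothetical names, not part of Claude Code or ClaudeKit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HookOutcome:
    tool_runs: bool
    note: str


def interpret_exit(code: int) -> HookOutcome:
    """Map a hook exit code to the behavior described in this section."""
    if code == 0:
        return HookOutcome(True, "allow")
    if code == 2:
        return HookOutcome(False, "block: stderr is returned to the model")
    # Any other code is reported as an error but does not stop the tool.
    return HookOutcome(True, "hook error: tool still runs")


for code in (0, 1, 2):
    print(code, interpret_exit(code))
```

The point to notice is the last branch: a hook that exits 1 on a policy violation reports an error while the tool call proceeds anyway.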
7 guard-rail groups in CK
| Group | Mechanism | Example | Enforcement |
|---|---|---|---|
| Block file/path | PreToolUse | scout-block | Code, fail-open |
| Block wrong step | UserPromptSubmit | simplify-gate | Code |
| Inject context | UserPromptSubmit | dev-rules-reminder | Code |
| Keep names clean | Pre/Post/Stop | descriptive-name | Code, nudge |
| Hard-gate skill | XML markdown | <HARD-GATE> | Instruction |
| Rule instruction | CLAUDE.md | review-audit | Instruction |
| Guard skill | User-invoked | ck:security-scan | User |
Four hooks worth a close look
- scout-block: Targets an agent that reads node_modules/, globs **/*.ts at the root, or cats dist/, which burns tokens and dirties the context.
- privacy-block: Targets an agent that touches .env, id_rsa, or *.pem, where a secret would leak into the transcript. Exits 2 and forces AskUserQuestion.
- simplify-gate: Fires when the prompt contains ship/merge/pr/deploy and the diff exceeds 400 LOC, touches more than 8 files, or has a single file over 200 LOC.
- workflow-artifact-gate: Checks for 5 JSON files (context, risk, verification, review, adversarial) before ship/push/pr/deploy.
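To make the simplify-gate thresholds concrete, here is a rough Python sketch of the size check. The numbers and trigger words come from this article. The function name `simplify_gate`, the per-file LOC input, and the word-boundary regex are assumptions; the real hook's prompt matching is not shown here and will behave differently on some phrasings.

```python
import re
from typing import Dict, List

# Thresholds and trigger words as listed in this article.
MAX_TOTAL_LOC = 400
MAX_FILES = 8
MAX_FILE_LOC = 200
# Assumed matcher: whole-word, case-insensitive. The real hook may differ.
TRIGGER = re.compile(r"\b(ship|merge|pr|deploy)\b", re.IGNORECASE)


def simplify_gate(prompt: str, loc_per_file: Dict[str, int]) -> List[str]:
    """Return the reasons the gate would fire; an empty list means pass."""
    if not TRIGGER.search(prompt):
        return []  # the gate only looks at ship-like prompts
    reasons = []
    total = sum(loc_per_file.values())
    if total > MAX_TOTAL_LOC:
        reasons.append(f"diff is {total} LOC (> {MAX_TOTAL_LOC})")
    if len(loc_per_file) > MAX_FILES:
        reasons.append(f"{len(loc_per_file)} files changed (> {MAX_FILES})")
    for path, loc in loc_per_file.items():
        if loc > MAX_FILE_LOC:
            reasons.append(f"{path} has {loc} LOC (> {MAX_FILE_LOC})")
    return reasons


print(simplify_gate("fix the typo", {"a.py": 900}))          # no trigger word
print(simplify_gate("ship it", {"a.py": 250, "b.py": 200}))  # two reasons
```

A naive keyword match like this one is exactly where phrasing problems come from, so treat the trigger list as the weakest part of the gate.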
Read hook state by 3 labels
- script file: A .cjs file exists in .claude/hooks/. Having the file does NOT mean the hook is running.
- wired: settings.json has attached the script to a lifecycle event. If it is not wired, Claude Code never calls it.
- runtime flag: Once invoked, the code reads .ck.json, defaults, or ENV to decide whether to proceed (e.g. privacyBlock, simplify.gate.enabled).
isHookEnabled() reports a hook as off only when hooks.<name> is explicitly false. A missing key is usually treated as enabled. Don't collapse the three labels into a single "on/off by default".
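The three labels and the missing-key behavior can be sketched as follows. This is an illustrative Python model, not the actual `isHookEnabled()` implementation. The config shape (`hooks.<name>` in a dict) follows the description above, while `hook_state` and the substring test for wiring are assumptions.

```python
from typing import Dict, List


def is_hook_enabled(config: dict, name: str) -> bool:
    """Off only when hooks.<name> is explicitly False; a missing key counts as on."""
    hooks = config.get("hooks") or {}
    return hooks.get(name) is not False


def hook_state(name: str, script_exists: bool, wired_commands: List[str], config: dict) -> Dict[str, bool]:
    """Report the three labels separately instead of a single on/off answer."""
    return {
        "script file": script_exists,
        # Assumption: a hook counts as wired if its name appears in a wired command.
        "wired": any(name in cmd for cmd in wired_commands),
        "runtime flag": is_hook_enabled(config, name),
    }


print(is_hook_enabled({}, "scout-block"))                                  # True
print(is_hook_enabled({"hooks": {"scout-block": False}}, "scout-block"))   # False
print(hook_state("scout-block", True, [], {}))
```

The last call shows why the labels matter: the script exists and the runtime flag is on, yet the hook never runs because nothing wires it.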
8 known gaps
- 1 File-access guard lives in hooks (scout-block / privacy-block), not a separate permissions.deny.
- 2 Bash is exempt from privacy-block — only warned, no approval flow.
- 3 workflow-artifact-gate needs opt-in. The strongest gate does not run unless enabled.
- 4 simplify-gate doesn't block by default. You must explicitly set simplify.gate.enabled.
- 5 simplify-gate slips on phrasing — 'ship on Friday' is skipped.
- 6 .ckignore can negate node_modules with a ! allowlist. Check it directly.
- 7 Rules are instructions — Claude can drift when a user prompt overrides.
- 8 HARD-GATE XML is also an instruction, with a 'User override'. No exit 2 enforces it.
Also: there is no rate-limit / quota guard. A runaway agent can fire thousands of Read/Bash calls; limits live in Claude Code / provider / billing, not in this guard rail.
6 checks before you trust a guard rail
- Which hooks does settings.json wire to which lifecycle?
- Which hooks has .ck.json disabled? (both outer + inner config)
- Is the current Claude Code session skipping permission prompts?
- Any project .ckignore override? Is node_modules ! allowlisted?
- Are the rule files in CLAUDE.md complete?
- Is any hook crashing silently in this session (check stderr)?
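The first and fourth checks lend themselves to a quick script. The sketch below assumes settings.json nests hook commands as event, then matcher groups, then a `hooks` list with a `command` field, and that .ckignore uses gitignore-style `!` negation. Verify both against your own files. `wired_commands` and `negated_patterns` are hypothetical helper names.

```python
from typing import Dict, List


def wired_commands(settings: dict) -> Dict[str, List[str]]:
    """Collect hook commands per lifecycle event from a settings.json-shaped dict."""
    result = {}
    for event, groups in (settings.get("hooks") or {}).items():
        result[event] = [
            hook.get("command", "")
            for group in groups
            for hook in group.get("hooks", [])
        ]
    return result


def negated_patterns(ckignore_text: str) -> List[str]:
    """Return patterns re-allowed with a leading '!' in .ckignore-style text."""
    patterns = []
    for line in ckignore_text.splitlines():
        line = line.strip()
        if line.startswith("!"):
            patterns.append(line[1:])
    return patterns


# Hypothetical sample data for illustration only.
sample_settings = {
    "hooks": {
        "PreToolUse": [
            {"matcher": "Read", "hooks": [{"type": "command", "command": "node .claude/hooks/scout-block.cjs"}]}
        ]
    }
}
print(wired_commands(sample_settings))
print(negated_patterns("node_modules\n!node_modules/my-lib\n"))
```

In practice you would load the dict with `json.load` from your project's settings.json and read .ckignore from disk. Both are left out here so the sketch has no file dependencies.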
Snapshot of the stable Engineer Kit release, claudekit-engineer@2.19.1. The 2.19.2-beta is preparing to drop 2 Agent Teams hooks, so the hook list may change when upstream ships a new version.