Safety & Guard Rails

Guard rails: the brake layer between model and action

A guard rail doesn't make the model smarter; it checks the action before the tool actually runs. This article breaks down the 4 guard layers, the 7 guard-rail groups, and the gaps that remain in ClaudeKit.

UI Review Gate

Some guard rails block tools with code. UI Review Gate adds a human browser review before the agent continues.


Case: small task, too-broad permissions

You hand the agent a small job: fix validation in one module. To be safe it scans a few more directories, opens a sensitive config file, then sweeps a couple of unrelated cleanups into the same PR. Tests still pass — but the context now has junk files, a secret may have hit the transcript, and the review is diluted.

Without guard rail

Glob too broad, dirty context
cat .env, secret in transcript
Out-of-scope cleanup, diluted review
Tests pass, so ship it

With guard rail

scout-block exits 2 and suggests narrowing the pattern
privacy-block exits 2 and asks for user approval
rule / hard-gate pulls the agent back to the main task
simplify / review gate re-checks the diff before ship

Guard rail vs prompt instruction

Aspect	Guard rail	Prompt instruction
Where it runs	Harness, outside the model	In context, model reads it
Enforcement	Code blocks the tool call	Model voluntarily complies
Long context	Still runs	Easily forgotten / drifts
Model misreads	Still blocks (unless it crashes)	Drifts with the model
Changing it	Needs code/config edits	Just edit the text

The two are not mutually exclusive. Use guard rails for high-risk actions; for stylistic things (commit format, comment convention) an instruction is enough.

The 4 guard layers

Hook (code): real block

A script the harness invokes at lifecycle moments. Return exit 2 and the tool does NOT run. This is the only real blocking layer.

But hooks fail open: if the hook crashes, Claude Code lets the call through and the guard turns off silently.
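A minimal sketch of the hook contract described above, not ClaudeKit's actual code. It assumes the harness passes a JSON payload on stdin with `tool_name` and `tool_input` fields; `decide`, `runHook`, and the node_modules rule are hypothetical names and policy chosen only for illustration.

```javascript
// Hypothetical PreToolUse hook sketch (CommonJS, as in a .cjs file).
// decide() is a pure function so the policy can be tested without a harness.
function decide(payload) {
  const filePath = (payload.tool_input && payload.tool_input.file_path) || '';
  // Illustrative policy only: refuse to read anything under node_modules/.
  if (payload.tool_name === 'Read' && filePath.includes('node_modules/')) {
    return { exitCode: 2, stderr: `Blocked: ${filePath} is outside the task scope.` };
  }
  return { exitCode: 0, stderr: '' };
}

// Wiring sketch: parse the payload, print the reason, set the exit code.
// An uncaught exception in here would end the process with a non-2 code,
// which is the fail-open case the text warns about.
function runHook(stdinText) {
  const result = decide(JSON.parse(stdinText));
  if (result.stderr) process.stderr.write(result.stderr + '\n');
  process.exitCode = result.exitCode;
}

console.log(decide({ tool_name: 'Read', tool_input: { file_path: 'node_modules/x/index.js' } }).exitCode); // prints 2
console.log(decide({ tool_name: 'Read', tool_input: { file_path: 'src/validate.ts' } }).exitCode); // prints 0
```

In a real hook file, `runHook` would be called with the contents of stdin, for example `runHook(require('node:fs').readFileSync(0, 'utf8'))`. Keeping the policy in a pure function makes it easier to test that the guard blocks what it should.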

Rule (CLAUDE.md): instruction

Text injected into context on each prompt. Steers behavior, easy to add/edit, no redeploy needed.

Nothing like exit 2 enforces it. Stronger models comply better, and a user prompt can override it.

Hard-gate (XML): strong instruction

<HARD-GATE> lives in skill markdown and preserves step order (plan before code, review before ship). It is a stronger signal than a plain rule.

Still an instruction, not a hook. It carries a 'User override' line.

Guard skill (user-invoked): user-initiated

Runs when the user types a slash command: /ck:security-scan, /ck:predict, /ck:scenario, /ck:ship.

If not invoked, it doesn't run. It's a manual layer, not an automatic one.

A tool call passing through guards

User prompt: the prompt gate (UserPromptSubmit) can block and ask for a fix.

Model decides to call a tool: Read / Bash / Edit / Write ...

Pre-tool hook (scout / privacy): exit 2 → tool does NOT run, reason returned to the model.

Tool exec: exit 0 → real read / write / run on the machine.

Post-tool hook: validate, scan, nudge the next step (PostToolUse / Stop).

→ Result returns to the model, having passed the guard layers.
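The flow above can be sketched as a small simulation. Everything in it is hypothetical: hooks are modeled as plain functions returning `{ exitCode, stderr }`, and `runToolCall`, `privacyLike`, `crashed`, `fakeExec`, and `nudge` are illustrative names, not part of Claude Code or ClaudeKit.

```javascript
// Simulation of one tool call passing through the guard stages.
// Each pre-tool hook is modeled as a function returning { exitCode, stderr }.
function runToolCall(call, preHooks, execTool, postHooks) {
  for (const hook of preHooks) {
    const { exitCode, stderr } = hook(call);
    if (exitCode === 2) {
      // Blocked: the tool never runs and the reason goes back to the model.
      return { ran: false, toModel: stderr };
    }
    // exit 0 and any other non-2 code both fall through to the next hook.
  }
  const output = execTool(call);
  // Post-tool hooks run after the fact; they can annotate but not undo.
  const notes = postHooks.map((hook) => hook(call, output)).filter(Boolean);
  return { ran: true, toModel: output, notes };
}

const privacyLike = (call) =>
  call.tool === 'Read' && call.path.endsWith('.env')
    ? { exitCode: 2, stderr: 'Blocked: sensitive file, ask the user first.' }
    : { exitCode: 0, stderr: '' };
const crashed = () => ({ exitCode: 1, stderr: 'TypeError: boom' }); // fails open
const fakeExec = (call) => `contents of ${call.path}`;
const nudge = () => 'Remember to run the tests.';

console.log(runToolCall({ tool: 'Read', path: '.env' }, [privacyLike], fakeExec, [nudge]));
console.log(runToolCall({ tool: 'Read', path: '.env' }, [crashed], fakeExec, [nudge]));
```

The second call shows the fail-open case: a hook that ends with exit 1 lets the read of `.env` go through, so the only trace of the failure is its stderr.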

Exit codes decide block or not

exit 0: allow

Tool runs normally. stdout is read as JSON output.

exit 2: real block

Tool does not run. stdout is dropped; stderr is returned to the model as an error.

exit 1 / other: error, does NOT block

Reports an error, but the tool STILL runs. Misusing exit 1 leaves you with a guard you think is on but is actually open.

Among exit codes, only exit 2 truly blocks. A hook that wants to enforce policy must use exit 2.
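To make the exit-code table concrete, here is a hedged sketch that runs tiny inline Node scripts as stand-ins for hooks and classifies how each one ended. `classifyExit` and `runStandInHook` are hypothetical helpers; the mapping follows the three cases listed above and is not the harness's own code.

```javascript
const { spawnSync } = require('node:child_process');

// Map a hook's exit status to the outcome described above.
function classifyExit(status) {
  if (status === 0) return 'allow';
  if (status === 2) return 'block';
  return 'error-but-tool-runs'; // exit 1, crashes, signals, anything else
}

// Run an inline Node script as a stand-in for a hook and classify it.
function runStandInHook(source) {
  const child = spawnSync(process.execPath, ['-e', source], { encoding: 'utf8' });
  return { outcome: classifyExit(child.status), stderr: (child.stderr || '').trim() };
}

console.log(runStandInHook('process.exit(0)').outcome);
console.log(runStandInHook('console.error("no"); process.exit(2)').outcome);
// An uncaught exception makes Node exit with code 1, so this guard is open.
console.log(runStandInHook('throw new Error("bug in guard")').outcome);
```

The last case is the one worth testing for in your own hooks: a bug in the guard is indistinguishable from "allow" unless someone reads stderr.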

7 guard-rail groups in ClaudeKit (CK)

Group	Mechanism	Example	Enforcement
Block file/path	PreToolUse	`scout-block`	Code, fail-open
Block wrong step	UserPromptSubmit	`simplify-gate`	Code
Inject context	UserPromptSubmit	`dev-rules-reminder`	Code
Keep names clean	Pre/Post/Stop	`descriptive-name`	Code, nudge
Hard-gate skill	XML markdown	`<HARD-GATE>`	Instruction
Rule instruction	CLAUDE.md	`review-audit`	Instruction
Guard skill	User-invoked	`ck:security-scan`	User

Four hooks worth a close look

scout-block

Catches: the agent reads node_modules/, globs **/*.ts at the root, or cats dist/, burning tokens and dirtying the context.

gap: Build commands (npm/go/cargo build) always pass through. Fails open when the hook crashes.
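A rough sketch of what a scout-style check could look like, under stated assumptions: the directory list, the build-command pattern, and the `file_path` / `pattern` / `command` input fields are guesses for illustration, and `scoutCheck` is a hypothetical name rather than the real hook's API.

```javascript
// Hypothetical scout-style check; the directory list and rules are assumptions.
const NOISY_DIRS = ['node_modules', 'dist'];
const BUILD_COMMAND = /^(npm|go|cargo)\s+(run\s+)?build\b/;

function scoutCheck(toolName, input) {
  // Mirrors the gap above: build commands are passed through unchecked.
  if (toolName === 'Bash' && BUILD_COMMAND.test(input.command || '')) {
    return { exitCode: 0, reason: 'build command, passed through' };
  }
  const target = input.file_path || input.pattern || input.command || '';
  const segments = target.split(/[\s/]+/);
  if (segments.some((s) => NOISY_DIRS.includes(s))) {
    return { exitCode: 2, reason: `"${target}" touches a noisy directory` };
  }
  if (toolName === 'Glob' && target.startsWith('**/')) {
    return { exitCode: 2, reason: `"${target}" is too broad; narrow it, e.g. src/**/*.ts` };
  }
  return { exitCode: 0, reason: 'ok' };
}

console.log(scoutCheck('Glob', { pattern: '**/*.ts' }));
console.log(scoutCheck('Bash', { command: 'cat dist/main.js' }));
console.log(scoutCheck('Bash', { command: 'cargo build' }));
```

Returning a reason with a concrete suggestion matters here, because on exit 2 that text is what the model sees and acts on next.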

privacy-block

Catches: the agent touches .env, id_rsa, or *.pem, and the secret leaks into the transcript. Exits 2 and forces an AskUserQuestion.

gap: The Bash tool is exempt and only gets a warning. Secret patterns are hard-coded, with no config to add more.
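The behavior and the Bash gap can be illustrated with a short sketch. The pattern list is an assumption built from the three examples in the text, and `isSensitive` and `privacyCheck` are hypothetical names; the real hook's patterns and return shape may differ.

```javascript
const path = require('node:path');

// Hard-coded patterns, mirroring the "no config to add more" gap.
// The exact list is an assumption based on the examples in the text.
const SECRET_PATTERNS = [/^\.env(\..+)?$/, /^id_rsa$/, /\.pem$/];

function isSensitive(filePath) {
  const name = path.basename(filePath);
  return SECRET_PATTERNS.some((re) => re.test(name));
}

function privacyCheck(toolName, input) {
  if (toolName === 'Bash') {
    // Exempt: sensitive names in the command only produce a warning.
    const hit = (input.command || '').split(/\s+/).find((word) => isSensitive(word));
    return hit ? { exitCode: 0, warning: `Bash touches ${hit}` } : { exitCode: 0 };
  }
  if (isSensitive(input.file_path || '')) {
    return { exitCode: 2, askUser: true, reason: `${input.file_path} looks like a secret` };
  }
  return { exitCode: 0 };
}

console.log(privacyCheck('Read', { file_path: 'config/.env' }));
console.log(privacyCheck('Bash', { command: 'cat .env' }));
```

The two calls target the same file, yet only the Read is blocked. That asymmetry is the gap: anything the agent can do through Bash bypasses the approval flow.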

simplify-gate

Catches: the diff bloats past 400 LOC, 8 files, or 200 LOC in a single file when the prompt contains ship/merge/pr/deploy.

gap: A fresh install does NOT block: it needs simplify.gate.enabled=true. A prompt like 'ship on Friday' is also skipped.
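A hedged sketch of the threshold logic, using the three limits quoted above. The keyword regex, the `diffStats` shape (path to changed-line count), and the `simplifyGate` name are assumptions, and the sketch does not reproduce whatever phrasing heuristics cause the real hook to skip prompts like 'ship on Friday'.

```javascript
// Thresholds from the text; the keyword regex and stats shape are assumptions.
const LIMITS = { totalLoc: 400, files: 8, singleFileLoc: 200 };
const SHIP_INTENT = /\b(ship|merge|pr|deploy)\b/i;

function simplifyGate(prompt, diffStats, config = {}) {
  // Off unless explicitly opted in, matching the fresh-install gap.
  const enabled = config.simplify?.gate?.enabled === true;
  if (!enabled || !SHIP_INTENT.test(prompt)) return { exitCode: 0, reasons: [] };

  const locs = Object.values(diffStats); // e.g. { 'src/a.ts': 120 }
  const total = locs.reduce((sum, n) => sum + n, 0);
  const reasons = [];
  if (total > LIMITS.totalLoc) reasons.push(`diff is ${total} LOC (> ${LIMITS.totalLoc})`);
  if (locs.length > LIMITS.files) reasons.push(`${locs.length} files changed (> ${LIMITS.files})`);
  if (locs.some((n) => n > LIMITS.singleFileLoc)) {
    reasons.push(`a single file exceeds ${LIMITS.singleFileLoc} LOC`);
  }
  return { exitCode: reasons.length ? 2 : 0, reasons };
}

const on = { simplify: { gate: { enabled: true } } };
console.log(simplifyGate('ship it', { 'src/a.ts': 500 }));     // gate disabled
console.log(simplifyGate('ship it', { 'src/a.ts': 500 }, on)); // gate enabled
```

The first call returns exit 0 even though the diff is far over the limits, which is how an unconfigured install behaves according to the gap above.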

workflow-artifact-gate

Checks 5 JSON files (context, risk, verification, review, adversarial) before ship/push/pr/deploy.

gap: The strongest gate, but it MUST be opted in: wire it in settings.json and enable it in .ck.json.
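As an illustration of the artifact check, here is a minimal sketch that verifies the five artifacts exist and parse as JSON. The five kinds come from the text; the `<kind>.json` file names, the `readArtifact` callback, and `artifactGate` itself are hypothetical, since the source does not say where or how the files are stored.

```javascript
// The five artifact kinds come from the text; the file names are hypothetical.
const REQUIRED_ARTIFACTS = ['context', 'risk', 'verification', 'review', 'adversarial'];

// readArtifact(name) returns the file text, or null when the file is missing.
function artifactGate(readArtifact) {
  const problems = [];
  for (const kind of REQUIRED_ARTIFACTS) {
    const raw = readArtifact(`${kind}.json`);
    if (raw === null) {
      problems.push(`${kind}.json is missing`);
      continue;
    }
    try {
      JSON.parse(raw);
    } catch {
      problems.push(`${kind}.json is not valid JSON`);
    }
  }
  return { exitCode: problems.length ? 2 : 0, problems };
}

// In-memory stand-in for the artifact directory, with review.json absent.
const files = { 'context.json': '{}', 'risk.json': '{}', 'verification.json': '{}', 'adversarial.json': '{}' };
const read = (name) => (name in files ? files[name] : null);
console.log(artifactGate(read));
```

Passing the reader in as a function keeps the check independent of the file layout, so the same logic works against a real directory or a test fixture.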

Read hook state by 3 labels

script file

A .cjs file exists in .claude/hooks/. Having the file does NOT mean the hook is running.

wired

settings.json has attached the script to a lifecycle event. If it is not wired, Claude Code never calls it.

runtime flag

Once invoked, the hook code reads .ck.json, defaults, and ENV to decide whether to proceed (e.g. privacyBlock, simplify.gate.enabled).

isHookEnabled() only turns off when hooks.<name> is false. A missing key is usually treated as enabled. Don't collapse everything into "on/off by default".
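The three labels and the missing-key rule can be sketched as follows. The function name `isHookEnabled` comes from the text, but this body is a guess at its behavior; `hookState` and its input shape are hypothetical.

```javascript
// Guessed behavior: only an explicit false disables; a missing key is enabled.
function isHookEnabled(ckConfig, name) {
  return !(ckConfig.hooks && ckConfig.hooks[name] === false);
}

// Combine the three labels into one readable state.
function hookState({ scriptExists, wired, ckConfig, name }) {
  if (!scriptExists) return 'no script file';
  if (!wired) return 'script present, not wired: never called';
  if (!isHookEnabled(ckConfig, name)) return 'wired, disabled by runtime flag';
  return 'wired and enabled';
}

console.log(isHookEnabled({}, 'scout-block'));                                  // prints true
console.log(isHookEnabled({ hooks: { 'scout-block': false } }, 'scout-block')); // prints false
console.log(hookState({ scriptExists: true, wired: false, ckConfig: {}, name: 'scout-block' }));
```

Per-hook runtime flags such as simplify.gate.enabled sit on top of this check and can default the other way, which is why a single "on/off by default" answer is misleading.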

8 known gaps

1 File-access guards live in hooks (scout-block / privacy-block), not in a separate permissions.deny list.
2 Bash is exempt from privacy-block: it is only warned, with no approval flow.
3 workflow-artifact-gate needs opt-in. The strongest gate does not run unless enabled.
4 simplify-gate doesn't block by default. You must explicitly set simplify.gate.enabled.
5 simplify-gate slips on phrasing: 'ship on Friday' is skipped.
6 .ckignore can negate node_modules with a ! allowlist. Check it directly.
7 Rules are instructions: Claude can drift, and a user prompt can override them.
8 HARD-GATE XML is also an instruction, with a 'User override' line. No exit 2 enforces it.

Also: there is no rate-limit / quota guard. A runaway agent can fire thousands of Read/Bash calls; limits live in Claude Code / provider / billing, not in this guard rail.

6 checks before you trust a guard rail

Which hooks does settings.json wire to which lifecycle?
Which hooks has .ck.json disabled? (both outer + inner config)
Is the current Claude Code session skipping permission prompts?
Any project .ckignore override? Is node_modules ! allowlisted?
Are the rule files in CLAUDE.md complete?
Is any hook crashing silently in this session (check stderr)?
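The first check can be partly automated. The sketch below lists which commands a settings object wires to which lifecycle event, assuming the nested `hooks` → event → `{ matcher, hooks: [{ type, command }] }` layout; `listWiredHooks` is a hypothetical helper and the script file names in the sample are illustrative.

```javascript
// Assumed settings.json shape:
// { hooks: { <Event>: [ { matcher, hooks: [ { type: 'command', command } ] } ] } }
function listWiredHooks(settings) {
  const rows = [];
  for (const [event, groups] of Object.entries(settings.hooks || {})) {
    for (const group of groups) {
      for (const hook of group.hooks || []) {
        rows.push({ event, matcher: group.matcher || '*', command: hook.command });
      }
    }
  }
  return rows;
}

// Sample settings; the .cjs file names are illustrative, not a real install.
const sample = {
  hooks: {
    PreToolUse: [
      { matcher: 'Read|Glob|Bash', hooks: [{ type: 'command', command: 'node .claude/hooks/scout-block.cjs' }] },
    ],
    UserPromptSubmit: [
      { hooks: [{ type: 'command', command: 'node .claude/hooks/dev-rules-reminder.cjs' }] },
    ],
  },
};
console.table(listWiredHooks(sample));
```

To run it against a real project, parse the file first, for example `listWiredHooks(JSON.parse(require('node:fs').readFileSync('.claude/settings.json', 'utf8')))`, then compare the rows against the scripts present in .claude/hooks/.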

Snapshot of Engineer Kit stable claudekit-engineer@2.20.0, which dropped the 2 Agent Teams hooks (task-completed-handler, teammate-idle-handler) present in 2.19.x. The hook list may change when upstream ships a new version.

See each hook in detail in the Custom Hooks guide

Read the full article

Read the full version with the ClaudeKit hooks lifecycle map, settings.json / .ck.json snippets, per-hook mechanics, and a deep analysis of the gaps.
