I’ve been building Claude Code plugins for a few months now - a whole marketplace of them, actually - and I recently started building plugins at work too. Somewhere between my fifth skill rewrite and my third “why did Claude forget that rule” incident, I realized I’d accidentally assembled a mental framework for when to use what.
This post is that framework, written down.
Things are changing fast in this space. Like, really fast. I’ll do my best to keep this updated as I learn more, and any time there are changes that impact what’s here. But fair warning: if you’re reading this six months from now, some of this might already be outdated. That’s the deal with building on top of something that ships new features every week.
Let’s get into it.
Core Philosophy
Three principles guide every decision I make. These aren’t theoretical - they’ve been validated repeatedly by people building real workflows, including me.
1. Determinism over suggestion
If something must happen every time, use a hook. If it should happen when contextually relevant, use a skill. If it requires judgment in an isolated context, use a subagent.
Multiple practitioners have independently discovered that CLAUDE.md rules and prompt-based instructions get deprioritized in long sessions. Lakshmi Narasimhan’s guide puts it bluntly: “An instruction in your CLAUDE.md file is a suggestion. The AI will probably follow it, but in a long conversation, it might get pushed out of the context window or just de-prioritized. A hook is a hard-coded rule. It’s a command that is guaranteed to run when its specific event happens.”
I’ve felt this one personally. I had rules in my CLAUDE.md that Claude followed perfectly for the first 20 minutes of a session, and then just… stopped. Not maliciously. It just got crowded out by more immediate context. That’s when hooks clicked for me.
Another way to think about this: who is the primary audience for what you’re writing?
- If it’s Claude as a task execution engine - the thing you’re building is instructions for Claude to follow precisely. That should be 99.9% XML. Structured steps, explicit conditions, no ambiguity. You’re programming behavior, not having a conversation.
- If it’s Claude as a reasoning partner - maybe you need Claude to think through something you’re not sure about yourself, or you’re building a chatbot-style experience. That should be a plain prompt, or a more structured prompt in a markdown file somewhere. Bonus points if you structure it so it’s reusable.
The format should match the intent. When I’m writing a skill that deploys code, that’s XML - Claude doesn’t need to reason about whether to run the linter, it needs to run the linter. When I’m writing a skill that helps me brainstorm a blog post topic, that’s conversational markdown - I want Claude to improvise.
2. Progressive disclosure over upfront loading
Only load what’s needed, when it’s needed. Metadata is cheap; full context is expensive.
At startup, only the name and description from all Skills’ YAML frontmatter are loaded into the system prompt. The full SKILL.md is read only when the skill becomes relevant. Supporting files (reference docs, templates, scripts) are read only when Claude accesses them. Scripts are executed via bash and only their output consumes tokens - the script itself never enters the context window.
This is by design: the context window is a shared resource, and every token your skill loads competes with conversation history and everything else Claude needs to reason.
3. Simplicity compounds; complexity decays
Every layer of abstraction you add is a layer Claude can misinterpret. Start simple, add complexity only when you’ve hit a wall.
The most-cited best practices compilation - shanraisshan’s claude-code-best-practice repo - states it definitively: “Simple control loops outperform multi-agent systems. Low-level tools (Bash, Read, Edit) plus selective high-level abstractions beat heavy RAG or complex frameworks.”
But here’s the nuance: it’s unrestrained complexity that decays. Your overall workflow can absolutely be complex - in fact, most real-world workflows are. The key is that complexity should be composed from simple, digestible pieces, not piled into a monolith.
I’ve been exploring a pattern where shared workflows and workflow steps get composed at runtime as needed. Think of it like function decomposition, but for Claude instructions. Each piece is small enough to understand in isolation, and you assemble them into larger workflows by referencing shared steps. The benefits are real: your workflows stay maintainable, each invocation only pulls in the context it needs (saving tokens for the actual work), and when something breaks, you know exactly which piece to fix.
The most successful Claude Code users I’ve seen keep individual components simple and iterate continuously - refining CLAUDE.md, skills, and workflows based on what Claude gets wrong. I’m one of them. My plugins have gone through four major versions, and every version got simpler at the component level, even as the overall system got more capable.
The Decision Matrix
Use this as your primary routing logic. For any given behavior you want Claude to exhibit, ask these questions in order.
1. Does this need to happen every single time, no exceptions?
Use a Hook.
Hooks are deterministic. They execute outside the agentic loop, which means Claude can’t forget, deprioritize, or reinterpret them. They’re shell commands triggered by lifecycle events, and they add zero token overhead because they run outside the context window.
The title of Lakshmi Narasimhan’s widely-shared article captures the consensus: “Claude Code Hooks: The Feature You’re Ignoring While Babysitting Your AI.” His conclusion: hooks transform Claude from “an AI assistant you have to supervise” into “an AI assistant that follows your rules automatically.”
Good candidates:
- Auto-formatting after file edits (PostToolUse)
- Blocking dangerous commands before execution (PreToolUse)
- Running tests before Claude is allowed to “finish” (Stop)
- Sending notifications when Claude needs input (Notification)
- Re-injecting critical rules after compaction (UserPromptSubmit)
- Validating output against a schema before accepting it
Bad candidates:
- Anything requiring judgment or context-awareness (use a skill)
- Anything that varies based on what Claude is working on
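As a sketch, the first two good candidates wired into `.claude/settings.json` might look like this (the two script paths are hypothetical; a PreToolUse command hook blocks the tool call by exiting with code 2):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/block-dangerous.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/format-changed.sh" }
        ]
      }
    ]
  }
}
```

The matcher scopes the hook to specific tools, so the formatter only fires on file modifications and the blocker only inspects shell commands.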
Hook types by complexity
Command hooks (90% of cases): Shell scripts. Fast, predictable, no token cost. This is your workhorse.
Prompt hooks: Send a question to a lightweight model for a yes/no decision. No tool access. Use when the decision is simple but not purely mechanical.
Agent hooks: Spawn a subagent that can read files, run commands, and reason. Use when verification requires understanding the codebase state. One developer describes combining a Stop hook with an agent hook that verifies whether all requested tasks were completed and tests were run - creating “an AI agent that verifies its own work before stopping.” His conclusion: “the more you automate the verification of AI-generated code, the better the output quality.”
The compaction survival pattern
This one deserves special attention because it solves a problem that will bite you eventually. Long sessions trigger context compaction, and Claude can “forget” important rules.
The early community pattern was a UserPromptSubmit hook that re-injects rules via `cat .claude/rules.txt` on every prompt. That works, but it’s blunt - you’re burning tokens re-injecting the same rules whether compaction happened or not.
A more powerful approach uses the PreCompact hook event with an agent-type hook. PreCompact fires right before compaction happens, so you can have an agent capture the full session state before it gets compressed:
```json
{
  "hooks": {
    "PreCompact": [
      {
        "hooks": [
          {
            "type": "agent",
            "prompt": "Record everything we've discussed in this session and any existing task lists with the current state of the task list to `.claude/local/sessions/<datetime>-short-description.md`. If this file already exists from a previous compaction, append to it rather than overwriting."
          }
        ]
      }
    ]
  }
}
```

This gives you a full session journal that survives compaction. Instead of fighting context rot by re-injecting rules, you’re preserving the actual context. The agent captures what matters - decisions made, tasks in progress, context that would otherwise be lost - and writes it to a file Claude can reference after compaction.
You can combine both approaches: PreCompact for session preservation, plus a lightweight UserPromptSubmit hook that re-injects just your non-negotiable rules (the ones Claude should never deprioritize regardless of context).
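Combined, that might look like this (the rules-file path is illustrative; the stdout of a UserPromptSubmit command hook is injected into the prompt context):

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "cat .claude/rules/non-negotiables.md" }
        ]
      }
    ],
    "PreCompact": [
      {
        "hooks": [
          {
            "type": "agent",
            "prompt": "Preserve the full session state - decisions, open tasks, context - to a file under .claude/local/sessions/ before compaction."
          }
        ]
      }
    ]
  }
}
```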
2. Is this reusable knowledge, a workflow, or a procedure Claude should follow?
Use a Skill.
Skills are contextual instructions that Claude loads on-demand. They’re the right tool for encoding “how to do X well” - domain knowledge, step-by-step procedures, templates, coding patterns, and reference material.
Shrivu Shankar, in a widely-shared post, argues that skills formalize a “scripting”-based agent model that is more robust and flexible than the rigid, API-like model that MCP represents. His framing: instead of hand-crafting tools and abstracting away reality for the agent, you give the agent access to the raw environment - binaries, scripts, docs - and it writes code on the fly to interact with them. Skills are the productized version of this pattern. Simon Willison echoed this, calling skills “maybe a bigger deal than MCP.”
Good candidates:
- Multi-step workflows (e.g., “how to create a PR in this repo”)
- Domain-specific knowledge injection (e.g., “our API conventions”)
- Template-driven output (e.g., “generate a component using this structure”)
- Procedures with decision points Claude needs to reason through
- Anything invoked by the user via `/skill-name`
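For instance, a minimal SKILL.md for the first bullet might look like this (names, paths, and the `gh` usage are illustrative):

```markdown
---
name: create-pr
description: >
  Prepare and open a pull request in this repo. Auto-invoke when the user
  asks to create, open, or finish a PR. Do NOT load for general git questions.
---

## Steps
1. Run `scripts/check-branch.sh`; stop and report if it fails.
2. Read `templates/pr-body.md` and draft the PR description from it.
3. Open the PR with `gh pr create`, using that draft as the body.
```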
Skills can restrict tools too
This one’s important because it blurs the line between skills and subagents in a useful way. Skills can declare allowed-tools in their YAML frontmatter:
```yaml
---
name: deploy
description: Production deployment workflow
allowed-tools: [Read, Glob, Grep, Bash, WebSearch]
---
```

When a skill declares `allowed-tools`, Claude is restricted to only those tools while executing the skill. No Write, no Edit, no accidentally modifying code during a read-only review workflow. This means you get tool restrictions - one of the primary reasons to reach for a subagent - without the overhead of spawning a separate context. The skill runs in your main conversation, with your full context, but with guardrails on what it can do. One less reason to create agents for everything.
Skill design principles
Keep SKILL.md under 500 lines. Move reference material, templates, examples, and data into supporting files in the skill directory. Once Claude loads SKILL.md, every token competes with conversation history. Supporting files have zero context cost until read.
Use tons of reference files. This is the single most impactful thing you can do for token efficiency. Reference files - anything in your skill directory that isn’t SKILL.md itself - cost nothing until Claude actually reads them. Templates, examples, platform-specific rules, data schemas, lookup tables - all of it should be in separate files. Your SKILL.md should be the lightest thing in the directory: just the instructions for what to do and pointers to where the details live. Every line you move from SKILL.md into a reference file is a line that only consumes tokens when it’s actually relevant.
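Concretely, a skill directory following these principles might be laid out like this (all file names hypothetical):

```
create-pr/
├── SKILL.md                 # light: instructions + pointers only
├── templates/
│   └── pr-body.md           # read only when a PR body is drafted
├── reference/
│   └── review-checklist.md  # read only when explicitly needed
└── scripts/
    └── check-branch.sh      # executed via bash; source never enters context
```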
Use XML for structured workflows. When the skill defines a predictable sequence of steps, wrap them in XML tags. Anthropic’s own prompting docs confirm that XML tags help Claude parse prompts more accurately, leading to higher-quality outputs. XML is especially valuable for branching workflows where Claude needs to follow different paths based on conditions - the tags make the boundaries explicit and prevent Claude from improvising when it shouldn’t.
```xml
<workflow>
  <step name="validate">
    Run the validation script in scripts/validate.sh.
    If validation fails, stop and report the errors.
  </step>
  <step name="generate">
    Read templates/component.tsx and generate the output
    following the template structure exactly.
  </step>
  <step name="verify">
    Run the quality gates defined in ../_shared/steps/run-quality-gates.md.
  </step>
</workflow>
```

One skill, one job. Don’t create “do everything” skills. Create focused skills that can be composed. I learned this the hard way - my first plugin had one massive skill that tried to handle six different workflows. Version 2 had six focused skills. Guess which one actually worked.
Attach scripts instead of documenting procedures. Instead of writing “run eslint with these flags and then run prettier with these flags” in your SKILL.md, put that logic in a script within the skill directory and tell Claude to execute it. The script runs via bash and only its output enters the context. The script logic itself never consumes tokens.
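For example, the dependency-check half of such a script could be sketched like this (hypothetical `check_deps` helper; a real `scripts/run-quality-gates.sh` would chain the actual lint and test commands the same way):

```shell
#!/bin/bash
# Sketch of a skill-attached script (hypothetical path: scripts/check-deps.sh).
# Claude executes it via bash; only the echoed lines enter the context window.
check_deps() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all dependencies present"
  fi
  return "$missing"
}

# A real quality-gates script would follow the same shape, e.g.:
# check_deps node pnpm && pnpm lint && pnpm test
check_deps sh grep
```

SKILL.md then just says “run `scripts/check-deps.sh` and stop if it reports anything missing” - one line of instructions instead of a documented procedure.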
Put output templates in separate files. If the skill produces structured output (a PR body, a changelog entry, a component scaffold), put the template in a file like templates/output.md. This keeps instructions and output format cleanly separated, makes templates easier to maintain, and avoids bloating SKILL.md.
Writing effective skill descriptions
The description field is critical - it’s how Claude decides whether to load your skill. Community testing has revealed specific patterns that dramatically improve auto-invocation accuracy.
Use a WHEN + WHEN NOT pattern. Generic descriptions fail. Specific trigger conditions succeed. John Conneely’s testing found that structuring descriptions with explicit invocation and exclusion conditions made skills invoke reliably every time:
```yaml
description: >
  Stakeholder context for Test Project when discussing product features,
  UX research, or stakeholder interviews. Auto-invoke when user mentions
  Test Project, product lead, or UX research. Do NOT load for general
  stakeholder discussions unrelated to Test Project.
```

Use possessive pronouns for personal skills. Conneely discovered that using “HIS/HER/THEIR” in descriptions scoped personal skills perfectly:
```yaml
description: >
  Personal work preferences and communication style for [Name].
  Auto-invoke when drafting HIS emails, Slack messages, or internal updates;
  planning HIS work or tasks; optimising HIS productivity workflows.
  Do NOT load for external blog posts or customer-facing communications.
```

Scott Spence’s reliability testing: In 200+ tests across different prompt types, Spence found that simple “suggestion” hooks for skill activation hit only ~50% reliability. A forced evaluation pattern - where Claude is required to explicitly evaluate each skill’s relevance before proceeding - hit 80-84%. His follow-up post digs into how to make skills activate reliably. The takeaway: if auto-invocation matters to you, invest in description quality and consider evaluation hooks.
3. Does this task need isolated context, restricted tools, or a specialized persona?
Use a Subagent.
Subagents get their own context window, their own system prompt, and their own tool permissions. Use them when the work would pollute your main conversation or when the task is truly independent.
Note: if you only need tool restrictions, consider using a skill with allowed-tools in its frontmatter first (see above). You get the same guardrails without the overhead of a separate context. Save subagents for when you genuinely need context isolation.
Good candidates:
- Code review (isolated from the implementation context)
- Research/exploration that generates verbose output
- Security audits (read-only tool access)
- Architecture analysis that shouldn’t modify code
- Any task where you want the result but not the intermediate reasoning
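A security-audit subagent along these lines would live in `.claude/agents/` (the name and tool list are illustrative):

```markdown
---
name: security-auditor
description: Read-only security review. Use after significant code changes.
tools: Read, Grep, Glob
---

You are a security auditor. Inspect the files you are pointed at for
injection risks, leaked secrets, and unsafe shell usage. Return a short,
prioritized list of findings. You have no write access - never attempt
to modify files.
```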
Use subagents sparingly
This is critical, and it’s not just me saying it.
Anthropic’s own prompting docs for Claude Opus 4.6 explicitly state that it “has a strong predilection for subagents and may spawn them in situations where a simpler, direct approach would suffice.” Their recommendation: add explicit guidance about when subagents are and aren’t warranted:
Use subagents when tasks can run in parallel, require isolated context, or involve independent workstreams that don’t need to share state. For simple tasks, sequential operations, single-file edits, or tasks where you need to maintain context across steps, work directly rather than delegating.
The shanraisshan best practices repo recommends using “feature-specific subagents (extra context) with skills (progressive disclosure) instead of general QA, backend engineer” agents. If you’re creating generic-persona agents, you’ve overcomplicated things. Feature-specific subagents paired with focused skills is the pattern that works.
The token cost is real, too. Every subagent gets its own context window. If you’re spinning up agents for things that could be a direct tool call, you’re burning tokens for no reason. Delegate verbose, isolated work. Do simple work directly.
Subagent memory
Subagents can maintain persistent memory directories that survive across conversations:
```yaml
---
name: code-reviewer
description: Reviews code for quality and best practices
memory: user
---
```

The memory field scopes where the persistent directory lives:

- `user`: Memory persists across all your projects
- `project`: Memory is scoped to the current project
This turns a stateless reviewer into one that learns your codebase. The subagent can read and write to this directory across sessions, building up patterns, conventions, and recurring issues over time.
4. Is this project-level context that Claude needs in every session?
Use CLAUDE.md.
CLAUDE.md is for stable, always-true information. It loads at the start of every conversation and should act as a router, not an encyclopedia.
Alex from alexop.dev captures the consensus: “Use CLAUDE.md for short, always-true project conventions and standards. Use skills when you want Claude to auto-apply a richer workflow, often with supporting files.” His other post on progressive disclosure is worth reading if you’re finding your CLAUDE.md growing out of control.
Good content for CLAUDE.md:
- Project structure overview
- Tech stack and key dependencies
- Build/test/lint commands
- Coding conventions and constraints (use MUST/MUST NOT language - “Prefer TypeScript” is a suggestion Claude might ignore; “MUST use TypeScript strict mode” is a rule it tends to follow)
- Pointers to skills, docs, and reference material
- Team norms (commit message format, PR process, etc.)
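Pulled together, a lean CLAUDE.md sketch (project details are made up):

```markdown
# Acme API

## Stack
TypeScript (strict), Node 22, pnpm, Postgres.

## Commands
- Build: `pnpm build`
- Test: `pnpm test`
- Lint: `pnpm lint`

## Rules
- MUST use TypeScript strict mode; MUST NOT use `any`.
- MUST NOT commit directly to main.

## Pointers
- API conventions: `api-conventions` skill
- Frontend rules: `.claude/rules/frontend-rules.md`
```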
Keep CLAUDE.md lean. Community consensus suggests aiming for under 150 lines. As one practitioner noted: “/memory, /rules, constitution.md does not guarantee anything.” This is why the layered approach matters - CLAUDE.md sets the baseline, hooks enforce the non-negotiables, and skills handle the complex procedures.
Use path-scoped rules for directory-specific context. Instead of bloating CLAUDE.md, put rules in .claude/rules/*.md with frontmatter globs:
```
.claude/rules/
├── api-rules.md       # paths: src/api/**
├── frontend-rules.md  # paths: src/components/**
└── test-rules.md      # paths: tests/**
```
These load only when Claude accesses files matching the glob pattern. Your CLAUDE.md stays lean; your rules stay relevant.
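A rule file itself is just markdown with a glob in its frontmatter - for example (contents hypothetical):

```markdown
---
paths: src/api/**
---

- All handlers MUST validate input before touching the database.
- Errors MUST be returned through the shared error type, never thrown raw.
```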
5. Does this need to persist across sessions?
Use the right memory mechanism for the scope.
| Memory Type | Scope | How It Works | Best For |
|---|---|---|---|
| CLAUDE.md | Project | Loaded every session. You maintain it manually. | Stable facts, conventions |
| Session Memory | Project | Automatic. Claude writes summaries to disk; loads them in future sessions. | ”What did I do yesterday?” continuity |
| Subagent Memory | User or Project | Persistent directory a subagent reads/writes across conversations. | Domain knowledge that accumulates |
| Tasks | Project | Filesystem-persisted DAGs (~/.claude/tasks). Survive across sessions and compaction. | Multi-session project tracking with dependencies |
| Memory Tool (API) | Custom | Client-side file directory for just-in-time retrieval. You control storage. | Long-running agentic workflows |
The structured recovery pattern
For long-running projects spanning multiple sessions, memory files need to be bootstrapped deliberately:
- Initializer session: Set up a progress log (what’s done, what’s next), a feature checklist (scope of work), and references to startup scripts.
- End-of-session update: Before ending, update the progress log with what was completed and what remains.
- Work on one feature at a time. Only mark complete after end-to-end verification. This keeps the progress log trustworthy.
This turns memory into a structured recovery mechanism, so each new session picks up exactly where the last one left off - rather than Claude spending the first five minutes being re-explained what you built yesterday.
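The progress log from the initializer session can be as simple as this (the layout is one possible shape, not a prescribed format):

```markdown
# Progress Log

## Done
- [x] Login flow (verified end-to-end)

## In Progress
- [ ] Password reset - email template still missing

## Next
- [ ] Profile page

## Startup
Run `scripts/dev-setup.sh` before starting work.
```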
6. Does Claude need real-time code intelligence for this project?
Configure LSP via .lsp.json.
LSP (Language Server Protocol) integration gives Claude IDE-like code intelligence: go-to-definition, find-references, hover info, symbol search, and real-time diagnostics. This means Claude navigates your codebase structurally rather than via text search - reportedly ~900x faster for navigation operations.
LSP plugins can be configured in .lsp.json at the plugin root, or inline in plugin.json:
```json
{
  "typescript": {
    "command": "typescript-language-server",
    "args": ["--stdio"],
    "extensionToLanguage": {
      ".ts": "typescript",
      ".tsx": "typescriptreact",
      ".js": "javascript",
      ".jsx": "javascriptreact"
    }
  }
}
```

LSP plugins configure the connection, not the server

You still need to install the language server binary separately (e.g., `npm install -g typescript-language-server`, `pip install pyright`, `go install golang.org/x/tools/gopls@latest`).
Available via the official Anthropic marketplace and community marketplaces like Piebald-AI’s claude-code-lsps (covering 17+ languages). Install with `/plugin marketplace add` then `/plugin install`.
LSP is still maturing
LSP support was officially added in v2.0.74 (December 2025). There are some known issues with plugin propagation. If things aren’t working, check the Errors tab in `/plugin` and verify the language server binary is in your PATH.
Composable Workflow Patterns
The real power isn’t in any single feature - it’s in how they compose. These patterns are drawn from what the community has found works in practice. I use several of them in my own plugins.
Pattern 1: Skill + Hook (Guided Execution with Guardrails)
The skill defines what to do. The hook ensures quality gates are met. This is the “babysitting elimination” pattern.
```
User invokes /deploy
→ Skill loads: deployment checklist, environment config template
→ Claude follows the procedure
→ Stop hook (agent type): verifies all tests pass and changelog is updated
→ PostToolUse hook: formats any modified files
```
The skill gives Claude the procedure; the hooks enforce the non-negotiable checkpoints. You get the benefits of Claude’s reasoning on the complex parts, with deterministic enforcement on the mechanical parts.
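The hook half of this pattern might be configured like so (the verification wording and formatter script path are illustrative; the skill itself is a normal SKILL.md):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "agent",
            "prompt": "Verify that all tests pass and the changelog was updated. If either check fails, block and report exactly what is missing."
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/format-file.sh" }
        ]
      }
    ]
  }
}
```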
Pattern 2: Skill + Subagent (Divide and Verify)
The skill orchestrates; the subagent does isolated work.
```
User invokes /implement-feature
→ Skill loads: feature implementation workflow
→ Step 1: Claude writes the code (main context)
→ Step 2: Skill instructs Claude to delegate review to a subagent
→ Subagent (read-only tools) reviews the code, returns findings
→ Step 3: Claude addresses findings in main context
```
The subagent’s verbose analysis stays out of the main context. Only the summary comes back. This mirrors how real engineering teams work: specialization without fragmentation.
Pattern 3: Context-Dependent Branching (XML-Structured)
A single skill that adapts based on how it’s invoked or what it finds. The XML structure makes branching explicit - Claude doesn’t have to guess which path to take.
```xml
<workflow>
  <step name="detect-context">
    Check the current branch name and recent git log.
    Determine if this is a feature branch, hotfix, or main.
  </step>
  <step name="feature-branch" condition="feature branch detected">
    Follow the standard PR preparation workflow.
    Read templates/pr-template.md for the PR body format.
    Read ../_shared/steps/run-quality-gates.md for pre-PR checks.
  </step>
  <step name="hotfix" condition="hotfix branch detected">
    Follow the expedited review process.
    Skip changelog generation. Flag for immediate review.
    Read templates/hotfix-template.md for the PR body format.
  </step>
</workflow>
```

Pattern 4: Shared Workflow Steps (The Composability Pattern)
Reusable workflow steps that multiple skills can reference. This is the closest thing to “shared libraries” that works today, and it’s how I structure my plugin marketplace.
```
.claude/skills/
├── _shared/                           # Shared reference library
│   ├── SKILL.md                       # description + index of shared resources
│   ├── steps/
│   │   ├── validate-environment.md    # Reusable step: check env vars, deps
│   │   ├── run-quality-gates.md       # Reusable step: lint, test, type-check
│   │   └── prepare-commit.md          # Reusable step: stage, message, sign
│   ├── templates/
│   │   ├── pr-body.md
│   │   └── changelog-entry.md
│   └── scripts/
│       ├── check-deps.sh
│       └── validate-schema.sh
├── deploy/
│   └── SKILL.md                       # References: ../_shared/steps/validate-environment.md
├── implement-feature/
│   └── SKILL.md                       # References: ../_shared/steps/run-quality-gates.md
└── create-pr/
    └── SKILL.md                       # References: ../_shared/steps/prepare-commit.md
```
Each skill’s SKILL.md explicitly tells Claude to read the shared step file:
```markdown
## Step 3: Quality Gates
Read and follow the procedure in `../_shared/steps/run-quality-gates.md`.
Execute the script at `../_shared/scripts/validate-schema.sh` and check its output.
```

This works because shared files live in one place (single source of truth), there’s no context cost until Claude reads the referenced file, scripts are executed rather than read, any skill can reference any shared step, and changes propagate immediately.
Cross-plugin sharing is limited
For cross-plugin sharing in a marketplace, you’re limited to symlinks (which resolve at install time, giving each plugin its own copy). Alternatively, consider a dedicated “commons” plugin that other plugins reference. This is fragile and undocumented - worth watching as the plugin system matures.
Pattern 5: Hooks as Autonomous Workflow Orchestrators
For fully autonomous workflows, hooks create a closed loop where Claude works, gets verified, and persists state - all without you watching.
```
SessionStart hook
→ Injects today's task list from tasks/ directory
→ Loads relevant session memory

UserPromptSubmit hook
→ Re-injects non-negotiable rules

PreCompact hook (agent type)
→ Preserves full session state to .claude/local/sessions/
→ Captures task lists, decisions, context before compression

PostToolUse hook (on Write/Edit)
→ Runs formatter
→ Updates dirty-file tracker

Stop hook (agent type)
→ Subagent verifies: did Claude complete what was asked?
→ If not: returns blocking signal with "Tests failing. Fix before finishing."
→ If yes: updates progress log, writes session summary

Notification hook
→ Desktop notification when Claude finishes or needs input
```
Two gotchas worth knowing
Formatter noise: If your formatter changes files in a PostToolUse hook, Claude gets a system reminder about those changes every time, eating into your context window. The smarter approach for heavy formatting is to format on commit via a Stop hook rather than after every individual edit.
Always check `stop_hook_active` in Stop hooks. When it’s true, Claude is already continuing because of a previous Stop hook. Exit 0 immediately. Without this check, your hook blocks Claude forever. This is the #1 mistake new hook authors make.
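The guard can be sketched like this (hypothetical helper name; in a real hook the JSON payload arrives on stdin, passed here as an argument for clarity):

```shell
#!/bin/bash
# Sketch of the stop_hook_active guard for a Stop hook.
should_skip() {
  # Assumption: a grep is enough for this one field; jq is the sturdier choice.
  echo "$1" | grep -Eq '"stop_hook_active"[[:space:]]*:[[:space:]]*true'
}

payload='{"stop_hook_active": true}'
if should_skip "$payload"; then
  echo "previous Stop hook already active - exit 0 immediately"
  # exit 0   <- in a real hook, stop here to avoid an infinite blocking loop
else
  echo "safe to run verification"
  # run checks; exit 2 with a message on stderr to block Claude from stopping
fi
```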
Sandboxing Integration
Sandboxing isn’t just a security feature - it’s an autonomy enabler. Anthropic’s internal usage data shows sandboxing reduces permission prompts by 84%. With sandboxing on, you can grant broader permissions because the blast radius is contained.
When to enable sandboxing
Always, unless you have a specific reason not to. Anthropic recommends sandboxing as the default configuration for all environments. It adds under 15ms of latency and provides:
- Filesystem isolation: Writes restricted to working directory and subdirectories
- Network isolation: Only approved domains reachable, via a proxy server outside the sandbox
- OS-level enforcement: Uses macOS Seatbelt or Linux bubblewrap - not prompt-based, not bypassable by Claude, covers all subprocesses
Both isolation layers are critical. Without network isolation, a compromised agent could exfiltrate SSH keys. Without filesystem isolation, a compromised agent could escape the sandbox and gain network access.
Sandboxing + Hooks synergy
This is the combination that turns Claude Code into a genuinely autonomous tool:
- Without sandbox: You need conservative permission rules → more prompts → more babysitting
- With sandbox + auto-allow mode: Sandboxed commands run automatically without prompting. Hooks handle quality gates. You handle only the exceptions that escape the sandbox boundary.
The result: Claude works freely within defined limits, hooks enforce your standards deterministically, and you only get interrupted for things that genuinely need your attention.
Things that need to escape the sandbox
These gracefully fall back to the normal permission flow - you don’t lose safety, you just get prompted:
- Docker commands (need daemon access - add `docker` to `excludedCommands`)
- Network calls to non-allowlisted domains
- Git operations requiring authentication
- Anything writing outside the project directory
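The Docker escape hatch lives in settings. A sketch along these lines - the exact key layout is an assumption, so check your version’s settings reference before copying:

```json
{
  "sandbox": {
    "enabled": true,
    "excludedCommands": ["docker"],
    "network": {
      "allowedDomains": ["github.com", "registry.npmjs.org"]
    }
  }
}
```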
Anti-Patterns to Avoid
I’ve hit every single one of these. Learn from my mistakes.
1. The “God Skill”
A single skill that tries to handle every workflow. Split it. One skill, one job. Multiple focused skills that compose are always better than one massive skill. My first plugin version had one of these. It was 800 lines of SKILL.md. Claude got confused constantly. Don’t be me.
2. The Agent Army
Creating subagents for everything - especially generic-persona agents like “backend engineer” or “QA tester.” Anthropic explicitly warns that Opus 4.6 over-spawns subagents. Feature-specific subagents with skills beat general-purpose agents. If you have more than 3-4 subagents, you’ve probably overcomplicated things.
3. CLAUDE.md as Documentation
CLAUDE.md is not your project wiki. It’s a bootstrap file and a router. Keep it under 150 lines. Extract everything else into skills, rules, and reference files. Your CLAUDE.md should become a router pointing to the right context, with each referenced file having a specific job.
4. Prompting What Should Be Hooked
If you find yourself repeatedly telling Claude to run the linter, format code, or check tests - that’s a hook, not a prompt. “Prompts are suggestions; hooks are guarantees.” If you’re typing “please remember to run prettier” by hand, you’re doing it wrong.
5. Documenting Procedures Instead of Attaching Scripts
Instead of writing step-by-step shell commands in SKILL.md, put the logic in a script file within the skill directory. Scripts execute via bash; only their output enters the context window. The script’s source code never consumes tokens.
6. Putting Output Templates Inline in SKILL.md
Templates belong in separate files within the skill directory. They get read on-demand (no context cost until needed), they’re easier to maintain, and they keep SKILL.md focused on instructions rather than output structure.
7. Ignoring Compaction and Context Rot
Context rot - where Claude gradually deprioritizes earlier instructions - is the primary failure mode in long sessions. Multiple community voices call this the #1 issue. Mitigate with:
- PreCompact hooks (agent type) that preserve session state before compression
- UserPromptSubmit hooks that re-inject non-negotiable rules
- Session memory for automatic cross-session persistence
- Tasks for structured progress tracking that survives compaction
- Aggressive `/clear` usage when switching topics
- Subagent delegation to keep the main context clean
One developer’s mantra captures it: “Compound, don’t compact. Extract learnings automatically, then start fresh with full context.”
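As a sketch of the re-injection mitigation: a UserPromptSubmit hook can be any executable, and for this event anything printed to stdout is added to the context on every prompt, so the rules can't rot out of the window. The rules themselves are invented here:

```python
"""Hypothetical UserPromptSubmit hook that re-injects non-negotiable rules.

Registered in settings.json under the UserPromptSubmit event; stdout
is appended to Claude's context each time the user submits a prompt.
"""

RULES = [
    "Never edit files under vendor/.",
    "Run pnpm test before claiming a task is done.",
]

def build_reminder(rules: list[str]) -> str:
    """Format rules as a short, hard-to-ignore block."""
    return "NON-NEGOTIABLE RULES:\n" + "\n".join(f"- {r}" for r in rules)

print(build_reminder(RULES))
```

Keep the list short: this text is paid for on every single prompt.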
8. Overusing MCP Where a Skill Would Suffice
I’ve seen developers building bloated MCP servers with dozens of tools that just mirror a REST API (`read_thing_a()`, `read_thing_b()`, `update_thing_c()`). The “scripting” model - formalized by skills - is better for most cases. MCP’s focused role should be as a secure gateway providing a few powerful, high-level tools, not as a replacement for domain knowledge and procedures that belong in skills.
Quick Reference: “Where Does This Go?”
| What You Want | Where It Goes | Why |
|---|---|---|
| “Always format on save” | Hook (PostToolUse) | Deterministic, no tokens |
| “Block rm -rf” | Hook (PreToolUse) | Must happen every time |
| “Survive compaction” | Hook (PreCompact, agent type) | Preserves session state before compression |
| “Notify me when done” | Hook (Notification/Stop) | No more watching the terminal |
| “How to deploy” | Skill | Multi-step procedure with judgment |
| “Our API conventions” | Skill or Rule | Domain knowledge, loaded on-demand |
| “Generate a component” | Skill + template file | Template stays out of context until read |
| “Run validation logic” | Skill + script file | Script output enters context, not source |
| “Review this code” | Subagent (read-only tools) | Needs isolated context |
| “Project uses TypeScript strict” | CLAUDE.md | Always-true constraint |
| “Test command is pnpm test” | CLAUDE.md | Stable project fact |
| “Frontend rules” | .claude/rules/frontend.md | Path-scoped, loads only when relevant |
| “What I did yesterday” | Session Memory (automatic) | Cross-session continuity |
| “Track feature progress” | Tasks | Persistent DAGs, survives compaction |
| “Reusable PR template” | Skill directory (templates/) | On-demand, no upfront cost |
| “Shared validation step” | _shared/ skill directory | Single source of truth, referenced by other skills |
| “Type info and go-to-definition” | LSP plugin (.lsp.json) | Structural navigation, ~900x faster |
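For instance, the path-scoped rules row might translate into a file like this. I'm assuming a `globs` frontmatter key here - check the docs for the exact field name - and the rules are invented:

```markdown
---
globs: "src/components/**/*.tsx"
---

- Components use function syntax, not class syntax.
- Styles live next to the component in *.module.css.
```

These lines cost nothing until Claude touches a matching file.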
On Hook Events
There are 17 hook events in Claude Code as of February 2026. Here’s the complete list:
| Event | When it fires | Can block? |
|---|---|---|
| SessionStart | Session begins/resumes | No |
| UserPromptSubmit | User submits a prompt | Yes |
| PreToolUse | Before a tool call | Yes |
| PermissionRequest | Permission dialog appears | Yes |
| PostToolUse | After tool succeeds | No (feedback only) |
| PostToolUseFailure | After tool fails | No (feedback only) |
| Notification | Notification sent | No |
| SubagentStart | Subagent spawned | No |
| SubagentStop | Subagent finishes | Yes |
| Stop | Claude finishes responding | Yes |
| TeammateIdle | Teammate about to idle | Yes |
| TaskCompleted | Task marked completed | Yes |
| ConfigChange | Config file changes | Yes |
| WorktreeCreate | Worktree being created | Yes |
| WorktreeRemove | Worktree being removed | No |
| PreCompact | Before compaction | No |
| SessionEnd | Session terminates | No |
The ones you’ll use most: PreToolUse (blocking dangerous commands, re-injecting rules), PostToolUse (formatting after edits), Stop (verification before Claude finishes), and Notification (desktop alerts).
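To make the PreToolUse case concrete, here's a sketch of a blocking hook in Python. It assumes the documented payload shape (a JSON object on stdin with the command under `tool_input.command`) and the convention that exit code 2 blocks the call with stderr fed back to Claude:

```python
"""Sketch of a PreToolUse hook that blocks recursive-force deletes."""
import json
import re
import sys

# Matches "rm -rf", "rm -fr", and flag-soup variants like "rm -Rf".
DANGEROUS = re.compile(r"\brm\s+(-[A-Za-z]*r[A-Za-z]*f|-[A-Za-z]*f[A-Za-z]*r)\b",
                       re.IGNORECASE)

def should_block(command: str) -> bool:
    return bool(DANGEROUS.search(command))

def main() -> int:
    payload = json.load(sys.stdin)  # hook payload arrives as JSON on stdin
    command = payload.get("tool_input", {}).get("command", "")
    if should_block(command):
        print(f"Blocked dangerous command: {command!r}", file=sys.stderr)
        return 2  # exit code 2 blocks the tool call
    return 0

# In the real hook script, this would end with: sys.exit(main())
```

It's a sketch, not a security boundary - a determined `rm -r -f` slips past this regex - but it turns the most common footgun into a hard stop.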
Getting Started
If you’re building from scratch, layer things in this order. Each layer should solve a real friction point you’ve experienced - don’t pre-optimize.
- CLAUDE.md - Get the basics: project structure, commands, constraints. Keep it lean. Use MUST/MUST NOT language for rules you care about.
- Sandboxing - Run `/sandbox` and enable it. Immediate reduction in permission noise.
- Two hooks - Auto-format (PostToolUse) and notifications (Stop/Notification). Ten minutes of setup. Hours of saved babysitting.
- Your first skill - Pick your most common workflow. Package it with a template file and at least one script. Use XML for the workflow structure. Write a specific description with WHEN/WHEN NOT conditions.
- LSP - If you’re working in a typed language, install the language server and the LSP plugin. Claude’s code navigation goes from text-search to structural.
- Shared steps - As you build more skills, extract common procedures into `_shared/`. Reference them from other skills. Attach scripts instead of documenting procedures.
- Path-scoped rules - As CLAUDE.md grows, move directory-specific guidance into `.claude/rules/*.md` with frontmatter globs.
- Subagents - Only when you genuinely need isolated context or restricted tools. Feature-specific, not generic-persona.
- Memory/Tasks - When your projects span multiple sessions and you need continuity. Bootstrap deliberately with the structured recovery pattern.
The difference between frustration and productivity isn’t the tool - it’s how you layer the pieces. Start with what hurts most, automate it, and build from there.
Happy building, everyone.
Mentioned in this post
- Claude Code Hooks Documentation - Official Anthropic docs for hook events and configuration
- Claude Code Skills Documentation - Official docs for creating and managing skills
- Anthropic’s Sandboxing Engineering Blog - “Beyond permission prompts: making Claude Code more secure and autonomous”
- shanraisshan/claude-code-best-practice - Community-maintained best practices repository
- Shrivu Shankar - “How I Use Every Claude Code Feature” - The post that framed skills as a “scripting” model
- Simon Willison - “Claude Skills are awesome, maybe a bigger deal than MCP” - The post that started the skills-vs-MCP conversation
- Lakshmi Narasimhan - “Claude Code Hooks: The Feature You’re Ignoring” - Practical hooks guide
- Scott Spence - Measuring Claude Code Skill Activation - Reliability testing for skill auto-invocation
- alexop.dev - Claude Code Customization Guide - Separation of concerns between CLAUDE.md and skills
- alexop.dev - “Stop Bloating Your CLAUDE.md” - Progressive disclosure for AI coding tools
- Piebald-AI/claude-code-lsps - Community LSP plugin marketplace covering 17+ languages
- Claude Code Memory Documentation - Managing Claude’s persistent memory
- Anthropic Memory Tool (API) - Memory tool for long-running agentic workflows
- brennacodes/brenna-plugs - My Claude Code plugin marketplace (the one that taught me all of this)