Claude Code in Production: What Actually Works
April 2026 · Claude Code v2.1.89 · Claude 4.6 family
I’ve been running Claude Code at scale across 30+ domains, a 7-stage content pipeline, and a fleet of automated agents that generates and deploys sites. The setup that came out the other end of that isn’t what I expected when I started.
The difference between burning $50/day and getting real leverage isn’t which model you pick. It’s what loads at startup, what you isolate in subagents, and where advisory rules end and hard enforcement begins. This is a breakdown of what I actually use.
Not a tips post. This is for people running systems on Claude Code, not one-off prompts.
The mental model
Three things determine whether Claude is cheap or expensive:
What loads at startup. CLAUDE.md runs every session. Skills are on-demand (~50 tokens at startup, full content only when invoked). Path-scoped rules only load when Claude touches matching files. Auto memory loads the index every session. Get this wrong and you’re either paying for context you never use, or missing context you need.
How fast the context fills. The window is 1M tokens on Opus/Sonnet 4.6. That sounds like a lot. The problem is quality degrades at 20-40% full due to attention dilution, not 80-90% like you’d expect. Your effective budget is much smaller than the headline number. I set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60 so compaction kicks in before things go sideways. I learned this the hard way watching generation quality silently degrade mid-pipeline before I understood where the degradation actually started.
Where research happens. File reads from main context are expensive. Delegating exploration to subagents keeps your main session clean: subagents return summaries, not raw file contents. This is the single highest-leverage habit to build.
The rule I follow: CLAUDE.md for things Claude gets wrong without it (under 80 lines). Skills for domain knowledge, loaded on demand. Hooks for things that must always happen. Subagents for anything that reads lots of files.
One thing beats everything else: give Claude a verification method. Tests, screenshots, expected outputs. Without a concrete pass/fail signal, Claude is guessing. With one, it iterates to correctness.
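To make the loading rules concrete, here is a back-of-the-envelope startup estimator. It is a sketch under stated assumptions: the ~4 chars/token heuristic, the ~50-token skill stub from above, and hypothetical file sizes; real figures vary by setup.

```python
# Rough startup-context estimator for the loading rules above.
# Assumptions: ~4 chars/token, ~50 tokens per skill stub (hypothetical).

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return len(text) // 4

def startup_cost(claude_md: str, num_skills: int, memory_index: str) -> int:
    """Tokens paid at every session start: CLAUDE.md body, one stub
    per skill, and the auto-memory index."""
    SKILL_STUB_TOKENS = 50
    return (estimate_tokens(claude_md)
            + num_skills * SKILL_STUB_TOKENS
            + estimate_tokens(memory_index))

# Hypothetical project: 80-line CLAUDE.md, 12 skills, 60-line memory index.
claude_md = "\n".join("- keep this rule short" for _ in range(80))
memory = "\n".join("- short memory note" for _ in range(60))
print(startup_cost(claude_md, num_skills=12, memory_index=memory))
```

The point of running this once: the always-loaded total should be a four-digit number, not a five-digit one.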
Model lineup (April 2026)
| Model | Cost per MTok in/out | Context |
|---|---|---|
| Opus 4.6 | $5 / $25 | 1M |
| Sonnet 4.6 | $3 / $15 | 1M |
| Haiku 4.5 | $1 / $5 | 200K |
| Opus Fast (preview) | $30 / $150 | 1M |
In practice:
- Haiku: subagent exploration, file search, simple lookups. Cheap enough to throw at anything. Set `CLAUDE_CODE_SUBAGENT_MODEL=haiku` in settings and your exploration subagents run ~60% cheaper than default.
- Sonnet: 80% of my coding tasks. Default model. Implementation, tests, refactors.
- Opus: complex architecture, multi-file refactors, hard debugging. I don’t reach for it by default.
- opusplan: my favorite combo. Opus handles planning, Sonnet handles execution. Best cost/quality ratio I’ve found for complex features. Switch to it with `/model opusplan`.
One thing worth flagging: ultrathink. It was deprecated in January 2026; Anthropic’s stated reason was that extended thinking is now always-on so the keyword was redundant. I noticed quality drops on complex generation tasks. Turns out enough other people did too, because Anthropic restored it in v2.1.68 (March 2026) after user backlash. Extended thinking is still always-on, but ultrathink forces a single max-effort turn in a way the defaults don’t reliably deliver. For sustained high-effort work across a full session, effort levels are the better lever:
/effort high # persist for this session
--effort high # CLI flag at startup
Set CLAUDE_CODE_EFFORT_LEVEL=medium in settings as your default and override per session when needed.
The files that matter
The directory structure looks complicated on paper. There are really only six things you care about:
your-project/
├── CLAUDE.md # Loaded every session, keep under 80 lines
├── CLAUDE.local.md # Personal overrides, gitignored
│
├── .claude/
│ ├── settings.json # Permissions + hooks, committed
│ ├── rules/ # Path-scoped rules (loads only when relevant)
│ └── skills/ # On-demand workflow packages
│
└── .mcp.json # MCP server configs, committed
Global personal config lives at ~/.claude/CLAUDE.md (under 15 lines, your preferences that apply everywhere) and ~/.claude/settings.json for global permissions.
CLAUDE.md: be ruthless
People treat CLAUDE.md as a dumping ground for everything they want Claude to know. That’s the mistake. Every line has a cost.
My rule: if removing this line wouldn’t cause Claude to make mistakes, delete it.
Under 80 lines for project files. Under 15 for ~/.claude/CLAUDE.md. If Claude already does something correctly without an instruction, that line is dead weight: it loads every session and does nothing.
Use HTML comments (`<!-- -->`) for maintainer notes; they’re stripped before injection. Use `@path/to/file` to defer large content into separate files that load on demand. Never put code snippets in CLAUDE.md; use `file:line` references instead.
A solid CLAUDE.md:
# Project: Site Factory
## Stack
- TypeScript 5.x, Node 22, Next.js 15, Prisma + PostgreSQL
- Testing: Vitest, Playwright for E2E
- Infra: Docker, Kubernetes, OpenTofu
## Commands
- `pnpm dev` dev server
- `pnpm test` Vitest
- `pnpm test:e2e` Playwright
- `pnpm lint` + `pnpm typecheck` quality gates
- `pnpm build` production build
## Code Style (only what Claude gets wrong)
- ES modules only (NEVER require())
- Use `type` over `interface` unless extending
- Errors: AppError from src/lib/errors.ts
- Logging: logger from src/lib/logger.ts, NEVER console.log
## Workflow
- ALWAYS run typecheck after changes
- NEVER modify src/generated/ (auto-generated)
- NEVER add dependencies without explicit approval
## Architecture (@docs/ARCHITECTURE.md for details)
- src/api/ route handlers
- src/services/ business logic
- src/repositories/ Prisma data access
No philosophy, no explanations, no padding. Just the specific things Claude gets wrong when an instruction is absent.
.claudeignore: free 30-40% savings
Create this file at project root:
node_modules/
.next/
dist/
build/
coverage/
*.min.js
Adding .next/ alone cuts context 30-40% in Next.js projects. This is one of the easiest wins and most people skip it entirely.
settings.json: the real control surface
The settings file handles permissions, hooks, and env vars. Four scopes, highest priority wins:
- Managed: enterprise policy, can’t be overridden
- User (`~/.claude/settings.json`): personal global
- Project (`.claude/settings.json`): team-shared, committed
- Local (`.claude/settings.local.json`): personal project overrides, gitignored
My project settings template:
{
"$schema": "https://json.schemastore.org/claude-code-settings.json",
"model": "sonnet",
"env": {
"CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "60",
"CLAUDE_CODE_SUBAGENT_MODEL": "haiku"
},
"permissions": {
"allow": [
"Bash(pnpm run lint)",
"Bash(pnpm run test *)",
"Bash(pnpm run typecheck)",
"Bash(pnpm run build)",
"Bash(git *)",
"Read(docs/**)",
"Read(src/**)"
],
"deny": [
"Read(.env)",
"Read(.env.*)",
"Bash(rm -rf *)",
"Bash(curl * | sh *)"
]
}
}
Hooks: the most underused feature
Here’s what people get wrong about CLAUDE.md: it’s advisory. Claude follows it about 80% of the time. If something must always happen (no exceptions, no interpretation), use a hook.
Hooks are deterministic. They execute on every matching event regardless of what Claude decided. In my content pipeline, a hook runs a quality gate after every file write. Not advisory, not best-effort. Every write. That’s the difference between “Claude usually formats correctly” and “every file that leaves this pipeline is formatted correctly.”
There are 26 hook events covering the full lifecycle: SessionStart, SessionEnd, PreToolUse, PostToolUse, UserPromptSubmit, SubagentStart, SubagentStop, TaskCreated, PreCompact, WorktreeCreate, and more.
The response format (v2+) uses a decision field:
{
"decision": "block",
"reason": "Cannot edit on main branch. Create a feature branch first.",
"suppressOutput": false
}
For PreToolUse hooks, decisions can be "block", "approve", or "defer" (v2.1.89+; defer kicks the decision to the user instead of blocking). For PermissionDenied hooks, "retry" is now available to trigger a retry attempt.
Three hooks I run on everything
Auto-format on save (never think about formatting again):
{
"PostToolUse": [{
"matcher": "Edit|Write",
"hooks": [{
"type": "command",
"command": "FILE=$(echo $TOOL_INPUT | jq -r '.file_path // empty'); [ -z \"$FILE\" ] && exit 0; case \"$FILE\" in *.ts|*.tsx|*.js|*.jsx) npx prettier --write \"$FILE\" 2>/dev/null ;; esac; exit 0",
"timeout": 10
}]
}]
}
Block edits on main (prevents the “oops, just pushed to main” moment):
{
"PreToolUse": [{
"matcher": "Edit|Write",
"hooks": [{
"type": "command",
"command": "[ \"$(git branch --show-current)\" != \"main\" ] || { echo '{\"decision\": \"block\", \"reason\": \"Create a feature branch first.\"}' >&2; exit 2; }",
"timeout": 5
}]
}]
}
Block secret file reads (defense in depth):
{
"PreToolUse": [{
"matcher": "Read",
"hooks": [{
"type": "command",
"command": "FILE=$(echo $TOOL_INPUT | jq -r '.file_path // empty'); case \"$FILE\" in *.pem|*.key|*credentials*|*.env*) echo '{\"decision\":\"block\",\"reason\":\"Cannot read secret files\"}' >&2; exit 2;; esac; exit 0",
"timeout": 5
}]
}]
}
As of v2.1.85+, hooks support a conditional "if" field: you can fire a hook only when another permission matches:
{
"PreToolUse": [{
"matcher": "Bash",
"if": "Bash(npm *)",
"hooks": [{ "type": "command", "command": "echo 'npm command detected'" }]
}]
}
The hooks running on my pipeline aren’t demonstrations. They’re the enforcement layer that makes CLAUDE.md instructions irrelevant for anything critical. CLAUDE.md says “don’t edit on main.” A hook guarantees it.
Path-scoped rules: keep CLAUDE.md lean
The pattern that makes large projects manageable: detailed conventions in .claude/rules/ files with path matchers. They load only when Claude works on matching files.
---
# .claude/rules/api-rules.md
paths:
- "src/api/**/*.ts"
- "src/handlers/**/*.ts"
---
# API Design Rules
- All handlers return { data, error } shape
- Use zod for request body validation
- Never expose internal error details to clients
- Rate limiting middleware: src/middleware/rateLimit.ts
- Auth middleware extracts user from JWT → req.user
---
# .claude/rules/testing-rules.md
paths:
- "tests/**"
- "**/*.test.ts"
---
# Testing Conventions
- describe/it blocks, AAA pattern
- Mock only external boundaries (NEVER internal modules)
- Database tests use test container in tests/setup.ts
- Each test mirrors its source: src/services/auth.ts → tests/services/auth.test.ts
Rules live outside CLAUDE.md, load only when relevant, and don’t inflate startup context.
A few things I learned the hard way:
- Dense bullets beat explanatory prose. 10 imperative bullets cost the same tokens as 3 paragraphs but are faster for Claude to extract.
- Merge tiny rule files. A 26-line rule file takes a file load slot and adds overhead. File count, not line count, determines load cost.
- End rules with a verification command. "After editing .tf files, run `tofu validate`. Don't re-read the file." Way better than implied re-reads.
- Watch the pointer trap. If a rule file is already always-loaded, a CLAUDE.md line saying “see rule-file.md” double-loads the content. Pointers are only useful when the target isn’t already in context.
Skills: progressive disclosure at scale
Skills are the biggest token-saving mechanism I’ve found for complex projects. Claude gets just the name + description at startup (~50 tokens per skill). The full SKILL.md only loads when the skill is invoked.
Startup: name + description (~50 tokens per skill)
On invoke: SKILL.md body loaded (~500-2000 tokens)
On demand: references/ scripts/ (only what's needed)
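The economics are easy to sketch. Under the ~50-token stub figure above and illustrative session counts (both hypothetical), on-demand loading wins whenever a skill is invoked in only a fraction of sessions:

```python
def always_loaded_cost(body_tokens: int, sessions: int) -> int:
    """Cost of keeping the same content in CLAUDE.md: paid every session."""
    return body_tokens * sessions

def skill_cost(body_tokens: int, sessions: int, invocations: int,
               stub_tokens: int = 50) -> int:
    """Cost as a skill: the stub loads every session, the body only on invoke."""
    return stub_tokens * sessions + body_tokens * invocations

# Hypothetical: a 1,500-token domain doc, 100 sessions, invoked 10 times.
print(always_loaded_cost(1500, 100))  # fat CLAUDE.md
print(skill_cost(1500, 100, 10))      # same content as a skill
```

In that hypothetical, the skill version costs roughly a seventh of the always-loaded version, which is where the “biggest token-saving mechanism” claim comes from.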
Skill frontmatter controls a lot:
| Field | What it does |
|---|---|
| `model` | Override model just for this skill |
| `effort` | Effort level override |
| `context: fork` | Run in isolated subagent context |
| `agent` | Which subagent type to use with `context: fork` |
| `allowed-tools` | Tools permitted without approval when skill is active |
| `paths` | Auto-activate when working on matching files |
| `argument-hint` | Autocomplete hint, e.g. `[issue-number]` |
| `disable-model-invocation: true` | Only the user can invoke, not Claude |
| `user-invocable: false` | Hidden from `/` menu (Claude-only) |
context: fork is the field most people miss. It runs the skill in a completely isolated subagent context; the current session’s context never gets polluted by the skill’s work. Pair it with agent: Explore for read-heavy skills and your main context stays clean. Use disable-model-invocation: true for skills that should only be invoked deliberately (deploy pipelines, for example). Claude will never decide to run them autonomously.
The !`command` syntax in skill bodies is underrated. It runs shell commands before the skill context goes to Claude:
---
name: pr-summary
context: fork
agent: Explore
allowed-tools: Bash(gh *)
---
PR diff: !`gh pr diff`
PR comments: !`gh pr view --comments`
Claude gets the live PR content without you having to manually inject it.
Built-in skills worth knowing:
- `/batch <instruction>`: parallel execution, one agent per unit in isolated worktrees, each opens a PR
- `/simplify [focus]`: three parallel review agents, aggregate findings, apply fixes
- `/loop [interval] <prompt>`: recurring prompt execution within the session
Subagents: the architecture that changes everything
The #1 token-saving technique. Subagents run in a separate context window. Your main session gets only the summary.
This matters because research tasks (reading files, understanding code structure, checking patterns) can touch 20-50 files. If that happens in your main context, those file reads accumulate and push you toward the 20-40% degradation zone fast. In a subagent, those reads are isolated. They never hit your main context.
Key subagent frontmatter fields:
---
name: researcher
description: Explores codebase to answer questions. Use PROACTIVELY when you need to understand how something works before implementing changes.
model: haiku # cheap for exploration
tools: Read, Grep, Glob # read-only
---
You are a codebase researcher. Your job is to explore and summarize.
Rules:
- Read only what's needed to answer the specific question
- Return a concise summary (under 500 words)
- Include exact file paths and line numbers for key findings
- Do NOT suggest changes. Just report findings
The description field drives when Claude delegates. Make it explicit: “Use PROACTIVELY when…” not just “for research tasks.”
Model resolution when there are competing settings:
1. `CLAUDE_CODE_SUBAGENT_MODEL` env var (highest priority)
2. Per-invocation `model` parameter
3. Subagent definition `model` frontmatter
4. Main conversation’s model
Setting CLAUDE_CODE_SUBAGENT_MODEL=haiku globally gives you cheap exploration subagents without configuring each one individually.
The isolation: worktree field is powerful for implementation subagents: each agent gets its own git worktree, so parallel agents can’t conflict on file writes.
Agent teams: experimental but real
Enable with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.
This isn’t just parallel subagents. It’s multiple Claude Code instances with their own context windows that can message each other directly. One session is the team lead; others are teammates who self-claim tasks from a shared task list.
The useful distinction:
- Subagents: investigation, review, one-shot tasks. Report back to main agent only.
- Agent Teams: coordinated parallel implementation; teammates message each other directly.
The Anthropic engineering blog has a case study on building a C compiler with 16 parallel agents using this. The pattern works. Each agent handles an independent piece, team lead coordinates merges.
Limitations worth knowing: no session resumption with in-process teammates, one team per session, and split panes mode (via tmux or iTerm2) doesn’t work in VS Code terminal, Windows Terminal, or Ghostty.
Git worktrees: parallel work without merge hell
Claude Code has native worktree support. Each worktree is an isolated copy of your repo, perfect for parallel agents working on independent features.
claude --worktree feature-auth # Creates .claude/worktrees/feature-auth/
claude --worktree feature-auth --tmux # Opens in tmux pane
claude -w # Auto-generated worktree name
The .worktreeinclude file copies gitignored files (like .env) to new worktrees automatically. worktree.sparsePaths handles large monorepos with sparse checkout.
The /batch skill automates the whole pattern: one agent per work unit, each in its own worktree, each opens its own PR. No coordination needed.
Workflow patterns that changed how I work
The annotation cycle
This is the pattern I reach for on any non-trivial feature. Credit to Boris Tane for writing it up clearly:
1. Generate a detailed `plan.md` with code snippets and file paths
2. Add inline notes directly into the plan
3. “Address all the notes and update the document. Don’t implement yet.”
4. Repeat 1-6 rounds until the plan is solid
5. “Implement it all. Mark tasks completed. Continuously run typecheck.”
The key insight: annotation catches architectural problems before a line of code is written. Refactoring a plan costs nothing. Refactoring a half-implemented feature is expensive.
The two-correction rule
If you’ve corrected Claude on the same issue twice in a session, /clear and start fresh. A clean session with a better prompt consistently outperforms a long, correction-polluted session. This is counterintuitive: it feels like you’re losing progress. You’re not. You’re avoiding the compounding cost of a degraded context.
The interview pattern for requirements
Before building anything complex:
I want to build [brief description]. Interview me in detail using the AskUserQuestion tool.
Keep interviewing until we've covered everything, then write a complete spec to SPEC.md.
Start a fresh session for implementation. This separation prevents Claude from building assumptions into the code before you’ve articulated what you actually want.
The writer/reviewer pattern
Session A writes code. Session B (fresh context) reviews. Different context, different perspective. It catches the self-review blind spots that single-pass misses, especially valuable for security-sensitive code.
Fan-out for bulk operations
When you need to apply a pattern across many files:
for file in src/api/*.ts; do
claude -p "add AppError handling to $file" --allowedTools Edit,Read
done
Or use /batch for true parallel execution with worktree isolation.
Context management habits
- Check with `/context` before long sessions
- `/compact` at logical breakpoints (not as a last resort)
- Prefer `/clear` over multiple compactions; fresh sessions produce better code
- `/btw` for quick questions without growing the main context
- `Esc+Esc` to rewind to the previous checkpoint
One source of context bloat that’s easy to miss: operational docs that accumulate history (STATUS.md, PLAN.md). Current state only, history to an archived file. The archive goes in .claudeignore. A 487-line STATUS.md costs 10K tokens on every session that touches it. At 60 lines it’s 1.2K. That gap is real and it compounds.
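A small audit script makes that bloat visible before it costs anything. The doc names and the 2,000-token budget here are my own hypothetical choices; the ~4 chars/token heuristic is the same one used elsewhere in this post:

```python
from pathlib import Path

def doc_tokens(path: Path) -> int:
    """~4 characters per token; close enough for a budget check."""
    return len(path.read_text()) // 4

def bloated_ops_docs(root: Path, budget: int = 2000) -> list[str]:
    """Flag operational docs over the token budget. STATUS.md and
    PLAN.md are the usual suspects; adjust the names for your repo."""
    suspects = {"STATUS.md", "PLAN.md"}
    return sorted(str(p) for p in root.rglob("*.md")
                  if p.name in suspects and doc_tokens(p) > budget)
```

Run it at the repo root; anything it flags is a candidate for the current-state-plus-archive split.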
Prompt efficiency
Bad: "make this better"
Good: "add error handling to src/api/users.ts, wrapping the prisma call in
try/catch using AppError from src/lib/errors.ts"
Bad: "fix the auth"
Good: "the JWT refresh in src/services/auth.ts throws on expired tokens
instead of refreshing. Fix it using the refreshToken helper."
Vague prompts force Claude to explore first. Specific prompts go straight to the change.
Remote control, scheduling, and voice
A few features I use regularly that don’t get enough coverage:
Remote control: connect from phone or another device while a session runs on your machine. claude --remote-control "My Project" or /remote-control inside a session. Accessible at claude.ai/code or the iOS/Android app. Outbound HTTPS only, no inbound ports to expose.
Cloud scheduled tasks: runs on Anthropic infrastructure, works when your machine is off. /schedule to configure from CLI or via claude.ai/code/scheduled. Minimum 1-hour interval, clones GitHub repos fresh each run, can only push to claude/-prefixed branches by default.
Voice mode: Push-to-Talk input, 20 languages. Works exactly like typing; the transcribed prompt goes through the normal agent loop. I use this when I want to think out loud while coding. Available on Pro/Max.
Computer use: Claude takes screenshots and controls your desktop. No setup on Pro/Max plans (March 2026). Useful for visual feedback loops: UI testing, browser automation, anything where the success criterion is visual.
MCP: use it, but selectively
MCP servers add tools Claude can call. Powerful, but expensive: every tool definition loads into context at startup.
MCP Tool Search is the feature that makes large MCP setups viable. It defers tool definitions until needed, cutting context from ~72K tokens to ~8.7K on a 20-tool server (85% reduction). Enabled by default. Force-enable with ENABLE_TOOL_SEARCH=true when using ANTHROPIC_BASE_URL.
The servers I actually use:
# Real-time, version-specific library docs (no more API hallucinations)
claude mcp add --transport sse context7 https://mcp.context7.com/sse
# Semantic code search across 30+ languages
claude mcp add serena -- uvx --from git+https://github.com/oraios/serena serena start-mcp-server --context ide-assistant --project $(pwd)
# Structured reasoning chains
claude mcp add sequential-thinking -s local -- npx -y @modelcontextprotocol/server-sequential-thinking
A few things that took me time to figure out:
Scope heavy MCP servers to subagents. Define mcpServers inline in subagent definitions instead of globally. Tool descriptions from large servers (20+ tools) add 1,500-2,000 tokens to main context at startup, permanently. If you only need a database MCP inside one subagent, keep it there:
---
name: db-analyst
description: Runs database queries and returns summaries
mcpServers:
postgres:
command: postgres-mcp
args: ["--connection-string", "${DB_URL}"]
---
Trim tool docstrings in your own MCP servers. FastMCP and similar frameworks serialize full docstrings into the tool manifest. “Optional job ID to scope results to a specific organization.” → “Org/job scope.” Same semantics, far fewer tokens. I validated 400-600 token savings per session on a 24-tool server.
Disconnect unused servers. /mcp shows what’s loaded. Each unused server adds 5-10% context overhead even if no tools are called. Check it periodically.
Memory system
Two separate mechanisms, different purposes:
CLAUDE.md: you write it, Claude reads it every session. Priority order: managed (enterprise) > user (~/.claude/) > project > subdirectory. Supports @path/to/file imports for on-demand content (max 5 hops deep).
Auto memory: Claude writes notes to itself across sessions. Enabled by default since v2.1.59. Storage at ~/.claude/projects/<project>/memory/MEMORY.md + topic files. First 200 lines / 25KB of MEMORY.md loads per session; topic files load on demand.
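Since only the first 200 lines / 25KB of MEMORY.md auto-load, anything past that limit is effectively invisible until Claude explicitly reads the file. A small check catches the drift; this is a sketch using the limits stated above:

```python
from pathlib import Path

MAX_LINES, MAX_BYTES = 200, 25 * 1024  # per-session auto-load limits

def memory_overflowing(path: Path) -> bool:
    """True if MEMORY.md has grown past what auto-loads each session."""
    text = path.read_text()
    return (len(text.splitlines()) > MAX_LINES
            or len(text.encode()) > MAX_BYTES)
```

If this returns True, it is time to prune with the deletion list below, not time to hope Claude reads the tail.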
The thing that tripped me up: memory description specificity drives recall. Generic descriptions never get semantically matched:
Bad: "VPN prefix thing"
Good: "Do not recommend VPC CNI prefix delegation (ENABLE_PREFIX_DELEGATION);
caused prod incident"
The specific description fires when the topic comes up. The generic one doesn’t.
What to delete from memory:
- Historical audit summaries once patterns are in CLAUDE.md
- Planning memories after the work ships
- Entries that duplicate what’s in rules files
- Stale project state for completed work
Type-appropriate decay:
| Memory type | Decay | Action |
|---|---|---|
| user | Very slow | Keep indefinitely |
| feedback | Slow | Update when guidance changes |
| reference | Medium | Update when tooling changes |
| project | Fast (sprint-level) | Delete after work ships |
Token cost reduction: the full list
Most of these are set-and-forget. You configure them once and they run silently on every session. The ones labeled “Prompt habit” require discipline but they’re free.
| Technique | Savings | Effort |
|---|---|---|
| `.claudeignore` for node_modules/, .next/, dist/ | 30-40% context | 1 file |
| Batch API (async workloads) | 50% off all tokens | Use /v1/messages/batches |
| Prompt caching for repeated system prompts | 90% off cache reads | Auto with Anthropic SDK |
| `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60` | Prevents degradation | 1 setting |
| `CLAUDE_CODE_SUBAGENT_MODEL=haiku` | ~60% on exploration | 1 setting |
| MCP Tool Search | 85% MCP context reduction | Default enabled |
| Sonnet default, Opus only when needed | ~60% vs all-Opus | Discipline |
| `opusplan` mode | Best cost/quality | `/model opusplan` |
| Skills instead of fat CLAUDE.md | ~80% context reduction | Medium effort |
| Path-scoped rules | Load only relevant rules | Medium effort |
| Subagents for research | Isolates file reads | Prompt habit |
| `/clear` between unrelated tasks | Free reset | Prompt habit |
| `/btw` for quick questions | Zero context growth | Prompt habit |
| Effort level low for simple tasks | Reduces thinking cost | `/effort low` |
| CLAUDE.md under 80 lines | Avoids saturation | Discipline |
| `max_tokens` right-sizing | Up to 50% output savings | 150 for classify, 512 for JSON |
| Compact JSON in API calls (no `indent=2`) | 30-40% per payload | Remove `indent=` |
| Anomaly-only summaries | 50-80% on data context | Send exceptions, not normals |
| SQL data pre-filtering | Up to 90% on heavy queries | Push truncation to DB |
| Merge tiny rule files | Reduces file load overhead | Combine <30-line files |
Combined realistic savings: 60-80% reduction in total token spend. These aren’t marginal. .claudeignore alone is 30% for free. The Batch API 50% discount applies to everything: input and output tokens, all models. If you have any async workloads, you’re paying double if you’re not using it.
Operational doc architecture: the invisible cost
This one took me months to notice. Six months into running multi-repo planning environments, I looked at what was loading at session start and found a STATUS.md at 487 lines that I’d been appending build notes to for weeks. Every session was loading 10K tokens of history before doing anything useful. Trimmed it to 60 lines (~1.2K tokens). That’s 8.8K tokens saved on every session that touches it, before I’ve typed a word.
The antipattern: appending build history, session notes, and changelog entries inline into STATUS.md or PLAN.md. Creates 400+ line docs where 3 lines change each session.
The fix: Current state only (~60 lines). History goes to a dedicated archive file. The archive goes in .claudeignore.
The rule: If a document’s first section isn’t the current state, it has drifted into changelog territory.
Related: "Update STATUS.md" is ambiguous: Claude reads the whole file to understand structure before updating. Naming exact sections is better:
Bad CLAUDE.md: "After each session, update STATUS.md and commit."
Good CLAUDE.md: "After meaningful work: update STATUS.md 'Current Step' +
'Upcoming Steps' + append one line to 'Recent Completed'
only (not the whole file)."
Large index files have the same problem. A research/INDEX.md of 237 lines / 38KB often costs more to load than the file it points to. Fix: split into topic sub-indexes, or just use .claudeignore on the index and grep by filename. Descriptive filenames (anti-fingerprinting-deep-research-2026.md) beat prose index descriptions.
Programmatic API: patterns from production
My content pipeline calls Claude programmatically at every stage: research (Opus + web search), generation (Opus with 22-tier quality retries), factcheck (Sonnet), translation (Sonnet, 11 locales). At 30+ sites, $40-100/site in API costs, every optimization compounds.
Compact JSON serialization. json.dumps(data, indent=2) adds 30-40% token overhead. Whitespace formatting is for human readers, not models.
# 1,220 tokens
json.dumps(config, indent=2)
# ~581 tokens
json.dumps(config, separators=(",", ":"))
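The claim is easy to verify on any payload: compact separators drop every space and newline the model never needed. A runnable check, with a hypothetical payload standing in for real config:

```python
import json

# Hypothetical config payload of the kind a pipeline stage receives.
config = {"sites": [{"domain": f"site{i}.example", "locale": "en",
                     "tier": i % 3} for i in range(40)]}

pretty = json.dumps(config, indent=2)
compact = json.dumps(config, separators=(",", ":"))

# Same data, fewer characters -> fewer tokens (~4 chars/token heuristic).
print(len(pretty) // 4, len(compact) // 4)
```

The round-trip is lossless, so there is no reason to ever send the pretty form to a model.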
Right-size max_tokens. Over-allocating wastes queue allocation:
| Task | max_tokens |
|---|---|
| Binary classification | 150 |
| Structured JSON | 512 |
| Multi-step reasoning | 1024 |
| Free-form generation | 2048+ |
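In code, that table becomes a small lookup so over-allocation can’t creep in call by call. A sketch; the task labels are my own, not an API convention:

```python
# max_tokens ceilings from the table above, keyed by hypothetical task labels.
MAX_TOKENS_BY_TASK = {
    "classify": 150,    # binary classification
    "json": 512,        # structured JSON
    "reasoning": 1024,  # multi-step reasoning
    "freeform": 2048,   # free-form generation (a floor, not a ceiling)
}

def max_tokens_for(task: str) -> int:
    # Unknown task types fall back to the generous free-form value.
    return MAX_TOKENS_BY_TASK.get(task, 2048)
```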
Always use JSON schema. Freeform responses waste tokens on preamble, markdown fences, prose wrappers. Schema enforcement is free. Use it on every structured call.
Model tier by task. Defaulting to Sonnet everywhere leaves 50-60% cost reduction on the table for classification-heavy workloads. Haiku handles binary classification fine.
Anomaly-only data summaries. Don’t send healthy/normal states to the model. “19 modules: 17 running. Paused: X, Y. Error: Z” beats listing all 19 with status.
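A minimal sketch of that summarizer; the status values are hypothetical stand-ins for whatever your system reports:

```python
def anomaly_summary(statuses: dict[str, str]) -> str:
    """Collapse healthy modules to a count; list only the exceptions."""
    running = sum(1 for s in statuses.values() if s == "running")
    paused = sorted(m for m, s in statuses.items() if s == "paused")
    errors = sorted(m for m, s in statuses.items() if s == "error")
    parts = [f"{len(statuses)} modules: {running} running."]
    if paused:
        parts.append("Paused: " + ", ".join(paused) + ".")
    if errors:
        parts.append("Error: " + ", ".join(errors) + ".")
    return " ".join(parts)
```

The summary length scales with the number of anomalies, not the fleet size, which is exactly the property you want when the healthy case is the common case.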
Fuzzy dedup caching. Exact SHA256 cache keys miss paraphrases, common in social media and ticket descriptions. Levenshtein at 0.85 similarity gets 20-30% more cache hits:
from rapidfuzz import fuzz

def fuzzy_cache_lookup(text: str, cache: dict[str, str]) -> str | None:
    # cache maps previously seen input text -> stored model response
    for cached_text, cached_response in cache.items():
        if fuzz.ratio(text, cached_text) >= 85:  # 0-100 scale; 85 ≈ 0.85
            return cached_response
    return None  # miss: call the model, then cache the result
Token measurement. Claude CLI and API often under-report input tokens due to caching and compression. For budget enforcement: max(reported_tokens, estimated_tokens) where estimated = char_count / 4.
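As a guard function, using the same ~4 chars/token divisor:

```python
def billable_tokens(reported: int, text: str) -> int:
    """Never trust a token count lower than the char-count estimate."""
    estimated = len(text) // 4
    return max(reported, estimated)
```

Budget enforcement against `billable_tokens` fails safe: it can over-count slightly, but it can’t silently under-count the way a cached read can.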
Security: what I’ve actually run into
Most Claude Code security writing is either paranoid boilerplate or dismissive. The real picture is narrower: a handful of specific CVEs worth knowing, one structural problem with AI-generated code that requires process discipline, and one easy configuration mistake.
- CVE-2025-59536 (CVSS 8.7): Hook and MCP init commands execute before the trust dialog is accepted. Code runs before user consent. Don’t run Claude Code with `--dangerously-skip-permissions` in directories you don’t control. Fixed in v1.0.111.
- CVE-2025-55284 (CVSS 7.1): API key exfiltration via DNS; prompt injection abuses allowlisted `dig`/`nslookup` to exfiltrate secrets. Fixed in v1.0.4.
- CVE-2026-21852 (CVSS 5.3): API key exfiltration via malicious `ANTHROPIC_BASE_URL` config. Never accept that env var from external/untrusted config. Fixed in v2.0.65.
- CVE-2025-52882 (CVSS 8.8): VS Code/JetBrains extension WebSocket auth bypass; malicious websites can connect to the unauthenticated local WebSocket server, potential RCE. Fixed in VS Code extension v1.0.24+.
- CVE-2026-33068 (CVSS 7.7): Workspace trust dialog bypass via repository-controlled settings files. Fixed in v2.1.53.
The structural thing: AI-generated code has higher security vulnerability rates than human-written code. Commonly cited at 1.5-2x (community studies, no single canonical source). Auth flows, payment logic, and data mutations need close human review, not just Claude review. Use hooks to block credential file access: deterministic, not advisory.
The stuff worth reading
Official:
- Claude Code docs (code.claude.com/docs): surprisingly good, kept current
- The changelog: worth reading each release, features ship frequently
Community:
- awesome-claude-code: 200+ repos, well-curated
- claude-code-system-prompts: all 110+ internal prompt strings updated per release. Useful for understanding how Claude Code actually thinks about things.
- claude-code-hooks-mastery: all hook events with UV single-file scripts. Good reference.
- ccusage: track token usage per session/day/model. Run this for a week and you’ll see exactly where money is going.
Articles:
- Boris Tane’s How I Use Claude Code: the annotation cycle method, worth reading in full
- Anthropic engineering’s Building a C Compiler: 16 parallel agents in practice
- Shipping Faster with Worktrees (incident.io): parallel agent workflow with real throughput numbers
The pitfall checklist
Things I’ve done wrong so you don’t have to:
- Early misunderstanding compounds. Claude misunderstands the first premise, builds on it, you get a working implementation of the wrong thing. Fix: ruthless scoping, reviewed plan before any code.
- Kitchen sink sessions. Mixing unrelated tasks in one session pollutes context and degrades quality on both. Just `/clear`.
- Correcting the same issue twice. If you’ve made the same correction twice and Claude is still getting it wrong, the context is too polluted to recover. Start fresh.
- CLAUDE.md as a linter. Advisory rules miss 20% of the time. Use hooks for anything that must always happen.
- Too many MCP servers. Performance degrades as available tools increase. Check with `/mcp` and disable what you’re not using.
- No verification method. Claude writes code without a pass/fail signal. Tests or expected outputs aren’t optional.
- Delegating security-sensitive code completely. Auth, payments, data mutations need human eyes on the output.
- Late compaction. Default 95% trigger is past the degradation zone. Set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60`.
- Operational docs as changelogs. Appending session history inflates documents 10x over months. If the first section isn’t current state, the doc has drifted. Current state only; history to a `.claudeignore`’d archive.
- Vague memory descriptions. “Team info” never fires semantically. “Do not use VPC CNI prefix delegation (caused prod outage)” always fires.
- `indent=2` in API payloads. Costs 30-40% per call. Never.
- Pointers to always-loaded content. `"See rules/db.md for details"` in CLAUDE.md when db.md is always-loaded just loads the content twice.
The compounding effect of getting this right is real. Each individual optimization is small. The combination of .claudeignore, path-scoped rules, Haiku subagents, properly scoped MCP, and compact JSON adds up to 60-80% reduction in token spend, with better output quality, not worse.
The setup is front-loaded. But it’s also the work that compounds: you do it once and it runs on every session that follows.
One thing I’d push back on in most Claude Code writing I see: too much focus on model selection. People obsess over Opus vs. Sonnet. In my experience that’s maybe the fourth or fifth most important thing. The bigger wins are structural: what context you load, what you isolate, whether your hooks actually enforce your rules. Get those right first, then worry about which model you’re on.
Everything else is iteration.