The Complete Course
Agentic Software Engineering

Speed without
discipline
is not engineering.

A three-act course on building real software with AI agents — the 5-phase framework, the skills that drive it, and the operating environment that keeps it safe.

Foyzul Karim  ·  linkedin.com/in/foyzul  ·  v1.0  ·  81 slides
The Hook  ·  02 / 81

Speed without discipline is not engineering.

AI coding agents make development 10× faster. Without process, that speed produces broken code, lost context, and runaway cost.

01

The Paradigm Shift

  • Agents reason across entire codebases
  • They write, test, fix, and review code
  • Not autocomplete — genuine collaborators
  • Speed increases 10× or more
Agents are teammates, not tools.
02

Quality at Risk

  • Code that looks correct but breaks things
  • Missed edge cases and side effects
  • Security flaws — credentials, injections
  • No architectural context or conventions
Without process, speed kills quality.
03

Cost Escalation

  • Every interaction costs tokens
  • Bloated sessions waste context
  • Inline scripts burn tokens per call
  • Undisciplined workflows multiply spend
Without hygiene, costs spiral.
This course gives you the 5-phase framework, guardrails, and cost controls to turn raw speed into disciplined engineering.
The Hook  ·  03 / 81

Course Architecture

A five-part progression: theory → practice → pain → cure → mastery.

1
The Hook
Understanding the framework and why process matters.
2
Skills Intro
Just enough skill scaffolding to start building.
3
The Build
A real project — phase by phase, every skill read before invoked.
4
Problems We Hit
Guardrails and UI automation — introduced after you feel the pain.
5
The Operating Environment
Settings, CLAUDE.md, rules, hooks, and advanced deep dives.
Theory → practice → pain → cure → mastery. A deliberate progression.
The Hook  ·  04 / 81

Why This Course Exists

Problem   “Vibe coding” — asking AI to write code without process — produces unmaintainable, untested code that breaks in production.

Four Pillars — What Makes This Course Different
01

5-Phase Framework

Process creates reproducibility. Reproducibility creates trust.
02

Extension Architecture

Skills + Hooks + MCP extend Claude's capabilities to your stack.
03

Live Demonstrations

Theory without practice is entertainment, not education.
04

Hands-On Learning

You build alongside. Every phase, every skill, every decision.
Process beats prompts. The 5-phase framework turns “vibe coding” into engineering.
The Hook  ·  05 / 81

Who This Course Is For

Persona | Their Problem | What They Gain
Software Engineers | AI writes code but they can't maintain it. | Disciplined framework + quality gates.
Team Leads | Team uses AI inconsistently. | Repeatable standards + review process.
Senior Developers | Need the full picture, not just code generation. | Architecture + context + optimization.
Architects & Leads | No standards exist for AI-assisted work. | Design patterns others can follow.
Prerequisites    Terminal familiarity · Git · One programming language
No matter your role, agentic engineering needs process + tools + practice.
The Hook  ·  06 / 81

What Is an Agent?

Problem   One-shot prompts fail on complex tasks. Every LLM call has limited context, no planning, and no verification.

Concept   An Agent is an intelligent orchestrator that decomposes work and decides what to do next.

Agent
Orchestrator
decomposes  ·  decides  ·  verifies
Sequential Skills
Ordered, predictable workflows.
Parallel Subagents
Fast, scalable execution.
Agents decompose work into sequential skills or parallel subagents. The fundamental building block.
The Hook  ·  07 / 81

What Is a Large Language Model?

Definition
A neural network trained on vast text that learns to predict the next token — enabling it to generate text, write code, and reason.
The Breakthrough
Attention Is All You Need (2017) — the transformer architecture replaced sequential processing with parallel attention.
The Four Pillars
01
Pre-training
  • Trained on billions of documents
  • Absorbs patterns by exposure
  • Predicts patterns, not meaning
02
Next-Token Prediction
  • Predicts the next likely token
  • Adds it back, repeats
  • Simple mechanism, intelligent output
03
Emergent Capabilities
  • Small: completes sentences
  • Medium: summarizes paragraphs
  • Large: writes code, debugs, reasons
04
Transformer Arch
  • Looks at all words at once
  • Decides which parts matter
  • Long-range coherence
Hard limits: finite context window (~200K tokens) · no persistent memory across sessions · KV-cache is working memory, not true memory.
The Hook  ·  08 / 81

How LLMs Power Agents

The LLM is the brain. The agent is the body.

01
Intent
You state what you want in natural language. The LLM parses words into a structured goal.
LLM parses → structured goal
02
Breakdown
The LLM decomposes the goal into steps. Route → schema → validation → query → errors.
LLM plans → executable steps
03
Execution
The LLM generates the actual code — functions, SQL migrations, test cases.
LLM builds → working code
04
Loopback
The LLM evaluates the result. Tests pass? Secure? Generates fix if needed.
LLM checks → adapts → repeats
Also known as: perceive-plan-act-observe · ReAct · OODA — same pattern, different names.
The quality of the loop is determined by the quality of the intent. Vague prompts → vague breakdown → vague code. The framework gives you structure. Specificity gives you precision.
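The four stages above can be sketched as a small loop. This is a minimal illustration, not how any particular agent is implemented: `run_agent_loop`, `LoopResult`, and the three `fake_*` functions are hypothetical names, and the fakes stand in for real LLM calls so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    steps: List[str]
    attempts: int
    passed: bool
    log: List[str] = field(default_factory=list)


def run_agent_loop(
    intent: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[List[str], int], str],
    check: Callable[[str], bool],
    max_attempts: int = 3,
) -> LoopResult:
    """Intent -> Breakdown -> Execution -> Loopback, repeated until the check passes."""
    steps = plan(intent)                      # Breakdown: goal -> ordered steps
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        artifact = execute(steps, attempt)    # Execution: produce an artifact for the steps
        ok = check(artifact)                  # Loopback: evaluate the result
        log.append(f"attempt {attempt}: {'pass' if ok else 'fail'}")
        if ok:
            return LoopResult(steps, attempt, True, log)
    return LoopResult(steps, max_attempts, False, log)


# Hypothetical stand-ins for the LLM calls.
def fake_plan(intent: str) -> List[str]:
    return ["route", "schema", "validation", "query", "errors"]


def fake_execute(steps: List[str], attempt: int) -> str:
    # Pretend the first attempt forgets the error-handling step.
    done = steps if attempt > 1 else steps[:-1]
    return ",".join(done)


def fake_check(artifact: str) -> bool:
    return "errors" in artifact.split(",")


result = run_agent_loop("add a POST /sessions endpoint", fake_plan, fake_execute, fake_check)
print(result.log)
```

The loop only converges because `fake_check` is specific about what "done" means, which is the same point the slide makes about intent.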
The Hook  ·  09 / 81

The Agent as Orchestrator

From single-intent assistants to purpose-based orchestrators that coordinate parallel subagents.

Sequential Skills
1 · REQ   Requirements
2 · ARCH   Architecture
3 · TASKS   Task Generation
4 · TDD   TDD Implementation
5 · REVIEW   Review & Merge
Ordered workflows — one step feeds the next.
Parallel Subagents
Subagent 1   Database schema design
Subagent 2   API parser logic
Subagent 3   Error handling layer
Subagent 4   Test suite generation
Orchestrator coordinates · integrates · verifies.
Scales with the problem: small task → 1 subagent · complex system → 10 subagents, coordinated, verified.
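The coordinate, integrate, verify pattern can be sketched with a thread pool. This is an assumption-laden illustration: `orchestrate` and `verify` are hypothetical helpers, and each lambda stands in for a real subagent that would return a deliverable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def orchestrate(subtasks: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Run every subtask in parallel, then collect the results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in subtasks.items()}
        # .result() blocks until the subagent finishes and re-raises its errors.
        return {name: fut.result() for name, fut in futures.items()}


def verify(results: Dict[str, str]) -> bool:
    """Integration check: every subagent must return a non-empty deliverable."""
    return all(bool(text.strip()) for text in results.values())


# Stand-ins for the four subagents on the slide.
subtasks = {
    "schema": lambda: "database schema design",
    "parser": lambda: "API parser logic",
    "errors": lambda: "error handling layer",
    "tests": lambda: "test suite generation",
}
results = orchestrate(subtasks)
print(sorted(results), verify(results))
```

Scaling from 1 subagent to 10 only changes the size of the `subtasks` mapping; the collect and verify steps stay the same.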
The Hook  ·  10 / 81

The Problem With “Vibe Coding”

The cycle
Prompt → Code → Hope → Ship → Bug → Repeat
Five Problems
01
No Requirements
“Build a dashboard” — but which metrics? Which users? You operate on assumptions.
02
No Architecture
Agent writes file by file with no blueprint. Every PR is a breaking change waiting to happen.
03
No Task Breakdown
“Build the backend” is not a task. It is a wish.
04
No Tests
Without tests, quality is unknown. Manual verification is a prayer, not a strategy.
05
No Review
Bugs and security flaws go straight into your codebase.
Vibe coding produces toys. The 5-phase framework turns speed into engineering.
The Hook  ·  11 / 81

The 5-Phase Agentic Framework

Each phase has one owner and produces one deliverable. No phase is optional.

# | Phase | Owner | Deliverable
1 | Requirement Engineering | You | REQ document: what to build
2 | System Architecture | You + Claude | ARCH doc: data models, APIs, modules
3 | Task Generation | Claude | Task list with tests + scope + REQ trace
4 | TDD Implementation | Claude | Working code + passing tests
5 | Review & Merge | You + Claude | Approved PR: 16 parallel checks
Each phase has one owner and one deliverable. Discipline over speed.
The Hook  ·  12 / 81

Three Scenarios, One Framework

The same 5-phase pipeline — different entry points. The discipline never changes.

1 · REQ   Requirements
2 · ARCH   Architecture
3 · TASKS   Task Generation
4 · TDD   TDD Implementation
5 · REVIEW   Review & Merge
Scenario | Phases run | Note
Greenfield | 1 → 2 → 3 → 4 → 5 | Blank slate: all 5 phases.
New Feature | 2 → 3 → 4 → 5 | System exists: skip REQ.
Bugfix | RCA → 3 → 4 → 5 | Root-cause analysis first: understand why.
The entry point changes. The discipline doesn't. Same framework, different starting line.
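One way to read the scenario table is as a lookup from scenario to entry point. The sketch below is illustrative only: `ENTRY` and `phases_for` are hypothetical names, and treating RCA as a pre-step in front of the pipeline is an assumption drawn from the bugfix row.

```python
from typing import Dict, List, Optional, Tuple

PHASES: List[str] = ["REQ", "ARCH", "TASKS", "TDD", "REVIEW"]

# scenario -> (first phase to run, optional pre-step before the pipeline)
ENTRY: Dict[str, Tuple[str, Optional[str]]] = {
    "greenfield": ("REQ", None),
    "new_feature": ("ARCH", None),
    "bugfix": ("TASKS", "RCA"),   # assumption: RCA runs before phase 3
}


def phases_for(scenario: str) -> List[str]:
    """Return the ordered steps a scenario runs, from its entry point to REVIEW."""
    start, pre_step = ENTRY[scenario]
    pipeline = PHASES[PHASES.index(start):]
    return ([pre_step] if pre_step else []) + pipeline


for name in ENTRY:
    print(name, "->", " → ".join(phases_for(name)))
```

Every scenario ends in TDD and REVIEW, which is the "discipline never changes" part of the slide.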
Reflection  ·  13 / 81
Reflection

What Did We Learn?

  1. One-shot prompts fail — complex work needs agents.
  2. “Vibe coding” skips steps — 5 phases give AI a process.
  3. The framework adapts — same discipline, different entry points.
Process beats prompts. Next: the tools that make this process executable.
Skills Intro  ·  The CLI  ·  14 / 81

Claude Code: Your Agentic CLI

Problem  The 5-phase framework needs a tool — one that reads codebases, runs tests, and loads skills.

Concept  Claude Code is Anthropic's official CLI — your terminal interface to agentic engineering.

01
Reads Full Codebase
Analyzes architecture, dependencies, and conventions across your entire project.
02
Edits, Tests & Shell
Writes code, runs tests, executes bash — all within your existing workflow.
03
Skills via /commands
Type /req, /arch, /tdd — Claude loads your skill and follows your process.
04
MCP Integration
Connects to Playwright, databases, search — giving Claude eyes, arms, and memory.
Install (recommended)
curl -fsSL https://claude.ai/install.sh | bash
Legacy (no auto-update): npm install -g @anthropic-ai/claude-code
Needs from you
Clear requirements · Architecture decisions · Well-scoped tasks · Your review
Surfaces
Terminal CLI  ·  VS Code extension  ·  JetBrains plugin  ·  Desktop app  ·  Web (claude.ai/code)  ·  iOS
Auth
Claude.ai subscription (Pro / Max / Teams / Enterprise)  ·  API key  ·  Enterprise SSO (AWS Bedrock · Google Vertex · Azure)
Claude Code is the interface. Your clarity is the input. Garbage in, garbage out.
Skills Intro  ·  Layout  ·  15 / 81

The .claude/ Directory

Problem  Skills, hooks, settings, rules — scattered. Without the map, you waste hours searching.

Project Layout
your-project/
├── .claude/
│   ├── skills/
│   │   ├── req.md
│   │   ├── arch.md
│   │   ├── tdd.md
│   │   └── review.md
│   ├── agents/
│   │   ├── code-reviewer.md
│   │   └── test-writer.md
│   ├── hooks/
│   │   └── PreToolUse-*.sh
│   ├── settings.json
│   └── rules/
│       └── *.md
├── CLAUDE.md
└── CLAUDE.local.md   # gitignored
skills/
On-demand instructions via slash commands.
hooks/
Automatic guardrails at lifecycle events.
settings.json
Project config — committed, team standard.
rules/
Conditional instructions scoped to file paths.
agents/
Specialized subagents with focused tool sets. Invoked automatically when task matches description.
Global: ~/.claude/settings.json + ~/.claude/CLAUDE.md. Project wins. No magic.
Skills Intro  ·  Concept  ·  16 / 81

Skills: Automate Your Workflows

Problem  Every session starts from zero. You re-explain conventions — every single time.

Solution  Skills = markdown in .claude/skills/, loaded on demand via slash command.

01
You type /command
/req  /arch  /tdd  /review
Signal: switch to this workflow.
02
Claude loads skill
.claude/skills/req.md
Process, checklists, output format.
03
Claude follows it
Socratic → ARCH → Tasks → TDD → Review
Consistent. Repeatable. Every time.
User-invoked
You decide when. Not the agent.
On-demand
Zero context bloat until needed.
Version-controlled
In .claude/skills/ — shared, committed.
Skills are matched by LLM reasoning over the description field — not keyword matching. Write descriptions that articulate when to use the skill, not just what it does.
Advanced: Set context: fork in skill frontmatter to run the skill in an isolated subagent context — useful for skills that dispatch parallel agents.
A skill is a contract. Define once. Claude executes every time.
Skills Intro  ·  Hands-on  ·  17 / 81

Build Your First Skill

Goal  Create a skill Claude follows. YAML frontmatter + markdown body.

1
Create directory
mkdir -p .claude/skills
2
Write the file
.claude/skills/req.md
3
Define the process
Checklists + output format
4
Invoke it
/req
.claude/skills/req.md
---
name: req
description: Socratic requirements
color: green
---
## Socratic Interview Checklist
- [ ] Surface assumptions
- [ ] Identify edge cases
- [ ] Define boundaries
- [ ] WHAT not HOW
Frontmatter = label. Body = recipe.
Good skill: clear checklists · defined outputs · discipline rules. You design. Claude executes.
Skills Intro  ·  Cost  ·  18 / 81

Save Tokens with External Scripts

Problem  Embedding bash in skills bloats context. Those tokens burn — again and again.

Pattern  Extract logic to scripts. Reference from the skill. Bash loads externally.

DON'T Inline bash
## Run Tests
cd src && npm run build && npx jest --coverage ... --testPathPattern='...' && npx eslint . ... --format junit -o ...
~850 tokens per load — loaded into context every time.
DO Reference external script
## Run Tests
Execute: ./scripts/run-tests.sh
# Full script on disk:
# scripts/run-tests.sh
~18 tokens per load — script loaded by Bash tool, not context.
The math: 850 × 20 = 17,000 wasted vs 18 × 20 = 360 tokens. 47× savings.
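The arithmetic behind that claim, as a quick check. The per-load figures (850 and 18 tokens) and the 20 loads come from the slide; `context_cost` is a hypothetical helper name.

```python
def context_cost(tokens_per_load: int, loads: int) -> int:
    """Total tokens spent when a skill body is loaded `loads` times."""
    return tokens_per_load * loads


inline = context_cost(850, 20)     # bash embedded in the skill
external = context_cost(18, 20)    # skill only references the script
ratio = inline / external

print(inline, external, round(ratio))  # 17000 360 47
```

The ratio depends only on the per-load sizes, so the saving holds at any number of loads; more loads just make the absolute waste larger.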
Skills Intro  ·  Design  ·  19 / 81

Skill Design Patterns

Problem  Skills can be ignored or bloated. Five patterns improve accuracy and cut token use.

01
Accuracy
Lead with the Rule
LLMs over-attend to start/end. Buried instructions get lost.
Put the #1 rule at the TOP. Output format at the BOTTOM. Bookend.
02
Tokens
One Skill Per Workflow
Mega-skills load everything. Most is irrelevant.
req.md · arch.md · tdd.md · review.md. Load only what you need.
03
Tokens
Use @path, Never Paste
Pasting docs into skills bloats context permanently.
Write @docs/ARCH.md. Loaded only when needed.
04
Accuracy
Explicit Output Tags
Vague instructions produce vague results.
Use <output_format>, <acceptance_criteria>. Higher precision than prose.
05
Accuracy
Write Descriptions as Trigger Conditions
The description field is evaluated by LLM reasoning — not keyword matching — to decide when to auto-invoke the skill.
Write: "Use when the user asks for a structured requirements interview"   — not: "Requirements skill" (manual invocation only).
Design for the LLM's attention pattern. Accuracy is a decision.
Skills Intro  ·  Communication  ·  19A / 81

Prompting for Precision

Four rules that determine whether Claude follows your process or improvises.

01
Accuracy
Be Explicit, Not Conversational
"Refactor src/auth/middleware.ts to extract JWT validation into validateToken(), add expired-token handling, add unit tests"
beats: "clean up the auth code"
02
Context
Investigate Before Answering
Add to CLAUDE.md:
"ALWAYS read relevant files before proposing edits. Never speculate about code you have not opened."
Prevents hallucinated imports and phantom file paths.
03
Tokens
Only What's Requested
"Only make changes directly requested. Do not refactor adjacent code."
Prevents scope drift — the #1 cause of wasted tokens.
04
Accuracy
Match Thinking to Complexity
/fast   routine execution
think   standard tasks
think hard   multi-step reasoning
ultrathink   architectural decisions
The framework gives you structure. Specificity gives you precision. Process + precise prompts = disciplined engineering.
Skills Intro  ·  Agents  ·  19B / 81

Built-in Agents: Explore Before You Build

Two read-only agents that make architecture and exploration safer — shipped inside Claude Code, no setup required.

Explore Agent
Tools: Read, Grep, Glob (read-only — cannot write or edit)
Use for: Understanding an unfamiliar codebase before proposing changes.
How to invoke: Ask Claude to explore before architecting, or use the Agent tool with subagent_type="Explore".
When: Before writing an architecture document. Before touching legacy code.
Plan Agent
Tools: Read-only — cannot modify files.
Use for: Designing architecture grounded in actual codebase state.
How to invoke: Shift+Tab cycles permission modes → Plan mode. Or type /plan.
When: Before implementing a complex feature. Before a refactor.
Both agents are read-only by definition — they literally cannot write files. Safe to run on unfamiliar codebases before you understand them.
The best architecture is grounded architecture. Explore before you plan. Plan before you code.
The Build  ·  Phase 1 / 5  ·  20 / 81

The Craft of Requirement Engineering

Problem  Vague requirements compound into wrong architecture, wrong tasks, wrong code. Ambiguity is the #1 cause of AI project failure.

Concept  A requirement is a contract. Precise, verifiable, readable by anyone — without being in the room.

1 · REQ   Requirements
2 · ARCH   Architecture
3 · TASKS   Task Gen
4 · TDD   TDD
5 · REVIEW   Review
Socratic Interview
Claude asks. You answer. Intent → Behaviors → Edge Cases → Acceptance → Decisions → Artifact. Six phases, every time.
Detail Without Technicality
WHAT not HOW. “The wizard auto-saves” is in. “debounced useEffect” is out. File names, schemas, code belong to Phase 2.
Sprint-Sized Scope
One digestible chunk per REQ. If the conversation reveals more, split — never push scope downstream.
Mode A   Raw idea → full Socratic interview, 6 phases.
Mode B   Existing PRD → gap-fill interview, diagnoses missing pieces.
This is the craft. Next: the specific problems our skill was built to solve.
Phase 1  ·  Why this skill  ·  21 / 81

Why We Built plan-requirements

Problem  Four recurring failures in requirement gathering that kill projects before the first line of code.

01
Re-explaining
The Blank Slate
Every session starts from zero. Claude has no memory of your product context — you re-explain everything, or worse, forget to.
02
Production bugs
Happy Path Bias
Edge cases get skipped — always. Failure modes discovered in production, not requirements. 100× costlier.
03
Sprint chaos
The PRD Gap
Stakeholder docs aren't sprint-ready. PRDs miss verifiable acceptance criteria, explicit scope, “what a dev gets wrong” nuance.
04
Deadline miss
Scope Bleed
Large features aren't split at the source. Conversations scope-creep in real time. Bloated requirements break every later phase.
These are not user failures. They are systemic gaps. That's what a skill fixes.
Phase 1  ·  How it works  ·  22 / 81

plan-requirements: How It Works

Solution  Six-phase Socratic flow + readiness checklist + Phase 1 Gate. Structured interview, structured output.

Solves: Blank Slate
Structured 6-phase flow (A–F)
Intent · Behaviors · Edge Cases · Acceptance · Decisions · Artifact. No re-explaining.
Solves: Happy Path Bias
Dedicated Edge Case phase (C)
Systematic probing: input edges, concurrency, dependencies, security, scale.
Solves: PRD Gap
Two entry modes + acceptance criteria
Mode B gap-fills existing PRDs. Every requirement gets a verifiable criterion.
Solves: Scope Bleed
Sprint-sizing + 8-check readiness gate
Must be sprint-sized. Must have scope boundaries. Must be cold-readable.
Phase 1 Gate
1. Can I explain every requirement without ambiguity?
2. Can any teammate read this cold and enter sprint planning with full context?
Both must be YES before the artifact is generated. Both must be YES before Phase 2 begins.
Output: /specs/requirements/REQ-<slug>.md — Traceable IDs · Decisions log · Scope boundaries · Open questions.
Phase 1  ·  Live Demo  ·  23 / 81

Demo: plan-requirements

step 1   Load skill
step 2   Socratic interview
step 3   Edge case probing
step 4   Readiness check
step 5   Phase 1 Gate
step 6   REQ artifact
01
The Interview Style
Claude asks one question at a time. Summarizes before moving on. Offers concrete options when you're unsure.
02
Edge Case Probing
Watch the “first attempt” probe. “What would a developer get wrong?”
03
Readiness Check + Phase 1 Gate
Claude won't generate the artifact until all 8 criteria pass — and both gate questions answer YES.
Live Demo
# Command
/plan-requirements
# Starting prompt
"I want to track my AI sessions"
Expected output
/specs/requirements/
  REQ-session-tracking.md
  - Summary + Problem/Motivation
  - Functional Req (R1, R2, R3…)
  - Edge Cases table
  - Decisions Log
  - Scope: In / Out
  - Open Questions
Watch the skill, not just the output. The process is the product.
The Build  ·  Phase 2 / 5  ·  24 / 81

The Craft of System Architecture

Problem  Architecture designed without reading existing code produces designs that don't fit. Wrong patterns, wrong assumptions, wrong boundaries.

Concept  Ground in reality before designing. Propose, stress-test, then walk the code.

1 · REQ   Requirements
2 · ARCH   Architecture
3 · TASKS   Task Gen
4 · TDD   TDD
5 · REVIEW   Review
Ground in Reality
Read CLAUDE.md. Run file-tree + search-codebase. Understand existing patterns before proposing new ones. Design for the codebase you have.
Design Collaboratively
You propose. Claude stress-tests. Offer options with tradeoffs. Every decision traces back to a requirement ID.
Know Where It Lands
The Change Footprint: created, modified, deleted, touched-but-not-changed. A design without a footprint is a whiteboard exercise.
Greenfield   Design phases dominate; footprint is shallow.
Brownfield   Phase D2 (footprint) is center of gravity.
Refactor   Change Footprint is the primary deliverable.
This is the craft. Next: the problems our skill was built to solve.
Phase 2  ·  Why this skill  ·  25 / 81

Why We Built plan-architecture-v2

Problem  Four recurring failures in AI-assisted architecture that produce ungrounded, untraceable, unimplementable designs.

01
Token waste
Speculative Reads Burn Tokens
Claude reads 20+ files to “orient itself.” Every session reinvents discovery. Thousands of tokens on files that never inform the design.
02
Rewrite
Designs Ignore Existing Patterns
Proposes layered where hexagonal exists. Designs new auth where one works. Conflicts with in-flight migrations.
03
Unimplementable
Architecture Without Footprint
Design docs describe structure but not where it lands. No file paths. No “what changes here.” Another dev cannot implement.
04
Production incident
No Stress-Test Pass
Designs shipped without validating against failure scenarios. Rollback paths unclear. Regression risk never assessed.
These are process failures, not people failures. The skill encodes discipline so the tool enforces it.
Phase 2  ·  How it works  ·  26 / 81

plan-architecture-v2: How It Works

Solution  3-step context gathering + 7-phase flow + Change Footprint Walk + Phase 2 Gate.

01
file-tree.sh
Maps the codebase shape — no speculative reads.
02
search-codebase.sh -m 3
Keyword calibration: >100 matches = too broad. <5 files = too narrow.
03
Targeted Read
2 attempts max. Then Glob. Read only on 3-line preview signal.
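The keyword calibration rule in step 02 can be sketched as a simple classifier. The two thresholds come from the slide; everything else is an assumption, including the function name `calibrate`, the idea that the search reports both a total match count and a file count, and the choice to test "too broad" first when both conditions hold. The actual behavior of search-codebase.sh is not shown in the deck.

```python
def calibrate(total_matches: int, files_matched: int) -> str:
    """Classify a search keyword using the slide's two thresholds."""
    if total_matches > 100:     # >100 matches = too broad
        return "too broad"
    if files_matched < 5:       # <5 files = too narrow
        return "too narrow"
    return "ok"


# Hypothetical keywords with made-up counts, for illustration only.
for keyword, matches, files in [("user", 250, 40), ("sessionTrackerV2", 6, 2), ("session", 40, 9)]:
    print(keyword, "->", calibrate(matches, files))
```

A "too broad" result suggests a more specific keyword; "too narrow" suggests a more general one before any file is read in full.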
A   Context
B   Structure
C   Tech
D   Design
D2   Footprint
E   Cross-Cutting
F   Stress-Test
G   Artifact
Change Footprint Walk (D2)
+ New
What gets created, where, following which pattern.
~ Modified
What changes — one line per file.
− Deleted
What goes away, and why.
! Touched
Silent-regression hotspots.
Phase 2 Gate: can another senior dev implement this from the doc alone, and point to every place this change lands?
Phase 2  ·  Live Demo  ·  27 / 81

Demo: plan-architecture-v2

step 1   Load skill
step 2   Context scripts
step 3   Design (B–D)
step 4   Footprint walk
step 5   Stress-test
step 6   ARCH artifact
01
Context Gathering
Watch the bash scripts run — file-tree, then search-codebase. No speculative reads. Only files surfaced by keywords get touched.
02
Change Footprint Walk
Brownfield project — watch Phase D2 dominate. New files, modified files, touched-but-not-changed.
03
Stress-Test Pass
Watch Claude challenge its own design. Forward: what breaks at runtime. Backward: what regresses silently.
Live Demo
# Command
/plan-architecture
from: specs/requirements/REQ-session-tracking.md
Expected output
/specs/architecture/
  ARCH-session-tracking.md
  - Architecture Summary + Tech Choices
  - Data Models + API Contracts
  - Change Footprint
  - Areas of Impact + Risk
  - Stress-Test Scenarios
  - Decisions Log
Watch the grounding. Context first, design second. No whiteboard architecture.
The Build  ·  Phase 3 / 5  ·  28 / 81

The Craft of Task Generation

Problem  Vague to-do items produce vague code. Tasks without test plans produce untested code. Tasks detached from architecture produce wrong code.

Concept  A task is a TDD-ready specification. Test plan first. Anchored on architecture. Sized for a single TDD cycle.

1 · REQ   Requirements
2 · ARCH   Architecture
3 · TASKS   Task Gen
4 · TDD   TDD
5 · REVIEW   Review
One File, Full Context
Tasks live inside ARCH-*.md. Architecture + tasks in one document. The TDD agent reads one path. No cross-referencing, no stale links.
Test Plan Before Code
Behavior tests from REQ acceptance. Edge cases from failure modes. Resilience from ARCH stress-tests. Regression guards from touched files.
Footprint-Anchored Scope
Every task maps to a slice of the Change Footprint. Every entry claimed. No drifting off-plan.
2–4 prod files   per task (excluding tests)
3–8 scenarios   behavior + edge + stress + regression
Never xl   split by endpoint, layer, concern, entity
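The three sizing limits lend themselves to a mechanical check. A minimal sketch, assuming a task is represented as below: the `Task` fields, the effort labels other than "xl", and the helper name `sizing_problems` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    title: str
    effort: str               # hypothetical scale, e.g. "s", "m", "l", "xl"
    prod_files: List[str]     # production files only, tests excluded
    scenarios: List[str]      # behavior + edge + stress + regression


def sizing_problems(task: Task) -> List[str]:
    """Return every sizing rule the task breaks (empty list = well sized)."""
    problems = []
    if task.effort == "xl":
        problems.append("effort is xl: split by endpoint, layer, concern, or entity")
    if not 2 <= len(task.prod_files) <= 4:
        problems.append(f"{len(task.prod_files)} prod files (expected 2-4)")
    if not 3 <= len(task.scenarios) <= 8:
        problems.append(f"{len(task.scenarios)} scenarios (expected 3-8)")
    return problems


good = Task("Add session model", "m", ["model.py", "repo.py"], ["create", "duplicate id", "empty payload"])
bad = Task("Build the backend", "xl", ["f%d.py" % i for i in range(7)], ["works"])
print(sizing_problems(good))
print(len(sizing_problems(bad)))
```

A task that fails any check goes back for splitting before it reaches the TDD phase.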
This is the craft. Next: the problems our skill was built to solve.
Phase 3  ·  Why this skill  ·  29 / 81

Why We Built generate-tasks

Problem  Four recurring failures in task breakdown that turn architecture into chaos at implementation time.

01
Interpretation drift
Tasks as Vague To-Dos
“Implement auth” — no tests, no files, no boundaries. Every developer interprets differently.
02
Untested code
No Test-First Discipline
Tests added after to “cover” what was built. Acceptance criteria not translated. Regression risks never tested.
03
Divergence
Tasks Detached from Architecture
Tasks in Jira/Linear, no link to ARCH files. Footprint entries orphaned. Architecture and tasks diverge.
04
Silent breakage
No Regression Guard
Touched-but-not-changed files never tested. “Should not affect them” — until it does.
The gap between architecture and code is where projects die. The skill bridges it with structure, not willpower.
Phase 3  ·  How it works  ·  30 / 81

generate-tasks: How It Works

Solution  5-step flow: Understand → Anchor → Draft Tests → Build Spec → Write to ARCH. Test plan before code.

1   Understand
2   Anchor
3   Draft Tests
4   Build Spec
5   Write to ARCH
Footprint | Task Files Expected | Note
+ New | New | carry pattern forward
~ Modified | Modified | carry “what changes” note
− Deleted | Modified | diff shows deletion
! Touched | Must NOT modify | add regression-guard tests
Four Test Scenario Sources
REQ acceptance criteria → Behavior tests
REQ edge cases / failures → Error / edge tests
ARCH forward stress-test → Resilience tests
ARCH touched-but-not-changed → Regression-guard tests
Task shape  Status · Effort (never xl) · Priority · Dependencies · REQ-IDs · Footprint slice · High-risk callouts
Output: tasks embedded in ARCH-*.md #Tasks. One file = architecture + decisions + contracts + tasks.
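The footprint-to-task mapping can be sketched as a lookup table plus a grouping step. This is an illustration under assumptions: the tuple representation, the function names, and the file paths are hypothetical, and an ASCII "-" stands in for the slide's minus sign.

```python
from typing import Dict, List, Tuple

# Footprint marker -> bucket in the task's "Files Expected" section.
FOOTPRINT_TO_TASK: Dict[str, str] = {
    "+": "New",               # carry the pattern forward
    "~": "Modified",          # carry the "what changes" note
    "-": "Modified",          # the diff shows the deletion
    "!": "Must NOT modify",   # add regression-guard tests
}


def files_expected(footprint: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (marker, path) footprint entries into Files Expected buckets."""
    buckets: Dict[str, List[str]] = {"New": [], "Modified": [], "Must NOT modify": []}
    for marker, path in footprint:
        buckets[FOOTPRINT_TO_TASK[marker]].append(path)
    return buckets


def regression_guards(footprint: List[Tuple[str, str]]) -> List[str]:
    """One regression-guard test title per touched-but-not-changed file."""
    return [f"regression guard: {path} behaves as before" for marker, path in footprint if marker == "!"]


# Hypothetical footprint for the session-tracking example.
footprint = [
    ("+", "src/sessions/model.py"),
    ("~", "src/app.py"),
    ("-", "src/legacy_tracker.py"),
    ("!", "src/auth/login.py"),
]
print(files_expected(footprint))
print(regression_guards(footprint))
```

An unknown marker raises `KeyError`, which is a reasonable failure mode here: every footprint entry has to be claimed by a known bucket.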
Phase 3  ·  Live Demo  ·  31 / 81

Demo: generate-tasks

step 1   Load skill
step 2   Read ARCH + REQ
step 3   Draft test plan
step 4   Anchor on footprint
step 5   Write tasks
01
Test Plan Drafting
Watch Claude pull tests from four sources — REQ acceptance, edge cases, ARCH stress-tests, regression risks. Every test traces upstream.
02
Footprint Anchoring
Watch each task claim a slice of the Change Footprint. New, modified, must-not-modify. Every entry accounted for.
03
Regression-Guard Tests
Touched-but-not-changed files get explicit regression tests. “Verify existing login flow still works.” Nothing assumed.
Live Demo
# Command
/generate-tasks
from: specs/architecture/ARCH-session-tracking.md
Expected output
ARCH-session-tracking.md #Tasks:
- Task T1: [Clear Title]
- Status · Effort · Priority · REQ-IDs
- Test Plan: behavior + edge + stress + regression
- Files Expected: new / modified / must-not-modify
- Implementation Notes + Scope
Watch the translation. Architecture → tasks → tests. Structure, not willpower.
The Build  ·  Phase 4 / 5  ·  32 / 81

The Craft of Test-Driven Development

Problem  Code written before tests produces untestable code. Batched tests hide which change broke what. Skipped refactoring accumulates debt.

Concept  RED → GREEN → REFACTOR. One test at a time. Minimum code to pass.

1 · REQ   Requirements
2 · ARCH   Architecture
3 · TASKS   Task Gen
4 · TDD   TDD
5 · REVIEW   Review
RED
Write one failing test. Run it. Confirm it fails for the right reason — missing function, not syntax error. Collaborative: show the developer and pause. Autonomous: verify and proceed.
GREEN
Write the minimum production code to make the test pass. No more, no less. Run the suite — new test passes, no existing tests broken.
REFACTOR
Assess: duplication, naming, structure. If warranted, propose the refactor. Run the suite again. Only when all tests pass, pick up the next test.
Collaborative (default)   Pause at every red/green. Developer confirms.
/tdd auto   Runs without pausing. Stops on unexpected failures.
Both modes: one test at a time, minimum code, verify failure reason, respect Must NOT Modify, stop on ambiguity.
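A single RED → GREEN pass, compressed into one runnable script. This is a sketch, not the skill's output: `total_tokens` and the session dictionary are hypothetical, and in a real project the test and the production code would live in separate files and run under a test runner.

```python
def test_total_tokens_sums_input_and_output():
    # Arrange
    session = {"input_tokens": 120, "output_tokens": 30}
    # Act
    total = total_tokens(session)   # production function, not written yet
    # Assert
    assert total == 150


# RED: run the test before any production code exists and confirm it fails
# for the right reason (missing function, not a syntax error).
try:
    test_total_tokens_sums_input_and_output()
    red_reason = "unexpected pass"
except NameError as exc:
    red_reason = f"missing function: {exc}"
print("RED  ", red_reason)


# GREEN: the minimum production code that makes the test pass.
def total_tokens(session):
    return session["input_tokens"] + session["output_tokens"]


test_total_tokens_sums_input_and_output()
print("GREEN one test passing")
```

Only after GREEN would the refactor assessment happen, and only after the full suite passes would the next test be written.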
Phase 4  ·  Why this skill  ·  33 / 81

Why We Built the TDD Skill

Problem  Four discipline failures that turn good task specs into bad code.

01
Verification theater
Jumping Ahead
Production code first, test retrofitted. Test verifies code exists, not that it meets the requirement. TDD in name only.
02
Archaeology
Batching Tests
Multiple tests written before any pass. When one breaks, you cannot tell which change caused it.
03
Fragile tests
Mocking Internals
Mocking the thing being tested. Refactoring breaks tests that should still pass. Change-detectors, not behavior-verifiers.
04
Scope creep
Scope Drift
Silently touching files outside the task. “I noticed this was broken so I fixed it.” No test plan, no review.
TDD is not a technique — it is a discipline. Discipline requires enforcement, not intention.
Phase 4  ·  How it works  ·  34 / 81

The TDD Skill: How It Works

Solution  Two modes. RED→GREEN→REFACTOR cycle. Before-you-start checklist. Footprint-respecting. Phase 4 Gate.

Test Discipline
Arrange-Act-Assert   One behavior per test
Import from prod path   Even if module doesn't exist yet
Mock boundaries   Not internals, external deps only
Verify failure reason   Missing function, not syntax error
Production Code Discipline
Minimum to pass   No more, no less
Follow existing patterns   From Implementation Notes
Respect Files Expected   Only create/modify listed files
Must NOT Modify = sacred   Regression-guard tests verify
Phase 4 Gate
Do all tests pass and does the code match the architecture decisions from Phase 2?
That's the skill on paper. Now let's see it in action.
Phase 4  ·  Live Demo  ·  35 / 81

Demo: Test-Driven Development

step 1   Load skill
step 2   Before-start check
step 3   RED test
step 4   GREEN code
step 5   REFACTOR?
step 6   Full suite
01
Watch the RED Step
Claude writes one test. Runs it. Verifies it fails for the right reason — missing function, not syntax error. Collaborative: pauses to show you.
02
Watch the GREEN Step
Claude writes minimum code — just enough to pass. Run the suite. New test green, existing tests unbroken.
03
Watch Scope Discipline
Footprint-respecting implementation. Files Expected honored. Must NOT Modify untouched. If drift detected, skill pushes back.
Live Demo
# Command
/tdd
from: specs/architecture/ARCH-session-tracking.md
Expected output
- Tests pass: task test plan complete
- Full suite: no regressions
- Task status: done
- Files created/modified per spec
- Must NOT Modify files: untouched
- Next: /review task
Watch the cycle. RED → GREEN → REFACTOR. One test at a time. Discipline, not speed.
The Build  ·  Phase 5 / 5  ·  36 / 81

The Craft of Code Review

Problem  Rubber-stamp approvals. Review everything or nothing. No severity triage. Reviewers who fix instead of flag. Review theater.

Concept  Triage first, review second. Human confirms scope. Sub-skills review in parallel. Read-only — flag, don't fix.

1 · REQ   Requirements
2 · ARCH   Architecture
3 · TASKS   Task Gen
4 · TDD   TDD
5 · REVIEW   Review
Triage, Don't Drown
Review everything = review nothing well. Propose checks; developer confirms. Task completion always in pipeline mode. Security always when user-facing.
Orchestrator Pattern
One skill dispatches many. Parallel agents, each reading its own SKILL.md. Filtered diffs. Bounded scope. Parallel execution.
Read-Only Discipline
Flag findings. Do not fix. Do not write code. The developer decides. Separation of detection and remediation.
Critical
Blocks merge
High
Strongly blocks
Medium
Should fix
Low
Suggestion
Manual
Dev checks
This is the craft. Next: the problems the skill was built to solve.
Phase 5  ·  Why this skill  ·  37 / 81

Why We Built the Review Skill

Problem  Four failures that turn code review into approval theater — or avoidance.

01
Merge lottery
Rubber-Stamp Approvals
“LGTM” on 500 lines without reading. No architecture verification. No security scan.
02
Inconsistent depth
Review Everything or Nothing
A typo fix gets the same depth as a new auth system. Large PRs get superficial review because nobody has time.
03
Signal lost
No Severity Discipline
A missing semicolon gets the same attention as a SQL injection. Style preferences drown out security findings.
04
Two jobs
Reviewer as Janitor
Reviewer renames a variable, pushes a commit. Developer never learns why it was wrong. Two jobs badly instead of one job well.
Review is the last line of defense. When it fails, all prior phases' value is lost.
Phase 5  ·  How it works38 / 81

review: How It Works

Solution  Triage → Dispatch parallel agents → Collect → Deduplicate → Compile report. Human confirms scope at every step.

Pipeline Mode
Verify implementation against ARCH + REQ. Task completion always included. Source chain: REQ → ARCH → task → code.
General Mode
PR / branch / staged / diff file. Gather diff, detect stack, propose checks, dispatch. No spec verification.
1. Read changeset
2. Propose checks
3. Developer confirms
4. 16 parallel agents
5. Collect + dedupe
6. Verdict
Parallel Dispatch
Filtered diff per agent. React → .tsx only. Each reads its own SKILL.md.
Collect + Compile
Same line → highest severity wins. Merge comments. Insights combined.
Verdict
PASS / PASS-FINDINGS / FAIL
Each of the 16 review agents is defined this way — .claude/agents/code-reviewer.md
---
name: code-reviewer
description: Reviews code for quality, security, and correctness. Use when the user asks for a code review.
tools: Read, Grep, Glob
model: sonnet
isolation: worktree
---
You are a code reviewer. Flag findings, do not fix.
Phase 5 Gate: would I mass-merge this without reading it? If yes — I haven't reviewed properly.
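The collect-and-compile step (same line, highest severity wins, comments merged) can be sketched in a few lines. This is a minimal illustration, not the review skill's actual code: the Finding type, the " | " comment separator, and the verdict mapping (any Critical or High finding means FAIL) are assumptions made for the example.

```python
from dataclasses import dataclass

# Severity order from the slide: Critical > High > Medium > Low > Manual.
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1, "manual": 0}


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one reviewer finding."""
    file: str
    line: int
    severity: str
    comment: str


def deduplicate(findings):
    """Keep one finding per (file, line): highest severity wins, comments merge."""
    merged = {}
    for f in findings:
        key = (f.file, f.line)
        prev = merged.get(key)
        if prev is None:
            merged[key] = f
            continue
        top = f if SEVERITY_RANK[f.severity] > SEVERITY_RANK[prev.severity] else prev
        if f.comment in prev.comment.split(" | "):
            comment = prev.comment
        else:
            comment = f"{prev.comment} | {f.comment}"
        merged[key] = Finding(f.file, f.line, top.severity, comment)
    # Most severe findings first; ties keep arrival order.
    return sorted(merged.values(), key=lambda f: -SEVERITY_RANK[f.severity])


def verdict(findings):
    """Assumed mapping: Critical/High block the merge, anything else passes with findings."""
    if any(SEVERITY_RANK[f.severity] >= SEVERITY_RANK["high"] for f in findings):
        return "FAIL"
    return "PASS-FINDINGS" if findings else "PASS"
```

Two agents flagging the same line collapse into one entry that carries both comments and the higher severity, so the report stays short without losing signal.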
Phase 5  ·  Live Demo39 / 81

Demo: Code Review

step 1  Load skill
step 2  Triage
step 3  Dispatch agents
step 4  Collect findings
step 5  Deduplicate
step 6  Report + verdict
01
Watch the Triage
Claude proposes which checks to run. Task completion always in pipeline. Security when user-facing. You confirm or adjust.
02
Watch Parallel Dispatch
Multiple agents spawn simultaneously. React patterns gets .tsx. DB patterns gets migrations. Each reads its own SKILL.md.
03
Watch the Verdict
PASS / PASS-FINDINGS / FAIL. Every finding: severity + source chain + concrete next steps.
Live Demo
# Command
/review
Expected output
- Executive Summary + Verdict
- Per-file findings (severity + line)
- Source chain context (REQ/ARCH)
- Concrete next steps per finding
- Checklist summary: OK / WARN / BLOCK
- Re-review protocol for fixes
One skill, many reviewers. Structure, not personality.
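The filtered, parallel dispatch shown in the demo can be pictured roughly like this. It is a sketch under assumptions: the ROUTES table, reviewer names, and the review_fn callback are hypothetical stand-ins for the real sub-skills, and threads stand in for parallel agents.

```python
from concurrent.futures import ThreadPoolExecutor
from fnmatch import fnmatchcase

# Hypothetical routing table: reviewer name -> filename patterns it cares about.
ROUTES = {
    "react-patterns": ["*.tsx"],
    "db-patterns": ["migrations/*"],
    "security": ["*"],
}


def filter_changeset(changed_files, routes=ROUTES):
    """Give each reviewer only the files matching its patterns; skip idle reviewers."""
    plan = {}
    for reviewer, patterns in routes.items():
        files = [f for f in changed_files if any(fnmatchcase(f, p) for p in patterns)]
        if files:
            plan[reviewer] = files
    return plan


def dispatch(plan, review_fn):
    """Run review_fn(reviewer, files) for every reviewer in parallel and collect results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(review_fn, name, files) for name, files in plan.items()}
        return {name: future.result() for name, future in futures.items()}
```

Bounded scope is the point: a reviewer that never sees README.md cannot waste tokens commenting on it.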
ACT 3

The Operating
Environment.

The build looked smooth. It wasn't. Credentials nearly committed. UI checked by hand. Conventions forgotten every session. Context windows overflowing. Process works — but process without environment is fragile.

Safety
Hooks & Guardrails
Capability
MCP & Playwright
Instructions
CLAUDE.md & Rules
Configuration
Settings & Permissions
Operations
Session Hygiene
04  Problems we hit

Guardrails.

The build looked smooth. It wasn't. Claude nearly committed an API key, tried to delete the migrations directory, and shipped code that passed tests but didn't match formatting. Time to add the guardrails.

Guardrails  ·  Problem 142 / 81

The Credential Near-Miss

⚠   NEAR-MISS REPORT sess_9a8b7c6d5e  ·  14:32 UTC
The Story
During TDD, Claude created a test fixture with a dummy API key. The value isn't real — but it matches the regex pattern for an OpenAI key (sk-...).
Risk
If committed, the key would live in git history forever.
What saved us: manual review. But manual review is unreliable.
The Insight
We need an automatic guard that blocks the commit before it happens.
// The dummy that almost shipped:
const apiKey = "sk-aBcDeFgHiJkLmNoPqRsTuVwXyZ";
// Looks fake. Matches the regex.
// Once in git history → forever.
This is where Hooks come in. →
Manual review is the last line of defense. Hooks are the first.
Guardrails43 / 81

Hooks: Automatic Guardrails

Problem  Claude can make destructive changes — commit credentials, delete files, force-push. These happen even when Claude is “trying” to help.

Solution  Hooks = shell commands that fire at lifecycle events. Claude doesn't control them. They run automatically.

The blocking mechanism
Exit code 0 = allowed.   Exit code 2 = BLOCKED.
Lifecycle Events
Event | Timing | Can Block? | Typical Use
PreToolUse | BEFORE tool runs | YES | Credential guard, destructive command check
PostToolUse | AFTER tool runs | No | Auto-format, logging
Notification | Claude needs input | No | Desktop alert; makes Auto mode practical
UserPromptSubmit | Before prompt processed | YES | Inject git branch and project state automatically; exit code 2 rejects the prompt
PreCompact | BEFORE compaction runs | No | Save critical context before summary
Stop | Claude finishes responding | YES | Exit code 2 keeps Claude working, e.g. while tests still fail
SessionStart | Startup | No | Load context; informational only
PreToolUse = ultimate safety layer. Notification = what makes Auto mode practical without watching the terminal.
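A hook is just a program that receives the event as JSON on stdin and answers with its exit code. The sketch below shows that contract for a PreToolUse guard. The DESTRUCTIVE pattern list is illustrative, not exhaustive, and the payload fields used here (tool_name, tool_input.command) are the ones a Bash tool call is assumed to carry.

```python
import json
import re
import sys

# Commands this sketch treats as destructive (illustrative, not exhaustive).
DESTRUCTIVE = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bgit\s+push\b.*--force\b"),
]

ALLOW, BLOCK = 0, 2  # exit codes from the slide


def run_hook(stdin_text):
    """Return the exit code for one PreToolUse event, given its JSON payload."""
    event = json.loads(stdin_text)
    if event.get("tool_name") != "Bash":
        return ALLOW
    command = event.get("tool_input", {}).get("command", "")
    for pattern in DESTRUCTIVE:
        if pattern.search(command):
            # The reason goes to stderr so the blocked agent can read it.
            print(f"BLOCKED: destructive command: {command}", file=sys.stderr)
            return BLOCK
    return ALLOW

# As a real hook script, the file would end with:
#   sys.exit(run_hook(sys.stdin.read()))
```

Keeping the decision in a pure function makes the guard testable without launching an agent session.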
Guardrails44 / 81

PreToolUse in Action: Credential Guard

.claude/hooks/PreToolUse-credential-guard.sh
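#!/bin/bash
# PreToolUse hook: blocks commits whose staged files contain credentials.
# grep -q keeps the matched secret out of the output; messages go to stderr,
# which is what Claude reads when a call is blocked with exit code 2.
while IFS= read -r file; do
  [ -f "$file" ] || continue  # skip files staged for deletion
  # OpenAI keys
  if grep -qE 'sk-[a-zA-Z0-9]{20,}' "$file"; then echo "BLOCKED: OpenAI key in $file" >&2; exit 2; fi
  # AWS keys
  if grep -qE 'AKIA[0-9A-Z]{16}' "$file"; then echo "BLOCKED: AWS key in $file" >&2; exit 2; fi
  # Private keys
  if grep -qE 'BEGIN.*PRIVATE KEY' "$file"; then echo "BLOCKED: Private key in $file" >&2; exit 2; fi
done < <(git diff --cached --name-only)
exit 0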
What This Blocks
  • OpenAI API keys (sk- pattern)
  • AWS access keys (AKIA pattern)
  • Private keys (RSA / EC / OPENSSH)
  • Database URLs with passwords (needs a fourth pattern; the script above has three)
🛡   Key Truth
Claude can bypass permissions — CANNOT bypass PreToolUse exit code 2. The exit code is the ultimate safety layer.
Three regexes, three lines of defence. API keys never reach git.
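To see why the near-miss dummy trips the guard while a clearly fake value does not, the same three patterns can be exercised directly. This is a small sketch; the function name find_credentials is made up for the example.

```python
import re

# The three patterns used by the credential guard.
PATTERNS = {
    "OpenAI key": re.compile(r"sk-[a-zA-Z0-9]{20,}"),
    "AWS key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key": re.compile(r"BEGIN.*PRIVATE KEY"),
}


def find_credentials(text):
    """Return the labels of every credential pattern found in text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]
```

The dummy sk-aBcDeFgHiJkLmNoPqRsTuVwXyZ has 26 alphanumeric characters after the prefix, so it matches. A value such as sk-FAKE-TEST-KEY is broken up by hyphens and passes.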
Guardrails  ·  Problem 245 / 81

More Hooks: Auto-Format & Lint

Problem  Code passed tests but didn't match project formatting standards. Manual formatting is tedious and inconsistent.

.claude/hooks/PostToolUse-auto-format.sh
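#!/bin/bash
# PostToolUse: auto-format the file Claude just edited (requires jq).
# Hook input arrives as JSON on stdin, not as positional arguments.
INPUT=$(cat)
TOOL_NAME=$(jq -r '.tool_name' <<< "$INPUT")
FILE=$(jq -r '.tool_input.file_path // empty' <<< "$INPUT")
if [[ "$TOOL_NAME" == "Write" || "$TOOL_NAME" == "Edit" ]] \
   && [[ "$FILE" == *.ts || "$FILE" == *.tsx ]]; then
  npx prettier --write "$FILE"
  npx eslint --fix "$FILE"
  # No "git add -A" here: staging everything would sweep unrelated files into the next commit.
fi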
The Effect
  • Every file Claude touches is auto-formatted
  • Lint errors are fixed automatically
  • Zero manual work — happens after every edit
Hook Types Summary
Type | Speed | Use Case
command | Fast | Formatting, linting
prompt | Slow | Smart validation
agent | Variable | Complex workflows
http | Network | External integrations
Prompt Hook — LLM-evaluated (Haiku by default), no script file needed
// .claude/settings.json: prompt hook
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "prompt",
        "prompt": "Check if this bash command is safe. Block (exit code 2) if it contains rm -rf, DROP TABLE, or force push to main."
      }]
    }]
  }
}
Claude (Haiku) evaluates the action semantically — two lines of config catches what regex can't.
PostToolUse can't block — but it can clean up after Claude every single time.
Guardrails  ·  Live Demo46 / 81

Demo: Hooks Blocking a Commit

Step-by-Step
  1. Claude creates a test fixture with a dummy API key (sk-dummy12345…)
  2. Claude stages the file with git add
  3. Claude attempts to commit
  4. PreToolUse hook fires — credential guard detects sk- pattern → BLOCKED
  5. Error message tells Claude exactly what to fix: “Replace with clearly fake value like sk-FAKE-TEST-KEY”
  6. Claude fixes the key → commit succeeds
Before Hooks
  • API key committed to git
  • Permanent history damage
  • Manual cleanup required
  • No automated protection
After Hooks
  • Commit blocked automatically
  • Clear error message guides fix
  • Zero damage to git history
  • Claude learns from feedback
Hooks don't just catch mistakes — they teach. The error message tells Claude exactly what to fix.
Guardrails  ·  Reflection47 / 81

What Did We Learn?

  1. PreToolUse is the one hook you cannot skip. Exit code 2 = blocked. Works even when Claude bypasses permissions.
  2. Credential guards are non-negotiable. API keys in git = permanent damage. Block before the commit happens.
  3. PostToolUse automates the boring stuff. Formatting, linting — every file Claude touches gets cleaned automatically.
  4. Hooks teach, not just block. Good error messages tell Claude exactly what to fix and how.
  5. Skills = on-demand. Hooks = always active. They complement, don't compete.
Claude can read every line of code. But it has never seen the app. Next: giving Claude eyes.
Guardrails  ·  Security47B / 81

Sandboxing: OS-Level Isolation

Hooks  block what you anticipate.  Sandboxing  blocks everything else.

What It Does
  • macOS: seatbelt (sandbox-exec) restricts filesystem and network access
  • Linux: bubblewrap (bwrap) for the same guarantee
  • Bash commands run in an isolated process — can't touch the broader filesystem or network without explicit permission
How to Enable
{ "sandbox": { "enabled": true } }
One line in settings.json. No script file needed.
The Payoff
84%
fewer permission prompts in Anthropic's reported internal usage. Not because you allowed more, but because the sandbox removed the need to ask about low-risk, isolated commands.
Relationship to Hooks
PreToolUse hooks → block specific known-bad actions
Sandboxing → OS-level isolation for everything else

Two layers. Hooks are the allow/deny list. Sandboxing is the perimeter.
Skills = on-demand. Hooks = always active. Sandboxing = always isolated. All three together = Auto mode you can actually trust.
04  Problems we hit

UI Automation.

We merged the code. But we checked the UI manually — opening the browser, clicking, watching the console. That's tedious — and Claude can't see what we see. Or can it?

UI Automation  ·  The Problem49 / 81

Manual UI Verification Is Tedious

The Manual Checklist
1. Start dev server
2. Open browser
3. Check dashboard loads
4. Verify charts render
5. Test date filter works
6. Check console for errors
7. Test responsive on mobile
8. Verify dark mode works
9. Take screenshot
10. Close browser
Why This Is a Problem
Time
5–10 minutes per change.
Blindness
Claude reads code, not pixels.
Leaks
Visual bugs slip through.
Repeat
No reproducibility.
“We need Claude to SEE the app, not just read the code.”
Manual UI checks are velocity killers. Time for a new tool.
UI Automation  ·  Concept50 / 81

MCP: Giving Claude Eyes and Arms

Concept  Model Context Protocol (MCP) connects Claude to external tools. Bash is powerful but limited — Claude can't browse or take screenshots. MCP changes that.

Capability | Without MCP | With Playwright MCP
Read code | Reads files, “looks correct” | Navigates URL, sees actual UI
Screenshots | Not possible | Captures viewport automatically
Console errors | Manual check only | Reads errors programmatically
DOM data | Cannot extract | Extracts accessibility tree
Verification | Manual → slow | Automated → fast, reproducible
How It Works
Claude decides
Calls MCP tool
MCP opens browser
Returns result
Claude sees!
MCP extends Claude beyond the terminal. Playwright MCP = Claude can see the app.
UI Automation  ·  Setup51 / 81

Setting Up Playwright MCP

3-Step Setup
01
Install the package
npm install -D @playwright/mcp
02
Configure .mcp.json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
03
Verify with /mcp
Type /mcp in Claude Code to confirm the Playwright server is connected. /context then shows how much context its tool definitions use.
Key Playwright MCP Tools
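navigate (tool: browser_navigate): Load a URL in the browser
screenshot (tool: browser_take_screenshot): Capture the current viewport
getConsoleMessages (tool: browser_console_messages): Read browser console errors
snapshot (tool: browser_snapshot): Get accessibility tree of page
evaluate (tool: browser_evaluate): Run JavaScript in the browser
click / fill / select (tools: browser_click, browser_fill_form, browser_select_option): Interact with page elements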
Where does .mcp.json live? — Three Scopes
Project: .mcp.json in repo root. Committed to git. Whole team gets the same MCP servers.
Local: .claude/settings.local.json. Gitignored. Personal overrides only.
User: ~/.claude.json. Applies to all projects on this machine.
Use Project scope for Playwright — your team should verify UI the same way you do.
One config file, one npm install. Claude now has a browser at its command.
UI Automation  ·  Live Demo52 / 81

Demo: Playwright Takes Over

Automated UI Verification
1. Start dev server (localhost:5173)
2. Tell Claude: "Verify the dashboard looks correct"
3. Watch Claude use Playwright MCP:
   ✓ navigate → page loaded 200
   ✓ screenshot → saved to file
   ✓ getConsoleMessages → 0 errors
4. More verifications:
   ✓ evaluate → 3 charts rendered
   ✓ evaluate → "129 sessions" found
5. Result: Dashboard verified in 30 seconds.
Manual Check
  • 10 minutes per change
  • Human interaction required
  • Inconsistent coverage
  • Easy to skip steps
Playwright MCP
  • 30 seconds, fully automated
  • Zero human interaction
  • Same checks every time
  • Fully reproducible
Without Playwright, Claude reads code. With Playwright, Claude sees the app.
UI Automation53 / 81

More Essential MCP Servers

Server | Purpose | When to Use
Qdrant | Semantic search over docs | Search across project docs, ADRs, past decisions
Context7 | Up-to-date library docs | Latest API signatures when training data is stale
PostgreSQL | Database queries | Verify schema, run queries, check migrations
Figma | Design-to-code | Access design specs programmatically from mockups
Registry | modelcontextprotocol.io | Hundreds more servers. Add as workflow evolves.
Tool Search: Save Tokens
Tool Search defers MCP tool definitions — they load on demand rather than at startup. Reduces context from ~72K to ~8.7K tokens. Controlled via the ENABLE_TOOL_SEARCH environment variable. Set to 0 to disable if you need all tools loaded upfront.
We'll talk more about managing context in the Session Hygiene section.
MCP composes: start with Playwright + Qdrant + Context7. Add servers as workflow evolves.
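The Tool Search saving is easy to put a number on. A quick arithmetic sketch, using the approximate figures quoted on this slide (~72K tokens before, ~8.7K after):

```python
def context_saved(before_tokens, after_tokens):
    """Return (tokens saved, percent saved rounded to one decimal)."""
    saved = before_tokens - after_tokens
    return saved, round(100 * saved / before_tokens, 1)


saved, percent = context_saved(72_000, 8_700)
print(f"{saved} tokens freed ({percent}% of the tool-definition overhead)")
```

With those inputs roughly 63K tokens, close to 88% of the overhead, go back to code and conversation.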
UI Automation  ·  Reflection54 / 81

What Did We Learn?

  1. MCP = Claude gets superpowers. Browser, database, search, design — MCP servers extend Claude far beyond the terminal.
  2. Playwright MCP eliminates manual UI checks. 30 seconds for full verification, not 10 minutes. Fully automated, fully reproducible.
  3. Tool Search saves tokens. On-demand loading instead of startup loading. Reduces context from ~72K to ~8.7K tokens.
  4. MCP servers compose. Start with Playwright + Qdrant + Context7. Add as workflow evolves.
  5. Introduced when pain was felt. Not “just in case” — because manual UI verification was killing velocity.
MCP gives Claude eyes. But every session, Claude still forgets your conventions. Next: persistent memory.
UI Automation  ·  Development Loop54A / 81

Frontend Workflow: Code, Screenshot, Iterate

Playwright isn't just for verification — it's for visual iteration. Claude sees what you see.

The Pattern
Write code → Playwright screenshot → Claude sees output → Claude adjusts → repeat
Contrast
Verification: "Does it work?" → assertions, pass/fail
Iteration: "Does it look right?" → screenshot, visual judgment, adjust
Example Prompt
"Build the dashboard. After each change, take a screenshot and tell me if the layout matches this design spec."
Key Tools for Iteration
screenshot → capture visual state
evaluate → check computed styles & positions
fill → interact with forms to test states
navigate → move between pages mid-iteration
Claude sees the screenshot. Use that. Visual iteration closes the loop between "code runs" and "design matches."
05  The operating environment

Persistent
Context.

Claude executes your process perfectly — then forgets everything next session. Your conventions, your patterns, your rules. Every session starts from zero.

Persistent Context56 / 81

The Goldfish Problem

You spend forty minutes teaching Claude your authentication system. You close the terminal. Open a new session. Claude has no idea. It is a stranger again.

Story 1 — Logging
During the build, Claude used console.log for debugging. Our team convention: structured logger with correlation IDs. We corrected it.
Next session — console.log again. Three sessions. Three corrections.
Story 2 — Naming
Claude generated database columns in camelCase. Our convention: snake_case. Corrected. Forgotten.
The agent has no way to remember "how we do things here."
"LLMs don't have memories. They have context windows. Those windows are finite. And every session, the window starts empty."
Persistent Context57 / 81

Why Agents Forget

Five technical reasons every session starts from zero. Understanding the mechanism is the first step to fixing it.

01

Stateless Inference

Model weights are frozen. No real-time learning happens during conversation. Every response is computed from scratch.
02

Context Rebuilt Per Call

Full history is re-sent every API call. There is no persistent memory store — only the messages you see.
03

Context Window Overflow

Oldest messages are silently dropped when the window fills. Your early instructions are the first to disappear.
04

No Persistent Storage

Session ends, context evaporates. Nothing is saved between conversations unless you explicitly write it to disk.
05

Indexing Without Understanding

Search finds text, not meaning. Retrieval gives you chunks, not comprehension. The agent still needs to reason.
The Insight
Skills solve the process problem — Claude follows your phases. But skills don't solve the conventions problem. We need a way to tell Claude: "These are the rules. Every session. Permanently."
Persistent Context58 / 81

Claude Code's Memory System

Claude Code doesn't have one memory system. It has four — each solving a different part of the problem.

📝
CLAUDE.md
Written by you. Instructions, rules, conventions. Loaded fully into context every session. Version-controllable. The foundation of persistent memory.
🤖
MEMORY.md
Written by Claude. Learnings and observations from working with you. First 200 lines only. Machine-local, not shared across devices.
📁
Rules
Conditional instructions scoped to file paths. API rules for API files. Frontend rules for frontend. The #1 mechanism for reducing always-loaded context.
Commands
Runtime tools — /init, /memory, /compact, /context, /cost. Manage memory actively. Compact before degradation.
Four layers, one principle: critical rules in files, not chat history.
Persistent Context59 / 81

CLAUDE.md: Your Most Powerful Lever

Hierarchy — All Files Concatenate
Level | Location | Who Writes | Scope
Project | ./CLAUDE.md | You | This repository
User | ~/.claude/CLAUDE.md | You | All projects; global conventions
Rules | ./.claude/rules/*.md | You | Per file-type, conditional
Local | ./CLAUDE.local.md | You | Gitignored; personal only
Auto | ~/.claude/projects/.../MEMORY.md | Claude | Machine-local only
Subdirectory | ./src/api/CLAUDE.md | You | Lazy-loaded when Claude operates in that directory
@path import syntax — reference files without bloating CLAUDE.md
<!-- CLAUDE.md -->
Use the auth patterns from @docs/auth-standards.md
See database conventions at @docs/db-standards.md
Subdirectory CLAUDE.md files load lazily — only when Claude reads files in that directory, keeping startup context lean.
✏️
The 80-Line Rule
Target under 80 lines. Past 200 lines, adherence drops; Anthropic targets ~60 lines. The reasoning: models follow ~150–200 instructions, and Claude's system prompt already takes ~50 of them. Every line beyond 120 competes with code context for attention.

Lost-in-the-middle effect: models over-attend to start and end. Put critical instructions at the TOP. Use @path imports to reference files without bloating CLAUDE.md.
Loaded every session, automatically. No slash command. The single highest-leverage file in your repo.
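Checking a CLAUDE.md against the line budget takes only a few lines of code. A minimal sketch, assuming blank lines do not count toward the budget and that any whitespace-separated word starting with @ is an import; both are simplifications chosen for the example.

```python
def check_claude_md(text, limit=80):
    """Report the non-blank line count, whether it exceeds the budget, and @path imports."""
    lines = [line for line in text.splitlines() if line.strip()]
    imports = [
        word
        for line in lines
        for word in line.split()
        if word.startswith("@") and len(word) > 1
    ]
    return {"lines": len(lines), "over_budget": len(lines) > limit, "imports": imports}
```

Run against the repo's CLAUDE.md in CI, a check like this turns the 80-line rule from a good intention into a failing build.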
Persistent Context67 / 81

Auto Memory: Claude Writes Its Own Notes

When you correct Claude, it detects patterns, checks if they're already known, and writes new entries to MEMORY.md.

How It Works
1
You correct Claude on a convention or decision.
2
Claude detects the pattern in your correction.
3
Checks if the pattern is already in MEMORY.md.
4
If new, writes a concise entry for future sessions.
Example MEMORY.md Entries
# Auto-generated by Claude
- "The project uses bun, not npm"
- "Always run typecheck before committing"
- "User prefers explicit return types"
- "Never use console.log — use the logger utility"
- "Use snake_case for database columns"
Key Differences from CLAUDE.md
CLAUDE.md — written by you, full file loaded, version-controllable. MEMORY.md — written by Claude, first 200 lines only, machine-local, not shared. Both concatenate — they don't override.
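The check-then-write loop and the 200-line cutoff can be pictured as follows. This is a toy model, not how Claude Code implements auto memory: the exact-match duplicate check and the function names are assumptions made for the illustration.

```python
MAX_LOADED_LINES = 200  # from the slide: only the first 200 lines are loaded


def remember(memory_lines, entry):
    """Append entry unless an equivalent line (ignoring case and padding) is already known."""
    normalized = entry.strip().lower()
    if any(line.strip().lower() == normalized for line in memory_lines):
        return memory_lines
    return memory_lines + [entry.strip()]


def loaded_into_context(memory_lines):
    """Only the head of the file reaches the session; the tail is silently ignored."""
    return memory_lines[:MAX_LOADED_LINES]
```

The second function is the practical warning: once MEMORY.md grows past the cutoff, new learnings at the bottom stop influencing sessions unless older entries are pruned.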
Persistent Context68 / 81

Rules: Conditional Memory

Problem  API code and frontend code need different standards. Loading all rules wastes context.

Two Types
WITH PATHS Only when editing matching files
---
name: api-standards
description: Standards for API and route files
paths: ["src/api/**/*.ts", "src/routes/**"]
---
# API Standards
- Always use async/await, never callbacks
- Validate all inputs at route boundaries
  • Validate with zod
  • Return 400/500 status codes
  • Include rate limit headers
  • Use TypeScript strict mode
The #1 mechanism for reducing always-loaded context.
WITHOUT PATHS Every session, universal
  • Use TypeScript strict mode
  • No console.log in production code
  • Write tests for all new features
  • Prefer explicit return types
Load only the rules relevant to the files being edited.
Rules without paths: load every session unconditionally — use sparingly as they consume context on every task.
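How path scoping decides which rules load can be sketched with a small glob matcher. This is a simplified model written for illustration: the glob handling (** spans directories, * stays within one segment) approximates common glob semantics and is not Claude Code's own matcher, and the rules dictionary shape is hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a path glob into a compiled regex (simplified semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def rules_to_load(rules, edited_file):
    """rules maps a rule name to its path globs, or to None for a rule without paths."""
    loaded = []
    for name, paths in rules.items():
        if paths is None or any(glob_to_regex(p).fullmatch(edited_file) for p in paths):
            loaded.append(name)
    return loaded
```

Editing src/components/Button.tsx loads only the universal rule; the API standards stay out of context until an API file is touched.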
Decision Rule
"Does this need to be true everywhere, or only in certain parts?"    Everywhere → without paths. Certain parts → with paths. Path-scoped rules are the #1 context optimization.
Persistent Context69 / 81

The Memory Commands

Six slash commands to manage memory actively. Don't wait for problems; manage context proactively.

Command | Purpose | When to Use
/init | Generate starter CLAUDE.md from codebase | Starting a new project: scans files, infers conventions
/memory | Edit memory files in system editor | Need to update CLAUDE.md or MEMORY.md manually
/compact | Summarize conversation history | Freeing context space before it degrades
/context | Visualize context usage as colored grid | Debugging what is loaded and what is not
/cost | Show token usage and cost | Monitoring spend and efficiency
/powerup | Interactive lessons built into Claude Code | Weekly skill refresh: learning by doing inside the tool
⚠️
The 60% Rule
Compact at 60%, not when warnings fire at 80–95%. By the time you see warnings, quality has already degraded. The difference between a quality summary and a degraded summary is permanent — you cannot recover lost context.
// .claude/settings.json: PreCompact hook
{
  "hooks": {
    "PreCompact": [{
      "hooks": [{
        "type": "command",
        "command": "echo \"Decisions: $(date)\" >> .claude/session-log.md && cat .claude/working-notes.md >> .claude/session-log.md"
      }]
    }]
  }
}
PreCompact fires before compaction summarizes your session. Use it to persist decisions that a summary might lose — architectural choices, rejected approaches, open questions.
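The 60% rule reduces to a threshold check on context usage. A minimal sketch; the token counts would come from whatever /context reports, and the return labels are invented for the example.

```python
def compaction_advice(used_tokens, window_tokens, compact_at=0.60, warn_at=0.80):
    """Classify context usage against the 60% compact point and the 80% warning zone."""
    usage = used_tokens / window_tokens
    if usage >= warn_at:
        return "late"      # warnings are firing; summary quality is already at risk
    if usage >= compact_at:
        return "compact"   # the window the slide recommends acting in
    return "ok"
```

For a 200K-token window the action point arrives at 120K tokens, well before any warning appears.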
Persistent Context70 / 81

Lost in the Middle

Research by Liu et al. (2023) shows language models over-attend to the start and end of context — the middle gets lost. This is not a bug. It is a property of how attention works.

Attention Pattern
START
MIDDLE
END
High → Low → High attention
Symptoms You See
  • Claude duplicates code it wrote earlier in the session
  • Forgets start-of-session conventions and project structure
  • Hallucinates file paths and API signatures
  • Responses get slower as the session grows
  • Costs spike without more work being done
Three Implications
1. Critical rules go in CLAUDE.md (start of context), not conversation.
2. Compact BEFORE degradation — not after symptoms appear.
3. Must-remember rules belong in FILES, not chat history.
Context Assembly Order — why CLAUDE.md survives, chat history doesn't
1. System prompt (always loaded)
2. Output style
3. Git state
4. CLAUDE.md files ← YOUR RULES LIVE HERE
5. MCP tool definitions
6. Skill descriptions
7. Conversation history ← GROWS UNTIL COMPACTION
CLAUDE.md loads near the top of context — high attention zone.

Conversation history is at the bottom — the first thing compaction summarizes away.

Rules in CLAUDE.md survive. Rules in chat don't.
Persistent Context71 / 81

Best Practices: The Memory Playbook

Memory systems only work if you maintain them. Six practices that separate teams that get compounding value from those that get frustration.

01

Keep CLAUDE.md under 80 lines

Critical rules at the top. Every line beyond 120 competes with others for attention. Use @path imports for detail.
02

Confirm Good Decisions

Memory learns caution, not correctness. Confirm what works so Claude remembers it. Silence teaches nothing.
03

Use Path-Scoped Rules

API rules for API files. Frontend rules for frontend. The #1 context optimization. Load only what you need.
04

Compact at 60%

Quality summary vs. degraded summary. Do not wait for the 80% warning. The difference is permanent.
05

Quarterly Review

15 minutes every quarter. CLAUDE.md grows stale as your codebase evolves. What was true in January may be wrong in April.
06

Start with Official MCP Memory

Free. Local. Five minutes to set up. The best starting point for any team before building custom systems.
"The teams that win aren't the ones with the most sophisticated memory system. They're the ones that maintain the system they have."
Persistent Context72 / 81

Memory Is the Differentiator

Process without memory is Groundhog Day. Memory turns repetition into compounding. Three pillars that make agentic engineering sustainable.

🏗️
Structure
CLAUDE.md + Rules + Skills = static foundation. Written by you. Version-controlled. Loaded every session.
🔄
Adaptation
MEMORY.md + MCP servers = dynamic learning. Written by Claude. Updated automatically. Learns from corrections.
🌐
Portability
AGENTS.md + file-based config = cross-tool future. Not locked to Claude Code. Works with any agent that reads files.
"Process without memory is Groundhog Day. Memory turns repetition into compounding."
Persistent Context  ·  Reflection73 / 81

What Did We Learn?

  1. Agents have no native memory. Context windows are finite, stateless, and reset every session. Files are the only persistent layer.
  2. CLAUDE.md is your most powerful lever. Keep it under 80 lines. Critical rules at the top. Use @path imports for detail.
  3. Four memory systems work together. CLAUDE.md (you write), MEMORY.md (Claude writes), Rules (conditional), Commands (runtime management).
  4. Rules scope instructions to file paths. The #1 context optimization — load only what is relevant to the files being edited.
  5. Compact at 60%, not 80%. Quality summary vs. degraded summary. Set a PreCompact hook to protect critical context.
  6. Maintain your memory system. Quarterly reviews, confirm good decisions, start with MCP memory. Maintenance beats sophistication.
Three layers — CLAUDE.md always loaded, Rules conditionally loaded, Skills on demand. Structure substitutes for memory.
06  The operating environment

Configuration
& Trust.

Whose settings win when they conflict? And how much autonomy should Claude have? Two questions, one hierarchy.

Configuration & Trust68 / 81

The Configuration Collision

You set TypeScript strict mode. Your colleague disables it locally. CI uses a managed policy. Who wins?

The Story
During the build, I had Claude configured for Auto mode — it acted freely. A colleague cloned the repo and Claude immediately ran rm -rf on a test directory without asking. Same project, different safety level.
The problem: there was no configuration hierarchy. No way to say “team standard overrides personal preferences, but enterprise policy overrides everything.”
Settings hierarchy
Solves: whose configuration wins when they conflict.
Permission modes
Solves: how much autonomy Claude should have per project.
Configuration & Trust69 / 81

The 5-Level Settings Hierarchy

Five levels of settings load on startup. Higher number wins.

LVL 5
Managed
Enterprise / OS level — cannot override.
LVL 4
CLI Arguments
--model, --effort — passed at command line.
LVL 3
Local
.claude/settings.local.json — gitignored, personal overrides.
LVL 2
Project
.claude/settings.json — committed, team standard.
LVL 1
User
~/.claude/settings.json — global defaults.
🔑Settings OVERRIDE — highest wins. CLAUDE.md files CONCATENATE — all applicable load together.
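Override versus concatenate is the whole difference between the two systems, and a few lines make it concrete. A sketch under assumptions: settings are treated as flat key/value pairs (real settings files may merge nested values differently), and the level names are shorthand for the five levels on this slide.

```python
# Lowest to highest precedence, as on the slide.
LEVELS = ["user", "project", "local", "cli", "managed"]


def resolve_settings(by_level):
    """Settings OVERRIDE: apply levels in order so the highest level present wins per key."""
    resolved = {}
    for level in LEVELS:
        resolved.update(by_level.get(level, {}))
    return resolved


def assemble_instructions(files):
    """CLAUDE.md files CONCATENATE: every non-empty file contributes its text."""
    return "\n\n".join(text.strip() for text in files if text.strip())
```

A colleague's local setting beats the project default, but a managed policy beats both. Instruction files never compete this way; they all load.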
Configuration & Trust70 / 81

Permission Modes: The Trust Gradient

The question  How much autonomy should Claude have? Too much = dangerous. Too little = slow.

RECOMMENDED
Default
Asks permission
Learning, unfamiliar codebases.
EXPLORE
Plan
Read-only exploration
Architecture understanding.
ADVANCED
Auto
Acts freely
Trusted workflows with hooks.
Trust Gradient
New project
Default (asks)
Hooks solid
Auto (acts freely)
The missing layer: Sandboxing
Hooks catch specific bad actions you've anticipated. Sandboxing catches everything else — OS-level process isolation (macOS seatbelt / Linux bubblewrap).
{ "sandbox": { "enabled": true } }
Result: 84% fewer permission prompts — the practical unlock for Auto mode.
Deny rules — write these before enabling Auto
{ "permissions": { "deny": [ "Bash(rm -rf *)", "Bash(git push --force *)", "Bash(DROP *)" ] } }
Deny rules are checked first. Full syntax on the next slide.
Revised graduation: Default → hooks solid → sandbox on → deny rules set → Auto. Full permission rules syntax: next slide.
Config & Trust  ·  Security70B / 81

Permission Rules: Allow & Deny Syntax

The question  Auto mode without rules is trust without boundaries. Here's how to set them.

The permissions block — settings.json
{
  "permissions": {
    "allow": [
      "Bash(npm run *)",
      "Bash(git *)",
      "Read(**)",
      "Edit(./src/**)"
    ],
    "deny": [
      "Bash(rm -rf *)",
      "Bash(git push --force *)",
      "Bash(DROP *)"
    ]
  }
}
Rule Syntax
Tool → applies to all uses of that tool
Tool(specifier) → specific pattern only
** → any path · * → any segment
Read(**) → allow reading any file
Edit(./src/**) → only inside src/
Bash(npm run *) → any npm script
Bash(rm -rf *) → deny rm -rf
Evaluation Order
1. Deny rules checked first
2. Allow rules checked next
3. Prompt if neither matches
Deny wins over allow. Be explicit about what to block.
You don't have to trust everything to trust Auto mode. Deny the dangerous, allow the routine, prompt the rest.
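The evaluation order (deny first, then allow, otherwise prompt) can be modelled in a few lines. This is an illustrative model only: it parses the Tool(specifier) form with a regex and matches specifiers with shell-style wildcards, which approximates but does not reproduce Claude Code's real matcher.

```python
import re
from fnmatch import fnmatchcase

RULE = re.compile(r"^(\w+)(?:\((.*)\))?$")  # "Tool" or "Tool(specifier)"


def matches(rule, tool, argument):
    """True if the rule names this tool and its specifier (if any) matches the argument."""
    parsed = RULE.match(rule)
    if parsed is None or parsed.group(1) != tool:
        return False
    specifier = parsed.group(2)
    return specifier is None or fnmatchcase(argument, specifier)


def evaluate(tool, argument, allow, deny):
    """Deny rules are checked first, allow rules next, and anything else prompts."""
    if any(matches(rule, tool, argument) for rule in deny):
        return "deny"
    if any(matches(rule, tool, argument) for rule in allow):
        return "allow"
    return "prompt"
```

With the lists from this slide, git push --force origin main matches both Bash(git *) and Bash(git push --force *), and the deny rule wins.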
Operational Discipline71 / 81

The Degrading Session

Every message fills the context window. Quality degrades silently.

The Story
45 minutes into the build, Claude started referencing files that didn't exist. Responses got slower. Costs spiked. We didn't notice until Claude generated a component that imported from a hallucinated path — from a previous conversation turn that was no longer relevant.
0 – 15 min
Sharp
Accurate, fast responses.
15 – 30 min
Drifting
Gradually losing earlier context. Costs rising.
30+ min
Degraded
Hallucinations, stale references, expensive errors.
You need operational discipline. Not glamorous — but it separates sustainable use from expensive frustration.
Operational Discipline72 / 81

Session Hygiene: Keep Sessions Lean

Command | When to Use | What It Does
/clear | Between unrelated tasks | Clears conversation. Start fresh with empty context.
/context | When session feels slow | Shows what's loaded: skills, files, MCP tools. Diagnose bloat.
/cost | Regularly | Displays token usage and cost. Track spend.
/compact | Before major new tasks | Summarizes context into a checkpoint. Reduces token load.
--continue | After closing and reopening terminal | Resume the most recent session without starting fresh.
--resume <id> | When you need a specific past session | Resume by session name or ID.
Thinking Levels — match effort to complexity
level 1  /fast
level 2  think
level 3  think hard
level 4  think harder
level 5  ultrathink
Operational rhythm:  Start session → /context → work → /cost → /compact → /clear for new task.
Operational Discipline  ·  Reflection73 / 81

What Did We Learn?

  1. Sessions degrade silently. Quality drops, costs spike, hallucinations increase — you don't notice until something breaks.
  2. /clear between unrelated tasks. /compact at checkpoints. /context to diagnose. /cost to track spend.
  3. Match thinking effort to complexity. /fast for execution. ultrathink for architecture. Everything in between for everything in between.
  4. Session hygiene is operational discipline. Not glamorous, but it separates sustainable use from expensive frustration.
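Point 3 can be sketched as a lookup from a complexity score to the five thinking levels listed on the Session Hygiene slide. The 1-to-5 complexity score and the `thinking_keyword` helper are assumptions made for illustration; how a task is scored is left to the engineer.

```python
# Level -> keyword, as listed on the Session Hygiene slide.
THINKING_LEVELS = {
    1: "/fast",
    2: "think",
    3: "think hard",
    4: "think harder",
    5: "ultrathink",
}


def thinking_keyword(complexity: int) -> str:
    """Pick a thinking level for a hypothetical 1-5 complexity score.

    Scores outside the range are clamped to the nearest valid level.
    """
    clamped = max(1, min(5, complexity))
    return THINKING_LEVELS[clamped]


if __name__ == "__main__":
    for score in range(1, 6):
        print(f"complexity {score}: {thinking_keyword(score)}")
```

Clamping rather than raising an error is a design choice for this sketch: an out-of-range score still yields a usable level instead of interrupting the workflow.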
That's the operating environment — five layers, from safety to operations. Let's bring it all together.
Synthesis  ·  74 / 81

The Complete Picture

Act 1
The Hook
Why discipline matters.
Act 2
The Build
The 5-phase process in action.
Act 3
The Environment
What makes the process reliable.
Safety
Hooks & Guardrails
Capability
MCP & Playwright
Instructions
CLAUDE.md & Rules
Configuration
Settings & Permissions
Operations
Session Hygiene
“Vague instructions produce vague code. Precise instructions produce precise code.”
“Structure substitutes for memory.”
“Skills = on-demand. Hooks = always active. CLAUDE.md = always loaded.”
From “vibe coding” to disciplined engineering with AI agents. No black boxes. No magic spells.
Synthesis  ·  Next Steps  ·  74B / 81

Where to Go From Here

This course gave you the framework, skills, and environment. The frontier moves weekly. Here's how to stay ahead of it.

01
Built-in Lessons
/powerup — interactive lessons inside Claude Code itself. Free. No new tab. Run it weekly.
02
Stay Current
Anthropic engineering blog. docs.anthropic.com/release-notes/claude-code. New hooks, new MCP servers, new permission features drop regularly.
03
Community
awesome-claude-code (21.6k GitHub stars) — curated skills, hooks, workflows. Faster than docs for real-world patterns.
04
Deep Dives
  • Headless mode & Claude Agent SDK → programmatic agents
  • CI/CD with GitHub Actions → automate the 5-phase pipeline
  • Git worktrees → parallel agent isolation
  • Plugin development → distribute your skills
One Action Today
Run /powerup.
Check your CLAUDE.md line count. If it's over 80 lines, refactor.
Add one hook you haven't added yet.
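The CLAUDE.md line-count check above takes only a few lines to automate. This is a minimal sketch under stated assumptions: it expects CLAUDE.md in the current working directory, uses the 80-line threshold from this slide, and the helper names `count_lines` and `needs_refactor` are hypothetical.

```python
from pathlib import Path

MAX_LINES = 80  # threshold suggested on this slide


def count_lines(text: str) -> int:
    """Count lines in a file's text content."""
    return len(text.splitlines())


def needs_refactor(text: str, max_lines: int = MAX_LINES) -> bool:
    """True when the text is longer than the allowed line count."""
    return count_lines(text) > max_lines


if __name__ == "__main__":
    path = Path("CLAUDE.md")  # assumed location: current directory
    if path.exists():
        content = path.read_text(encoding="utf-8")
        verdict = "refactor" if needs_refactor(content) else "ok"
        print(f"{path}: {count_lines(content)} lines, {verdict}")
    else:
        print(f"{path} not found in the current directory")
```

Run from the project root, the script prints the line count and whether the file exceeds the threshold.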
Stay curious. The engineers shipping fastest aren't using the newest tools — they're using the tools they understand deeply.
The Complete Course  ·  Fin
Three Acts. From problem to mastery.

From vibe coding
to disciplined
engineering.

No black boxes. No magic spells. The framework, the skills, and the environment — yours to take into your team.

Created by
Foyzul Karim
linkedin.com/in/foyzul
[ course URL: TBD ]
[ github: TBD ]
[ community: TBD ]