The Complete Course

Agentic Software Engineering

Speed without discipline is not engineering.

A three-act course on building real software with AI agents — the 5-phase framework, the skills that drive it, and the operating environment that keeps it safe.

Foyzul Karim · linkedin.com/in/foyzul · v1.0 · 81 slides

The Hook · 02 / 81

Speed without discipline is not engineering.

AI coding agents make development 10× faster. Without process, that speed produces broken code, lost context, and runaway cost.

The Paradigm Shift

Agents reason across entire codebases
They write, test, fix, and review code
Not autocomplete — genuine collaborators
Speed increases 10× or more

Agents are teammates, not tools.

Quality at Risk

Code that looks correct but breaks things
Missed edge cases and side effects
Security flaws — credentials, injections
No architectural context or conventions

Without process, speed kills quality.

Cost Escalation

Every interaction costs tokens
Bloated sessions waste context
Inline scripts burn tokens per call
Undisciplined workflows multiply spend

Without hygiene, costs spiral.

→This course gives you the 5-phase framework, guardrails, and cost controls to turn raw speed into disciplined engineering.

The Hook · 03 / 81

Course Architecture

A five-part progression: theory → practice → pain → cure → mastery.

The Hook

Understanding the framework and why process matters.

Skills Intro

Just enough skill scaffolding to start building.

The Build

A real project — phase by phase, every skill read before invoked.

Problems We Hit

Guardrails and UI automation — introduced after you feel the pain.

The Operating Environment

Settings, CLAUDE.md, rules, hooks, and advanced deep dives.

→Theory → practice → pain → cure → mastery. A deliberate progression.

The Hook · 04 / 81

Why This Course Exists

Problem “Vibe coding” — asking AI to write code without process — produces unmaintainable, untested code that breaks in production.

Four Pillars — What Makes This Course Different

5-Phase Framework

Process creates reproducibility. Reproducibility creates trust.

Extension Architecture

Skills + Hooks + MCP extend Claude's capabilities to your stack.

Live Demonstrations

Theory without practice is entertainment, not education.

Hands-On Learning

You build alongside. Every phase, every skill, every decision.

→Process beats prompts. The 5-phase framework turns “vibe coding” into engineering.

The Hook · 05 / 81

Who This Course Is For

Persona	Their Problem	What They Gain
Software Engineers	AI writes code but they can't maintain it.	Disciplined framework + quality gates.
Team Leads	Team uses AI inconsistently.	Repeatable standards + review process.
Senior Developers	Need the full picture, not just code generation.	Architecture + context + optimization.
Architects & Leads	No standards exist for AI-assisted work.	Design patterns others can follow.

Prerequisites Terminal familiarity · Git · One programming language

→No matter your role, agentic engineering needs process + tools + practice.

The Hook · 06 / 81

What Is an Agent?

Problem One-shot prompts fail on complex tasks. Every LLM call has limited context, no planning, and no verification.

Concept An Agent is an intelligent orchestrator that decomposes work and decides what to do next.

Agent

Orchestrator

decomposes · decides · verifies

→

⇣

Sequential Skills

Ordered, predictable workflows.

⇉

Parallel Subagents

Fast, scalable execution.

→Agents decompose work into sequential skills or parallel subagents. The fundamental building block.

The Hook · 07 / 81

What Is a Large Language Model?

Definition

A neural network trained on vast text that learns to predict the next token — enabling it to generate text, write code, and reason.

The Breakthrough

Attention Is All You Need (2017) — the transformer architecture replaced sequential processing with parallel attention.

The Four Pillars

Pre-training

Trained on billions of documents
Absorbs patterns by exposure
Predicts patterns, not meaning

Next-Token Prediction

Predicts the next likely token
Adds it back, repeats
Simple mechanism, intelligent output

Emergent Capabilities

Small: completes sentences
Medium: summarizes paragraphs
Large: writes code, debugs, reasons

Transformer Arch

Looks at all words at once
Decides which parts matter
Long-range coherence

!Hard limits: finite context window (~200K tokens) · no persistent memory across sessions · KV-cache is working memory, not true memory.

The Hook · 08 / 81

How LLMs Power Agents

The LLM is the brain. The agent is the body.

Intent

You state what you want in natural language. The LLM parses words into a structured goal.

LLM parses → structured goal

Breakdown

The LLM decomposes the goal into steps. Route → schema → validation → query → errors.

LLM plans → executable steps

Execution

The LLM generates the actual code — functions, SQL migrations, test cases.

LLM builds → working code

Loopback

The LLM evaluates the result. Tests pass? Secure? Generates fix if needed.

LLM checks → adapts → repeats

Also known as: perceive-plan-act-observe · ReAct · OODA — same pattern, different names.

→The quality of the loop is determined by the quality of the intent. Vague prompts → vague breakdown → vague code. The framework gives you structure. Specificity gives you precision.
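The four stages above can be sketched as a bounded loop. This is a minimal illustration, not how Claude Code is implemented: `run_agent_loop`, `LoopResult`, and the three `fake_*` functions are hypothetical names, and the fakes stand in for real LLM calls so the control flow can be exercised on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    attempts: int
    passed: bool
    history: List[str] = field(default_factory=list)


def run_agent_loop(
    intent: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[List[str], int], str],
    check: Callable[[str], bool],
    max_attempts: int = 3,
) -> LoopResult:
    """Intent -> Breakdown -> Execution -> Loopback, bounded by max_attempts."""
    steps = plan(intent)                      # Breakdown: goal -> ordered steps
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        artifact = execute(steps, attempt)    # Execution: produce a candidate
        history.append(artifact)
        if check(artifact):                   # Loopback: verify before accepting
            return LoopResult(attempt, True, history)
    return LoopResult(max_attempts, False, history)


# Hypothetical stand-ins for the LLM calls, used only to exercise the loop.
def fake_plan(intent: str) -> List[str]:
    return ["route", "schema", "validation", "query", "errors"]


def fake_execute(steps: List[str], attempt: int) -> str:
    # The first attempt "forgets" error handling; later attempts include every step.
    done = steps if attempt > 1 else steps[:-1]
    return ",".join(done)


def fake_check(artifact: str) -> bool:
    return "errors" in artifact


result = run_agent_loop("add a GET /sessions endpoint", fake_plan, fake_execute, fake_check)
print(result.attempts, result.passed)
```

The attempt cap is a design choice of this sketch: a loop that can retry forever is one of the ways cost spirals.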

The Hook · 09 / 81

The Agent as Orchestrator

From single-intent assistants to purpose-based orchestrators that coordinate parallel subagents.

Sequential Skills

1 · REQ: Requirements
2 · ARCH: Architecture
3 · TASKS: Task Generation
4 · TDD: TDD Implementation
5 · REVIEW: Review & Merge

Ordered workflows — one step feeds the next.

Parallel Subagents

Subagent 1: Database schema design
Subagent 2: API parser logic
Subagent 3: Error handling layer
Subagent 4: Test suite generation

Orchestrator coordinates · integrates · verifies.

→Scales with the problem: small task → 1 subagent · complex system → 10 subagents, coordinated, verified.
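The coordinate, integrate, verify pattern can be sketched with a thread pool. This assumes the subtasks are independent of each other; `orchestrate`, `verify`, and the four lambdas are hypothetical stand-ins for delegated agent runs, not Claude Code APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Set


def orchestrate(subtasks: Dict[str, Callable[[], str]], max_workers: int = 4) -> Dict[str, str]:
    """Run independent subtasks in parallel and collect results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in subtasks.items()}
        # .result() re-raises any exception from a worker, so failures surface here.
        return {name: fut.result() for name, fut in futures.items()}


def verify(results: Dict[str, str], required: Set[str]) -> bool:
    """Integration check: every required piece came back non-empty."""
    return all(results.get(name) for name in required)


# Hypothetical subagents; each lambda stands in for one delegated agent run.
subtasks = {
    "schema": lambda: "sessions table",
    "parser": lambda: "request parser",
    "errors": lambda: "error layer",
    "tests": lambda: "test suite",
}
results = orchestrate(subtasks)
print(verify(results, set(subtasks)))
```

Results are keyed by subtask name rather than completion order, so the integration step does not depend on which subagent finishes first.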

The Hook · 10 / 81

The Problem With “Vibe Coding”

The cycle

Prompt → Code → Hope → Ship → Bug → Repeat

Five Problems

No Requirements

“Build a dashboard” — but which metrics? Which users? You operate on assumptions.

No Architecture

Agent writes file by file with no blueprint. Every PR is a breaking change waiting to happen.

No Task Breakdown

“Build the backend” is not a task. It is a wish.

No Tests

Without tests, quality is unknown. Manual verification is a prayer, not a strategy.

No Review

Bugs and security flaws go straight into your codebase.

→Vibe coding produces toys. The 5-phase framework turns speed into engineering.

The Hook · 11 / 81

The 5-Phase Agentic Framework

Each phase has one owner and produces one deliverable. No phase is optional.

#	Phase	Owner	Deliverable
1	Requirement Engineering	You	REQ document — what to build
2	System Architecture	You + Claude	ARCH doc — data models, APIs, modules
3	Task Generation	Claude	Task list with tests + scope + REQ trace
4	TDD Implementation	Claude	Working code + passing tests
5	Review & Merge	You + Claude	Approved PR — 16 parallel checks

→Each phase has one owner and one deliverable. Discipline over speed.

The Hook · 12 / 81

Three Scenarios, One Framework

The same 5-phase pipeline — different entry points. The discipline never changes.

1 · REQ (Requirements) → 2 · ARCH (Architecture) → 3 · TASKS (Task Generation) → 4 · TDD (TDD Implementation) → 5 · REVIEW (Review & Merge)

Scenario	REQ	ARCH	TASKS	TDD	REVIEW	Note
Greenfield	1	2	3	4	5	Blank slate — all 5 phases.
New Feature	—	2	3	4	5	System exists — skip REQ.
Bugfix	RCA	—	3	4	5	Root-cause analysis first — understand why.

→The entry point changes. The discipline doesn't. Same framework, different starting line.
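The scenario table can be read as a lookup from scenario to the phases that run. A small sketch, transcribing the table rows as-is; the dictionary keys and the `phases_for` helper are hypothetical names.

```python
PHASES = ["REQ", "ARCH", "TASKS", "TDD", "REVIEW"]

# Rows transcribed from the scenario table; "RCA" takes the REQ slot for bugfixes.
SCENARIOS = {
    "greenfield": ["REQ", "ARCH", "TASKS", "TDD", "REVIEW"],
    "new_feature": ["ARCH", "TASKS", "TDD", "REVIEW"],
    "bugfix": ["RCA", "TASKS", "TDD", "REVIEW"],
}


def phases_for(scenario: str) -> list:
    """Return the ordered phases for a scenario, or fail loudly on an unknown one."""
    try:
        return list(SCENARIOS[scenario])
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario!r}") from None


for name in SCENARIOS:
    print(name, "->", " > ".join(phases_for(name)))
```

Whatever the entry point, every row ends in TASKS, TDD, REVIEW, which is the "discipline never changes" part.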

Reflection · 13 / 81

Reflection

What Did We Learn?

One-shot prompts fail — complex work needs agents.
“Vibe coding” skips steps — 5 phases give AI a process.
The framework adapts — same discipline, different entry points.

→Process beats prompts. Next: the tools that make this process executable.

Skills Intro · The CLI · 14 / 81

Claude Code: Your Agentic CLI

Problem The 5-phase framework needs a tool — one that reads codebases, runs tests, and loads skills.

Concept Claude Code is Anthropic's official CLI — your terminal interface to agentic engineering.

Reads Full Codebase

Analyzes architecture, dependencies, and conventions across your entire project.

Edits, Tests & Shell

Writes code, runs tests, executes bash — all within your existing workflow.

Skills via /commands

Type /req, /arch, /tdd — Claude loads your skill and follows your process.

MCP Integration

Connects to Playwright, databases, search — giving Claude eyes, arms, and memory.

Install Recommended

curl -fsSL https://claude.ai/install.sh | bash

Legacy (no auto-update): npm install -g @anthropic-ai/claude-code

Needs from you

Clear requirements · Architecture decisions · Well-scoped tasks · Your review

Surfaces

Terminal CLI · VS Code extension · JetBrains plugin · Desktop app · Web (claude.ai/code) · iOS

Auth

Claude.ai subscription (Pro / Max / Teams / Enterprise) · API key · Enterprise SSO (AWS Bedrock · Google Vertex · Azure)

→Claude Code is the interface. Your clarity is the input. Garbage in, garbage out.

Skills Intro · Layout · 15 / 81

The .claude/ Directory

Problem Skills, hooks, settings, rules — scattered. Without the map, you waste hours searching.

Project Layout

your-project/
├── .claude/
│   ├── skills/
│   │   ├── req.md
│   │   ├── arch.md
│   │   ├── tdd.md
│   │   └── review.md
│   ├── agents/
│   │   ├── code-reviewer.md
│   │   └── test-writer.md
│   ├── hooks/
│   │   └── PreToolUse-*.sh
│   ├── settings.json
│   └── rules/
│       └── *.md
├── CLAUDE.md
└── CLAUDE.local.md    # gitignored

skills/

On-demand instructions via slash commands.

hooks/

Automatic guardrails at lifecycle events.

settings.json

Project config — committed, team standard.

rules/

Conditional instructions scoped to file paths.

agents/

Specialized subagents with focused tool sets. Invoked automatically when task matches description.

→Global config lives in ~/.claude/settings.json and ~/.claude/CLAUDE.md. Where the two levels conflict, the project-level setting wins. No magic.

Skills Intro · Concept · 16 / 81

Skills: Automate Your Workflows

Problem Every session starts from zero. You re-explain conventions — every single time.

Solution Skills = markdown in .claude/skills/, loaded on demand via slash command.

You type /command

/req /arch /tdd /review

Signal: switch to this workflow.

Claude loads skill

.claude/skills/req.md

Process, checklists, output format.

Claude follows it

Socratic → ARCH → Tasks → TDD → Review

Consistent. Repeatable. Every time.

User-invoked

You decide when. Not the agent.

On-demand

Zero context bloat until needed.

Version-controlled

In .claude/skills/ — shared, committed.

Skills are matched by LLM reasoning over the description field — not keyword matching. Write descriptions that articulate when to use the skill, not just what it does.

Advanced: Set context: fork in skill frontmatter to run the skill in an isolated subagent context — useful for skills that dispatch parallel agents.

→A skill is a contract. Define once. Claude executes every time.

Skills Intro · Hands-on · 17 / 81

Build Your First Skill

Goal Create a skill Claude follows. YAML frontmatter + markdown body.

Create directory

mkdir -p .claude/skills

Write the file

.claude/skills/req.md

Define the process

Checklists + output format

Invoke it

/req

.claude/skills/req.md

---
name: req
description: Socratic requirements
color: green
---
## Socratic Interview Checklist
- [ ] Surface assumptions
- [ ] Identify edge cases
- [ ] Define boundaries
- [ ] WHAT not HOW

Frontmatter = label. Body = recipe.

→Good skill: clear checklists · defined outputs · discipline rules. You design. Claude executes.
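To make the "frontmatter = label, body = recipe" split concrete, here is a rough sketch that separates the two parts of the req.md file shown above. `parse_skill` is a hypothetical helper, not part of Claude Code, and it handles only flat `key: value` pairs; a real tool would use a YAML parser.

```python
def parse_skill(text: str):
    """Split a skill file into (frontmatter dict, markdown body).

    Handles only flat `key: value` pairs between two `---` lines.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text                      # no frontmatter: everything is body
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text                      # no closing delimiter: treat as body only
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SKILL = """---
name: req
description: Socratic requirements
color: green
---
## Socratic Interview Checklist
- [ ] Surface assumptions
- [ ] Identify edge cases
- [ ] Define boundaries
- [ ] WHAT not HOW
"""

meta, body = parse_skill(SKILL)
print(meta["name"], body.count("- [ ]"))
```

A check like this could run in CI to catch a skill file that lost its `name` or `description` before anyone types the slash command.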

Skills Intro · Cost · 18 / 81

Save Tokens with External Scripts

Problem Embedding bash in skills bloats context. Those tokens burn — again and again.

Pattern Extract logic to scripts. Reference from the skill. Bash loads externally.

DON'T Inline bash

## Run Tests
cd src && npm run build && npx jest --coverage ... --testPathPattern='...' && npx eslint . ... --format junit -o ...

~850 tokens per load — loaded into context every time.

DO Reference external script

## Run Tests
Execute: ./scripts/run-tests.sh
# Full script on disk:
# scripts/run-tests.sh

~18 tokens per load — script loaded by Bash tool, not context.

→The math: 850 × 20 = 17,000 wasted vs 18 × 20 = 360 tokens. 47× savings.
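The arithmetic behind that ratio, spelled out. The per-load figures (850 and 18 tokens) and the 20 loads are the slide's own estimates; `context_cost` is just a hypothetical helper name.

```python
def context_cost(tokens_per_load: int, loads: int) -> int:
    """Total tokens spent when the same skill text is loaded `loads` times."""
    return tokens_per_load * loads


inline = context_cost(850, 20)      # bash embedded in the skill body
external = context_cost(18, 20)     # one-line reference to a script on disk
ratio = inline / external

print(inline, external, round(ratio))   # 17000 360 47
```

The ratio depends only on the per-load sizes (850 / 18), so the saving holds at any number of loads.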

Skills Intro · Design · 19 / 81

Skill Design Patterns

Problem Skills can be ignored or bloated. Four patterns fix accuracy and tokens.

Accuracy

Lead with the Rule

LLMs over-attend to start/end. Buried instructions get lost.

Put the #1 rule at the TOP. Output format at the BOTTOM. Bookend.

Tokens

One Skill Per Workflow

Mega-skills load everything. Most is irrelevant.

req.md · arch.md · tdd.md · review.md. Load only what you need.

Tokens

Use @path, Never Paste

Pasting docs into skills bloats context permanently.

Write @docs/ARCH.md. Loaded only when needed.

Accuracy

Explicit Output Tags

Vague instructions produce vague results.

Use <output_format>, <acceptance_criteria>. Higher precision than prose.

Accuracy

Write Descriptions as Trigger Conditions

The description field is evaluated by LLM reasoning — not keyword matching — to decide when to auto-invoke the skill.

Write: "Use when the user asks for a structured requirements interview" — not: "Requirements skill" (manual invocation only).

→Design for the LLM's attention pattern. Accuracy is a decision.

Skills Intro · Communication · 19A / 81

Prompting for Precision

Four rules that determine whether Claude follows your process or improvises.

Accuracy

Be Explicit, Not Conversational

"Refactor src/auth/middleware.ts to extract JWT validation into validateToken(), add expired-token handling, add unit tests"

beats: "clean up the auth code"

Context

Investigate Before Answering

Add to CLAUDE.md:

"ALWAYS read relevant files before proposing edits. Never speculate about code you have not opened."

Prevents hallucinated imports and phantom file paths.

Tokens

Only What's Requested

"Only make changes directly requested. Do not refactor adjacent code."

Prevents scope drift — the #1 cause of wasted tokens.

Accuracy

Match Thinking to Complexity

/fast: routine execution
think: standard tasks
think hard: multi-step reasoning
ultrathink: architectural decisions

→The framework gives you structure. Specificity gives you precision. Process + precise prompts = disciplined engineering.

Skills Intro · Agents · 19B / 81

Built-in Agents: Explore Before You Build

Two read-only agents that make architecture and exploration safer — shipped inside Claude Code, no setup required.

Explore Agent

Tools: Read, Grep, Glob (read-only — cannot write or edit)

Use for: Understanding an unfamiliar codebase before proposing changes.

How to invoke: Ask Claude to explore before architecting, or use the Agent tool with subagent_type="Explore".

When: Before writing an architecture document. Before touching legacy code.

Plan Agent

Tools: Read-only — cannot modify files.

Use for: Designing architecture grounded in actual codebase state.

How to invoke: Shift+Tab cycles permission modes → Plan mode. Or type /plan.

When: Before implementing a complex feature. Before a refactor.

Both agents are read-only by definition — they literally cannot write files. Safe to run on unfamiliar codebases before you understand them.

→The best architecture is grounded architecture. Explore before you plan. Plan before you code.

The Build · Phase 1 / 5 · 20 / 81

The Craft of Requirement Engineering

Problem Vague requirements compound into wrong architecture, wrong tasks, wrong code. Ambiguity is the #1 cause of AI project failure.

Concept A requirement is a contract. Precise, verifiable, readable by anyone — without being in the room.

1 · REQ (Requirements) → 2 · ARCH (Architecture) → 3 · TASKS (Task Gen) → 4 · TDD (TDD) → 5 · REVIEW (Review)

Socratic Interview

Claude asks. You answer. Intent → Behaviors → Edge Cases → Acceptance → Decisions → Artifact. Six phases, every time.

Detail Without Technicality

WHAT not HOW. “The wizard auto-saves” is in. “debounced useEffect” is out. File names, schemas, code belong to Phase 2.

Sprint-Sized Scope

One digestible chunk per REQ. If the conversation reveals more, split — never push scope downstream.

Mode A: Raw idea → full Socratic interview, 6 phases.

Mode B: Existing PRD → gap-fill interview, diagnoses missing pieces.

→This is the craft. Next: the specific problems our skill was built to solve.

Phase 1 · Why this skill · 21 / 81

Why We Built plan-requirements

Problem Four recurring failures in requirement gathering that kill projects before the first line of code.

Re-explaining

The Blank Slate

Every session starts from zero. Claude has no memory of your product context — you re-explain everything, or worse, forget to.

Production bugs

Happy Path Bias

Edge cases get skipped — always. Failure modes discovered in production, not requirements. 100× costlier.

Sprint chaos

The PRD Gap

Stakeholder docs aren't sprint-ready. PRDs miss verifiable acceptance criteria, explicit scope, “what a dev gets wrong” nuance.

Deadline miss

Scope Bleed

Large features aren't split at the source. Conversations scope-creep in real time. Bloated requirements break every later phase.

→These are not user failures. They are systemic gaps. That's what a skill fixes.

Phase 1 · How it works · 22 / 81

plan-requirements: How It Works

Solution Six-phase Socratic flow + readiness checklist + Phase 1 Gate. Structured interview, structured output.

Solves: Blank Slate

Structured 6-phase flow (A–F)

Intent · Behaviors · Edge Cases · Acceptance · Decisions · Artifact. No re-explaining.

Solves: Happy Path Bias

Dedicated Edge Case phase (C)

Systematic probing: input edges, concurrency, dependencies, security, scale.

Solves: PRD Gap

Two entry modes + acceptance criteria

Mode B gap-fills existing PRDs. Every requirement gets a verifiable criterion.

Solves: Scope Bleed

Sprint-sizing + 8-check readiness gate

Must be sprint-sized. Must have scope boundaries. Must be cold-readable.

Phase 1 Gate

1. Can I explain every requirement without ambiguity?
2. Can any teammate read this cold and enter sprint planning with full context?

Both must be YES before the artifact is generated. Both must be YES before Phase 2 begins.

→Output: /specs/requirements/REQ-<slug>.md — Traceable IDs · Decisions log · Scope boundaries · Open questions.
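The gate logic reduces to a conjunction: every readiness check passes and both gate questions are YES. A sketch under assumptions: the slide names only three of the eight readiness checks, so the dictionary keys here are illustrative, and `phase1_gate` and `blockers` are hypothetical helpers.

```python
from typing import Dict, List


def phase1_gate(readiness: Dict[str, bool], unambiguous: bool, cold_readable: bool) -> bool:
    """The artifact may be generated only if every check and both gate questions pass."""
    return all(readiness.values()) and unambiguous and cold_readable


def blockers(readiness: Dict[str, bool]) -> List[str]:
    """Names of failed checks, so the interview knows what to revisit."""
    return [name for name, ok in readiness.items() if not ok]


# Illustrative subset of the readiness checks; the real skill has eight.
readiness = {"sprint_sized": True, "scope_boundaries": False, "cold_readable": True}

print(phase1_gate(readiness, unambiguous=True, cold_readable=True))
print(blockers(readiness))
```

Returning the failed check names, not just a boolean, is what lets the interview loop back to the right question instead of starting over.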

Phase 1 · Live Demo · 23 / 81

Demo: plan-requirements

Step 1: Load skill → Step 2: Socratic interview → Step 3: Edge case probing → Step 4: Readiness check → Step 5: Phase 1 Gate → Step 6: REQ artifact

The Interview Style

Claude asks one question at a time. Summarizes before moving on. Offers concrete options when you're unsure.

Edge Case Probing

Watch the “first attempt” probe. “What would a developer get wrong?”

Readiness Check + Phase 1 Gate

Claude won't generate the artifact until all 8 criteria pass — and both gate questions answer YES.

Live Demo

# Command
/plan-requirements
# Starting prompt
"I want to track my AI sessions"

Expected output

/specs/requirements/
  REQ-session-tracking.md
  - Summary + Problem/Motivation
  - Functional Req (R1, R2, R3…)
  - Edge Cases table
  - Decisions Log
  - Scope: In / Out
  - Open Questions

→Watch the skill, not just the output. The process is the product.

The Build · Phase 2 / 5 · 24 / 81

The Craft of System Architecture

Problem Architecture designed without reading existing code produces designs that don't fit. Wrong patterns, wrong assumptions, wrong boundaries.

Concept Ground in reality before designing. Propose, stress-test, then walk the code.

1 · REQ (Requirements) → 2 · ARCH (Architecture) → 3 · TASKS (Task Gen) → 4 · TDD (TDD) → 5 · REVIEW (Review)

Ground in Reality

Read CLAUDE.md. Run file-tree + search-codebase. Understand existing patterns before proposing new ones. Design for the codebase you have.

Design Collaboratively

You propose. Claude stress-tests. Offer options with tradeoffs. Every decision traces back to a requirement ID.

Know Where It Lands

The Change Footprint: created, modified, deleted, touched-but-not-changed. A design without a footprint is a whiteboard exercise.

Greenfield: Design phases dominate; footprint is shallow.

Brownfield: Phase D2 (footprint) is the center of gravity.

Refactor: Change Footprint is the primary deliverable.

→This is the craft. Next: the problems our skill was built to solve.

Phase 2 · Why this skill · 25 / 81

Why We Built plan-architecture-v2

Problem Four recurring failures in AI-assisted architecture that produce ungrounded, untraceable, unimplementable designs.

Token waste

Speculative Reads Burn Tokens

Claude reads 20+ files to “orient itself.” Every session reinvents discovery. Thousands of tokens on files that never inform the design.

Rewrite

Designs Ignore Existing Patterns

Proposes layered where hexagonal exists. Designs new auth where one works. Conflicts with in-flight migrations.

Unimplementable

Architecture Without Footprint

Design docs describe structure but not where it lands. No file paths. No “what changes here.” Another dev cannot implement.

Production incident

No Stress-Test Pass

Designs shipped without validating against failure scenarios. Rollback paths unclear. Regression risk never assessed.

→These are process failures, not people failures. The skill encodes discipline so the tool enforces it.

Phase 2 · How it works · 26 / 81

plan-architecture-v2: How It Works

Solution 3-step context gathering + 7-phase flow + Change Footprint Walk + Phase 2 Gate.

file-tree.sh

Maps the codebase shape — no speculative reads.

search-codebase.sh -m 3

Keyword calibration: >100 matches = too broad. <5 files = too narrow.

Targeted Read

At most 2 search attempts, then fall back to Glob. Read a file only when its 3-line preview signals relevance.
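The keyword calibration rule for search-codebase.sh can be written as a small classifier. The thresholds (more than 100 matches, fewer than 5 files) come from the slide; the `calibrate` function and the choice to test breadth before narrowness are assumptions of this sketch, not the script's actual logic.

```python
def calibrate(match_count: int, file_count: int) -> str:
    """Classify a keyword search as too broad, too narrow, or usable."""
    if match_count > 100:
        return "too broad"     # refine the keyword before reading anything
    if file_count < 5:
        return "too narrow"    # widen the keyword or try a synonym
    return "ok"


for matches, files in [(250, 40), (12, 3), (60, 9)]:
    print(matches, files, calibrate(matches, files))
```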

A · Context
B · Structure
C · Tech
D · Design
D2 · Footprint
E · Cross-Cutting
F · Stress-Test
G · Artifact

Change Footprint Walk (D2)

+ New

What gets created, where, following which pattern.

~ Modified

What changes — one line per file.

− Deleted

What goes away, and why.

! Touched

Silent-regression hotspots.

→Phase 2 Gate: can another senior dev implement this from the doc alone, and point to every place this change lands?

Phase 2 · Live Demo · 27 / 81

Demo: plan-architecture-v2

Step 1: Load skill → Step 2: Context scripts → Step 3: Design (B–D) → Step 4: Footprint walk → Step 5: Stress-test → Step 6: ARCH artifact

Context Gathering

Watch the bash scripts run — file-tree, then search-codebase. No speculative reads. Only files surfaced by keywords get touched.

Change Footprint Walk

Brownfield project — watch Phase D2 dominate. New files, modified files, touched-but-not-changed.

Stress-Test Pass

Watch Claude challenge its own design. Forward: what breaks at runtime. Backward: what regresses silently.

Live Demo

# Command
/plan-architecture from: specs/requirements/REQ-session-tracking.md

Expected output

/specs/architecture/
  ARCH-session-tracking.md
  - Architecture Summary + Tech Choices
  - Data Models + API Contracts
  - Change Footprint
  - Areas of Impact + Risk
  - Stress-Test Scenarios
  - Decisions Log

→Watch the grounding. Context first, design second. No whiteboard architecture.

The Build · Phase 3 / 5 · 28 / 81

The Craft of Task Generation

Problem Vague to-do items produce vague code. Tasks without test plans produce untested code. Tasks detached from architecture produce wrong code.

Concept A task is a TDD-ready specification. Test plan first. Anchored on architecture. Sized for a single TDD cycle.

1 · REQ (Requirements) → 2 · ARCH (Architecture) → 3 · TASKS (Task Gen) → 4 · TDD (TDD) → 5 · REVIEW (Review)

One File, Full Context

Tasks live inside ARCH-*.md. Architecture + tasks in one document. The TDD agent reads one path. No cross-referencing, no stale links.

Test Plan Before Code

Behavior tests from REQ acceptance. Edge cases from failure modes. Resilience from ARCH stress-tests. Regression guards from touched files.

Footprint-Anchored Scope

Every task maps to a slice of the Change Footprint. Every entry claimed. No drifting off-plan.

2–4 prod files per task (excluding tests)

3–8 scenarios: behavior + edge + stress + regression

Never xl: split by endpoint, layer, concern, entity
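These sizing limits are mechanical enough to check automatically. A minimal sketch: the numeric bounds come from the slide, while the `Task` dataclass, its field names, and the effort labels other than "xl" are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    title: str
    prod_files: List[str]      # production files only; test files are excluded
    scenarios: List[str]
    effort: str                # labels other than "xl" are assumed, e.g. "s", "m", "l"


def sizing_problems(task: Task) -> List[str]:
    """Return every sizing rule the task violates; an empty list means TDD-ready."""
    problems = []
    if not 2 <= len(task.prod_files) <= 4:
        problems.append("prod files outside 2-4")
    if not 3 <= len(task.scenarios) <= 8:
        problems.append("scenarios outside 3-8")
    if task.effort.lower() == "xl":
        problems.append("effort is xl: split the task")
    return problems


# Hypothetical tasks for illustration only.
good = Task("Persist sessions", ["store.py", "models.py"], ["save", "load", "missing id"], "m")
vague = Task("Build the backend", ["a.py"] * 9, ["works"], "xl")

print(sizing_problems(good))
print(sizing_problems(vague))
```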

→This is the craft. Next: the problems our skill was built to solve.

Phase 3 · Why this skill · 29 / 81

Why We Built generate-tasks

Problem Four recurring failures in task breakdown that turn architecture into chaos at implementation time.

Interpretation drift

Tasks as Vague To-Dos

“Implement auth” — no tests, no files, no boundaries. Every developer interprets differently.

Untested code

No Test-First Discipline

Tests added after to “cover” what was built. Acceptance criteria not translated. Regression risks never tested.

Divergence

Tasks Detached from Architecture

Tasks in Jira/Linear, no link to ARCH files. Footprint entries orphaned. Architecture and tasks diverge.

Silent breakage

No Regression Guard

Touched-but-not-changed files never tested. “Should not affect them” — until it does.

→The gap between architecture and code is where projects die. The skill bridges it with structure, not willpower.

Phase 3 · How it works · 30 / 81

generate-tasks: How It Works

Solution 5-step flow: Understand → Anchor → Draft Tests → Build Spec → Write to ARCH. Test plan before code.

1. Understand
2. Anchor
3. Draft Tests
4. Build Spec
5. Write to ARCH

Footprint → Task Files Expected

+ New	→ New	carry pattern forward
~ Modified	→ Modified	carry “what changes” note
− Deleted	→ Modified	diff shows deletion
! Touched	→ Must NOT modify	add regression-guard tests

Four Test Scenario Sources

REQ acceptance criteria	→ Behavior tests
REQ edge cases / failures	→ Error / edge tests
ARCH forward stress-test	→ Resilience tests
ARCH touched-but-not-changed	→ Regression-guard tests

Task shape Status · Effort (never xl) · Priority · Dependencies · REQ-IDs · Footprint slice · High-risk callouts

→Output: tasks embedded in ARCH-*.md #Tasks. One file = architecture + decisions + contracts + tasks.
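The footprint-to-task mapping above can be sketched as a lookup plus a grouping step. The bucket names, helper functions, and file paths are hypothetical; the sketch also uses an ASCII "-" for the deleted marker where the slide uses a typographic minus.

```python
# Footprint marker -> "Files Expected" bucket, transcribed from the mapping above.
FOOTPRINT_TO_BUCKET = {
    "+": "new",
    "~": "modified",
    "-": "modified",          # a deletion shows up as a diff on the file
    "!": "must_not_modify",   # touched-but-not-changed: needs regression-guard tests
}


def files_expected(footprint):
    """Group (marker, path) footprint entries into task buckets."""
    buckets = {"new": [], "modified": [], "must_not_modify": []}
    for marker, path in footprint:
        buckets[FOOTPRINT_TO_BUCKET[marker]].append(path)
    return buckets


def regression_guards(buckets):
    """Every must-not-modify file gets a regression-guard test entry."""
    return [f"regression guard: {path}" for path in buckets["must_not_modify"]]


# Hypothetical footprint for illustration; these paths are not from the course project.
footprint = [
    ("+", "src/sessions/store.py"),
    ("~", "src/api/routes.py"),
    ("-", "src/legacy/tracker.py"),
    ("!", "src/auth/login.py"),
]
buckets = files_expected(footprint)
print(buckets["modified"])
print(regression_guards(buckets))
```

An unknown marker raises `KeyError`, which is deliberate here: a footprint entry that no task bucket can claim should stop task generation rather than be silently dropped.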

Phase 3 · Live Demo · 31 / 81

Demo: generate-tasks

Step 1: Load skill → Step 2: Read ARCH + REQ → Step 3: Draft test plan → Step 4: Anchor on footprint → Step 5: Write tasks

Test Plan Drafting

Watch Claude pull tests from four sources — REQ acceptance, edge cases, ARCH stress-tests, regression risks. Every test traces upstream.

Footprint Anchoring

Watch each task claim a slice of the Change Footprint. New, modified, must-not-modify. Every entry accounted for.

Regression-Guard Tests

Touched-but-not-changed files get explicit regression tests. “Verify existing login flow still works.” Nothing assumed.

Live Demo

# Command
/generate-tasks from: specs/architecture/ARCH-session-tracking.md

Expected output

ARCH-session-tracking.md #Tasks:
- Task T1: [Clear Title]
  - Status · Effort · Priority · REQ-IDs
  - Test Plan: behavior + edge + stress + regression
  - Files Expected: new / modified / must-not-modify
  - Implementation Notes + Scope

→Watch the translation. Architecture → tasks → tests. Structure, not willpower.

The Build · Phase 4 / 5 · 32 / 81

The Craft of Test-Driven Development

Problem Code written before tests produces untestable code. Batched tests hide which change broke what. Skipped refactoring accumulates debt.

Concept RED → GREEN → REFACTOR. One test at a time. Minimum code to pass.

1 · REQ (Requirements) → 2 · ARCH (Architecture) → 3 · TASKS (Task Gen) → 4 · TDD (TDD) → 5 · REVIEW (Review)

RED

Write one failing test. Run it. Confirm it fails for the right reason — missing function, not syntax error. Collaborative: show the developer and pause. Autonomous: verify and proceed.

GREEN

Write the minimum production code to make the test pass. No more, no less. Run the suite — new test passes, no existing tests broken.

REFACTOR

Assess: duplication, naming, structure. If warranted, propose the refactor. Run the suite again. Only when all tests pass, pick up the next test.

Collaborative (default): Pause at every red/green. Developer confirms.

/tdd auto: Runs without pausing. Stops on unexpected failures.

→Both modes: one test at a time, minimum code, verify failure reason, respect Must NOT Modify, stop on ambiguity.
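One RED → GREEN cycle, compressed into a single runnable file so the failure reason can be inspected. This is an illustration only: `slugify` is a hypothetical function, and in a real project the test and production code would live in separate files, with the test importing from the production path.

```python
import unittest


class TestSlugify(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        # One behavior per test.
        self.assertEqual(slugify("Session Tracking"), "session-tracking")


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSlugify)
    result = unittest.TestResult()
    suite.run(result)
    return result


# RED: slugify does not exist yet, so the test errors with a NameError.
# That is the "right reason": a missing function, not a syntax error.
red = run_suite()
red_reason = red.errors[0][1]          # traceback text of the first error

# GREEN: the minimum production code that makes the test pass.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")


green = run_suite()
print(red.wasSuccessful(), "NameError" in red_reason, green.wasSuccessful())
```

Checking the traceback text is the "verify failure reason" step: a RED test that fails because of a typo in the test itself proves nothing about the missing behavior.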

Phase 4 · Why this skill · 33 / 81

Why We Built the TDD Skill

Problem Four discipline failures that turn good task specs into bad code.

Verification theater

Jumping Ahead

Production code first, test retrofitted. Test verifies code exists, not that it meets the requirement. TDD in name only.

Archaeology

Batching Tests

Multiple tests written before any pass. When one breaks, you cannot tell which change caused it.

Fragile tests

Mocking Internals

Mocking the thing being tested. Refactoring breaks tests that should still pass. Change-detectors, not behavior-verifiers.

Scope creep

Scope Drift

Silently touching files outside the task. “I noticed this was broken so I fixed it.” No test plan, no review.

→TDD is not a technique — it is a discipline. Discipline requires enforcement, not intention.

Phase 4 · How it works · 34 / 81

The TDD Skill: How It Works

Solution Two modes. RED→GREEN→REFACTOR cycle. Before-you-start checklist. Footprint-respecting. Phase 4 Gate.

Test Discipline

Arrange-Act-Assert	One behavior per test
Import from prod path	Even if module doesn't exist yet
Mock boundaries	Not internals — external deps only
Verify failure reason	Missing function, not syntax error

Production Code Discipline

Minimum to pass	No more, no less
Follow existing patterns	From Implementation Notes
Respect Files Expected	Only create/modify listed files
Must NOT Modify = sacred	Regression-guard tests verify

Phase 4 Gate

Do all tests pass and does the code match the architecture decisions from Phase 2?
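Two of the test-discipline rules, Arrange-Act-Assert and "mock boundaries, not internals", shown in one small test. The `session_summary` function and the `store` dependency are hypothetical; the point is that only the external dependency is replaced, while the code under test runs for real.

```python
import unittest
from unittest.mock import Mock


def session_summary(store) -> str:
    """Production code under test; `store` is the external boundary."""
    sessions = store.list_sessions()
    total = sum(s["tokens"] for s in sessions)
    return f"{len(sessions)} sessions, {total} tokens"


class TestSessionSummary(unittest.TestCase):
    def test_summarizes_token_totals(self):
        # Arrange: mock only the external dependency, never session_summary itself.
        store = Mock()
        store.list_sessions.return_value = [{"tokens": 120}, {"tokens": 80}]
        # Act
        summary = session_summary(store)
        # Assert: exactly one behavior.
        self.assertEqual(summary, "2 sessions, 200 tokens")


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestSessionSummary).run(result)
print(result.wasSuccessful())
```

Because the test asserts on the returned summary rather than on how the total is computed, the summing logic can be refactored without breaking it.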

→That's the skill on paper. Now let's see it in action.

Phase 4 · Live Demo · 35 / 81

Demo: Test-Driven Development

Step 1: Load skill → Step 2: Before-start check → Step 3: RED test → Step 4: GREEN code → Step 5: REFACTOR? → Step 6: Full suite

Watch the RED Step

Claude writes one test. Runs it. Verifies it fails for the right reason — missing function, not syntax error. Collaborative: pauses to show you.

Watch the GREEN Step

Claude writes minimum code — just enough to pass. Run the suite. New test green, existing tests unbroken.

Watch Scope Discipline

Footprint-respecting implementation. Files Expected honored. Must NOT Modify untouched. If drift detected, skill pushes back.

Live Demo

# Command
/tdd from: specs/architecture/ARCH-session-tracking.md

Expected output

- Tests pass: task test plan complete
- Full suite: no regressions
- Task status: done
- Files created/modified per spec
- Must NOT Modify files: untouched
- Next: /review task

→Watch the cycle. RED → GREEN → REFACTOR. One test at a time. Discipline, not speed.

The Build · Phase 5 / 5 · 36 / 81

The Craft of Code Review

Problem Rubber-stamp approvals. Review everything or nothing. No severity triage. Reviewers who fix instead of flag. Review theater.

Concept Triage first, review second. Human confirms scope. Sub-skills review in parallel. Read-only — flag, don't fix.

1 · REQ (Requirements) → 2 · ARCH (Architecture) → 3 · TASKS (Task Gen) → 4 · TDD (TDD) → 5 · REVIEW (Review)

Triage, Don't Drown

Review everything = review nothing well. Propose checks; developer confirms. Task completion always in pipeline mode. Security always when user-facing.

Orchestrator Pattern

One skill dispatches many. Parallel agents, each reading its own SKILL.md. Filtered diffs. Bounded scope. Parallel execution.

Read-Only Discipline

Flag findings. Do not fix. Do not write code. The developer decides. Separation of detection and remediation.

Critical	Blocks merge
High	Strongly blocks
Medium	Should fix
Low	Suggestion
Manual	Dev checks

→This is the craft. Next: the problems the skill was built to solve.

Phase 5 · Why this skill · 37 / 81

Why We Built the Review Skill

Problem Four failures that turn code review into approval theater — or avoidance.

Merge lottery

Rubber-Stamp Approvals

“LGTM” on 500 lines without reading. No architecture verification. No security scan.

Inconsistent depth

Review Everything or Nothing

A typo fix gets the same depth as a new auth system. Large PRs get superficial review because nobody has time.

Signal lost

No Severity Discipline

A missing semicolon gets the same attention as a SQL injection. Style preferences drown out security findings.

Two jobs

Reviewer as Janitor

Reviewer renames a variable, pushes a commit. Developer never learns why it was wrong. Two jobs badly instead of one job well.

→Review is the last line of defense. When it fails, all prior phases' value is lost.

Phase 5 · How it works38 / 81

review: How It Works

Solution Triage → Dispatch parallel agents → Collect → Deduplicate → Compile report. Human confirms scope at every step.

Pipeline Mode

Verify implementation against ARCH + REQ. Task completion always included. Source chain: REQ → ARCH → task → code.

General Mode

PR / branch / staged / diff file. Gather diff, detect stack, propose checks, dispatch. No spec verification.

1. Read changeset
2. Propose checks
3. Developer confirms
4. 16 parallel agents
5. Collect + dedupe
6. Verdict

Parallel Dispatch

Filtered diff per agent. React → .tsx only. Each reads its own SKILL.md.
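Filtering the changeset per agent could look like the sketch below. The agent names and glob patterns are assumptions made for illustration; the real sub-skills define their own filters.

```python
from fnmatch import fnmatchcase

# Hypothetical agent-to-pattern mapping.
AGENT_PATTERNS = {
    "react-patterns": ["*.tsx"],
    "db-patterns": ["*migrations/*"],
}


def filter_changeset(changed_files, agent):
    """Return only the changed files that the given agent should review."""
    patterns = AGENT_PATTERNS[agent]
    return [f for f in changed_files if any(fnmatchcase(f, p) for p in patterns)]
```

Each agent then receives a diff restricted to its own list, which keeps its context small and its scope bounded.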

Collect + Compile

Same line → highest severity wins. Merge comments. Insights combined.
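The "same line, highest severity wins" rule can be sketched as a merge keyed on file and line. The finding shape (`file`, `line`, `severity`, `comment`) is an assumption, and the "Manual" level is left out of the ranking here.

```python
SEVERITY_RANK = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1}


def deduplicate(findings):
    """Merge findings that point at the same file and line."""
    merged = {}
    for f in findings:
        key = (f["file"], f["line"])
        entry = merged.setdefault(
            key,
            {"file": f["file"], "line": f["line"],
             "severity": f["severity"], "comments": []},
        )
        # Highest severity wins for the merged finding.
        if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[entry["severity"]]:
            entry["severity"] = f["severity"]
        # Keep every distinct comment so no reviewer's insight is lost.
        if f["comment"] not in entry["comments"]:
            entry["comments"].append(f["comment"])
    return list(merged.values())
```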

Verdict

PASS / PASS-FINDINGS / FAIL
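One plausible mapping from severities to a verdict is sketched below. The slides state that Critical and High block the merge; treating exactly those two as FAIL and everything else as PASS-FINDINGS is an assumption, not the skill's documented rule.

```python
BLOCKING = {"Critical", "High"}  # assumed blocking levels


def verdict(severities):
    """Reduce the severities of all findings to a single review verdict."""
    if any(s in BLOCKING for s in severities):
        return "FAIL"
    return "PASS-FINDINGS" if severities else "PASS"
```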

Each of the 16 review agents is defined this way — .claude/agents/code-reviewer.md

---
name: code-reviewer
description: Reviews code for quality, security, and correctness. Use when the user asks for a code review.
tools: Read, Grep, Glob
model: sonnet
isolation: worktree
---
You are a code reviewer. Flag findings, do not fix.

→Phase 5 Gate: would I mass-merge this without reading it? If yes — I haven't reviewed properly.

Phase 5 · Live Demo39 / 81

Demo: Code Review

step 1 · Load skill → step 2 · Triage → step 3 · Dispatch agents → step 4 · Collect findings → step 5 · Deduplicate → step 6 · Report + verdict

Watch the Triage

Claude proposes which checks to run. Task completion always in pipeline. Security when user-facing. You confirm or adjust.

Watch Parallel Dispatch

Multiple agents spawn simultaneously. React patterns gets .tsx. DB patterns gets migrations. Each reads its own SKILL.md.

Watch the Verdict

PASS / PASS-FINDINGS / FAIL. Every finding: severity + source chain + concrete next steps.

Live Demo

# Command
/review

Expected output

- Executive Summary + Verdict
- Per-file findings (severity + line)
- Source chain context (REQ/ARCH)
- Concrete next steps per finding
- Checklist summary: OK / WARN / BLOCK
- Re-review protocol for fixes

→One skill, many reviewers. Structure, not personality.

Guardrails · Problem 142 / 81

The Credential Near-Miss

⚠ NEAR-MISS REPORT sess_9a8b7c6d5e · 14:32 UTC

The Story

During TDD, Claude created a test fixture with a dummy API key. The value isn't real — but it matches the regex pattern for an OpenAI key (sk-...).

Risk

If committed, the key would live in git history forever.

What saved us: manual review. But manual review is unreliable.

The Insight

We need an automatic guard that blocks the commit before it happens.

// The dummy that almost shipped:
const apiKey = "sk-aBcDeFgHiJkLmNoPqRsTuVwXyZ";
// Looks fake. Matches the regex.
// Once in git history → forever.

This is where Hooks come in. →

→Manual review is the last line of defense. Hooks are the first.

Guardrails43 / 81

Hooks: Automatic Guardrails

Problem Claude can make destructive changes — commit credentials, delete files, force-push. These happen even when Claude is “trying” to help.

Solution Hooks = shell commands that fire at lifecycle events. Claude doesn't control them. They run automatically.

The blocking mechanism

Exit code 0 = allowed. Exit code 2 = BLOCKED.
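The exit-code contract can be sketched as a function from the hook event to an exit code. This assumes the event arrives as JSON on stdin with `tool_name` and `tool_input.command` fields; the blocked command is just an example.

```python
import json

ALLOW, BLOCK = 0, 2  # exit codes from the slide: 0 = allowed, 2 = blocked


def run_hook(stdin_text: str) -> int:
    """Decide a PreToolUse exit code from the JSON event text."""
    event = json.loads(stdin_text)
    command = event.get("tool_input", {}).get("command", "")
    if event.get("tool_name") == "Bash" and "git push --force" in command:
        return BLOCK
    return ALLOW

# In a real hook script the last line would be something like:
#   sys.exit(run_hook(sys.stdin.read()))
```

Keeping the decision in a pure function makes the guard testable without launching an agent session.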

Lifecycle Events

Event	Timing	Can Block?	Typical Use
PreToolUse	BEFORE tool runs	YES	Credential guard, destructive command check
PostToolUse	AFTER tool runs	No	Auto-format, logging
Notification	Claude needs input	No	Desktop alert — makes Auto mode practical
UserPromptSubmit	Before prompt processed	YES	Inject git branch, project state automatically; exit code 2 rejects the prompt
PreCompact	BEFORE compaction runs	No	Save critical context before summary
Stop / SessionStart	Finish / Startup	Stop only	Stop can send Claude back to work; SessionStart is informational

→PreToolUse = ultimate safety layer. Notification = what makes Auto mode practical without watching the terminal.

Guardrails44 / 81

PreToolUse in Action: Credential Guard

.claude/hooks/PreToolUse-credential-guard.sh

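#!/bin/bash
# PreToolUse hook: blocks commits with credentials.
# grep -q keeps the matched secret out of the output; the reason goes to
# stderr, and exit code 2 is what blocks the tool call.
while IFS= read -r file; do
  # OpenAI keys
  if grep -qE 'sk-[a-zA-Z0-9]{20,}' "$file"; then
    echo "BLOCKED: OpenAI key in $file" >&2; exit 2
  fi
  # AWS keys
  if grep -qE 'AKIA[0-9A-Z]{16}' "$file"; then
    echo "BLOCKED: AWS key in $file" >&2; exit 2
  fi
  # Private keys
  if grep -qE 'BEGIN.*PRIVATE KEY' "$file"; then
    echo "BLOCKED: Private key in $file" >&2; exit 2
  fi
# Only added, copied, or modified files; deleted files have nothing to scan.
done < <(git diff --cached --name-only --diff-filter=ACM)
exit 0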

What This Blocks

OpenAI API keys (sk- pattern)
AWS access keys (AKIA pattern)
Private keys (RSA / EC / OPENSSH)
Database URLs with passwords (needs a fourth pattern; the script above covers only the first three)

🛡 Key Truth

Claude can bypass permissions — CANNOT bypass PreToolUse exit code 2. The exit code is the ultimate safety layer.

→Three regexes, three lines of defence. API keys never reach git.
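The three patterns from the credential guard can be exercised outside the hook to see what they catch. A minimal sketch using the same regexes; the `find_credentials` helper is hypothetical.

```python
import re

# The same three patterns the credential guard script greps for.
PATTERNS = {
    "OpenAI key": re.compile(r"sk-[a-zA-Z0-9]{20,}"),
    "AWS key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key": re.compile(r"BEGIN.*PRIVATE KEY"),
}


def find_credentials(text):
    """Return the names of all credential patterns found in the text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]
```

The near-miss fixture value matches the OpenAI pattern, while a value such as `sk-FAKE-TEST-KEY` does not, because the hyphens interrupt the 20-character alphanumeric run.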

Guardrails · Problem 245 / 81

More Hooks: Auto-Format & Lint

Problem Code passed tests but didn't match project formatting standards. Manual formatting is tedious and inconsistent.

.claude/hooks/PostToolUse-auto-format.sh

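#!/bin/bash
# PostToolUse: auto-format after Claude edits.
# Assumes the hook event arrives as JSON on stdin (parsed here with jq),
# and formats only the file that was just written.
EVENT=$(cat)
TOOL_NAME=$(jq -r '.tool_name' <<< "$EVENT")
FILE=$(jq -r '.tool_input.file_path // empty' <<< "$EVENT")
if [[ "$TOOL_NAME" == "Write" || "$TOOL_NAME" == "Edit" ]] \
   && [[ "$FILE" == *.ts || "$FILE" == *.tsx ]]; then
  npx prettier --write "$FILE"
  npx eslint --fix "$FILE"
  # Stage only the formatted file; "git add -A" would stage unrelated changes too.
  git add "$FILE"
fi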

The Effect

Every file Claude touches is auto-formatted
Lint errors are fixed automatically
Zero manual work — happens after every edit

Hook Types Summary

Type	Speed	Use Case
command	Fast	Formatting, linting
prompt	Slow	Smart validation
agent	Variable	Complex workflows
http	Network	External integrations

Prompt Hook — LLM-evaluated (Haiku by default), no script file needed

// .claude/settings.json: prompt hook
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "prompt",
        "prompt": "Check if this bash command is safe. Block (exit code 2) if it contains rm -rf, DROP TABLE, or force push to main."
      }]
    }]
  }
}

Claude (Haiku) evaluates the action semantically: two lines of config catch what regex can't.

→PostToolUse can't block — but it can clean up after Claude every single time.

Guardrails · Live Demo46 / 81

Demo: Hooks Blocking a Commit

Step-by-Step

Claude creates a test fixture with a dummy API key (sk-dummy12345…)
Claude stages the file with git add
Claude attempts to commit
PreToolUse hook fires — credential guard detects sk- pattern → BLOCKED
Error message tells Claude exactly what to fix: “Replace with clearly fake value like sk-FAKE-TEST-KEY”
Claude fixes the key → commit succeeds ✓

Before Hooks

API key committed to git
Permanent history damage
Manual cleanup required
No automated protection

After Hooks

Commit blocked automatically
Clear error message guides fix
Zero damage to git history
Claude learns from feedback

→Hooks don't just catch mistakes — they teach. The error message tells Claude exactly what to fix.

Guardrails · Reflection47 / 81

Guardrails Reflection

What Did We Learn?

PreToolUse is the one hook you cannot skip. Exit code 2 = blocked. Works even when Claude bypasses permissions.
Credential guards are non-negotiable. API keys in git = permanent damage. Block before the commit happens.
PostToolUse automates the boring stuff. Formatting, linting — every file Claude touches gets cleaned automatically.
Hooks teach, not just block. Good error messages tell Claude exactly what to fix and how.
Skills = on-demand. Hooks = always active. They complement, don't compete.

→Claude can read every line of code. But it has never seen the app. Next: giving Claude eyes.

Guardrails · Security47B / 81

Sandboxing: OS-Level Isolation

Hooks block what you anticipate. Sandboxing blocks everything else.

What It Does

macOS: seatbelt (sandbox-exec) restricts filesystem and network access
Linux: bubblewrap (bwrap) for the same guarantee
Bash commands run in an isolated process — can't touch the broader filesystem or network without explicit permission

How to Enable

{ "sandbox": { "enabled": true } }

One line in settings.json. No script file needed.

The Payoff

84% fewer permission prompts. Not because you allowed more, but because the sandbox eliminated the need to ask about low-risk, isolated commands.

Relationship to Hooks

PreToolUse hooks → block specific known-bad actions
Sandboxing → OS-level isolation for everything else

Two layers. Hooks are the allow/deny list. Sandboxing is the perimeter.

→Skills = on-demand. Hooks = always active. Sandboxing = always isolated. All three together = Auto mode you can actually trust.

UI Automation · The Problem49 / 81

Manual UI Verification Is Tedious

The Manual Checklist

1. Start dev server
2. Open browser
3. Check dashboard loads
4. Verify charts render
5. Test date filter works
6. Check console for errors
7. Test responsive on mobile
8. Verify dark mode works
9. Take screenshot
10. Close browser

Why This Is a Problem

Time

5–10 minutes per change.

Blindness

Claude reads code, not pixels.

Leaks

Visual bugs slip through.

Repeat

No reproducibility.

“We need Claude to SEE the app, not just read the code.”

→Manual UI checks are velocity killers. Time for a new tool.

UI Automation · Concept50 / 81

MCP: Giving Claude Eyes and Arms

Concept Model Context Protocol (MCP) connects Claude to external tools. Bash is powerful but limited — Claude can't browse or take screenshots. MCP changes that.

Capability	Without MCP	With Playwright MCP
Read code	Reads files, “looks correct”	Navigates URL, sees actual UI
Screenshots	Not possible	Captures viewport automatically
Console errors	Manual check only	Reads errors programmatically
DOM data	Cannot extract	Extracts accessibility tree
Verification	Manual → slow	Automated → fast, reproducible

How It Works

Claude decides → Calls MCP tool → MCP opens browser → Returns result → Claude sees!

→MCP extends Claude beyond the terminal. Playwright MCP = Claude can see the app.

UI Automation · Setup51 / 81

Setting Up Playwright MCP

3-Step Setup

Install the package

npm install -D @playwright/mcp

Configure .mcp.json

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp"]
    }
  }
}

Verify with /context

Type /context in Claude Code to verify the Playwright MCP server is connected.

Key Playwright MCP Tools

navigate

Load a URL in the browser

screenshot

Capture the current viewport

getConsoleMessages

Read browser console errors

snapshot

Get accessibility tree of page

evaluate

Run JavaScript in the browser

click / fill / select

Interact with page elements

Where does .mcp.json live? — Three Scopes

.mcp.json in repo root

Committed to git. Whole team gets the same MCP servers.

.claude/settings.local.json

Gitignored. Personal overrides only.

~/.claude.json

Applies to all projects on this machine.

Use Project scope for Playwright — your team should verify UI the same way you do.

→One config file, one npm install. Claude now has a browser at its command.

UI Automation · Live Demo52 / 81

Demo: Playwright Takes Over

Automated UI Verification

1. Start dev server (localhost:5173)
2. Tell Claude: "Verify the dashboard looks correct"
3. Watch Claude use Playwright MCP:
   ✓ navigate → page loaded 200
   ✓ screenshot → saved to file
   ✓ getConsoleMessages → 0 errors
4. More verifications:
   ✓ evaluate → 3 charts rendered
   ✓ evaluate → "129 sessions" found
5. Result: Dashboard verified in 30 seconds.

Manual Check

10 minutes per change
Human interaction required
Inconsistent coverage
Easy to skip steps

Playwright MCP

30 seconds, fully automated
Zero human interaction
Same checks every time
Fully reproducible

→Without Playwright, Claude reads code. With Playwright, Claude sees the app.

UI Automation53 / 81

More Essential MCP Servers

Server	Purpose	When to Use
Qdrant	Semantic search over docs	Search across project docs, ADRs, past decisions
Context7	Up-to-date library docs	Latest API signatures when training data is stale
PostgreSQL	Database queries	Verify schema, run queries, check migrations
Figma	Design-to-code	Access design specs programmatically from mockups
Registry	modelcontextprotocol.io	Hundreds more servers. Add as workflow evolves.

Tool Search: Save Tokens

Tool Search defers MCP tool definitions — they load on demand rather than at startup. Reduces context from ~72K to ~8.7K tokens. Controlled via the ENABLE_TOOL_SEARCH environment variable. Set to 0 to disable if you need all tools loaded upfront.
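The saving is easy to quantify from the two approximate figures on the slide:

```python
before_tokens = 72_000  # approximate startup cost with all tool definitions loaded
after_tokens = 8_700    # approximate cost with Tool Search deferring them

saved = before_tokens - after_tokens
reduction = saved / before_tokens
print(f"{saved} tokens saved ({reduction:.1%})")  # 63300 tokens saved (87.9%)
```

Roughly 88% of that startup context is freed for code and conversation.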

We'll talk more about managing context in the Session Hygiene section.

→MCP composes: start with Playwright + Qdrant + Context7. Add servers as workflow evolves.

UI Automation · Reflection54 / 81

UI Automation Reflection

What Did We Learn?

MCP = Claude gets superpowers. Browser, database, search, design — MCP servers extend Claude far beyond the terminal.
Playwright MCP eliminates manual UI checks. 30 seconds for full verification, not 10 minutes. Fully automated, fully reproducible.
Tool Search saves tokens. On-demand loading instead of startup loading. Reduces context from ~72K to ~8.7K tokens.
MCP servers compose. Start with Playwright + Qdrant + Context7. Add as workflow evolves.
Introduced when pain was felt. Not “just in case” — because manual UI verification was killing velocity.

→MCP gives Claude eyes. But every session, Claude still forgets your conventions. Next: persistent memory.

UI Automation · Development Loop54A / 81

Frontend Workflow: Code, Screenshot, Iterate

Playwright isn't just for verification — it's for visual iteration. Claude sees what you see.

The Pattern

Write code → Playwright screenshot → Claude sees output → Claude adjusts → repeat

Contrast

Verification: "Does it work?" → assertions, pass/fail
Iteration: "Does it look right?" → screenshot, visual judgment, adjust

Example Prompt

"Build the dashboard. After each change, take a screenshot and tell me if the layout matches this design spec."

Key Tools for Iteration

screenshot → capture visual state
evaluate → check computed styles & positions
fill_form → interact with forms to test states
browser_navigate → move between pages mid-iteration

→Claude sees the screenshot. Use that. Visual iteration closes the loop between "code runs" and "design matches."

Persistent Context56 / 81

The Goldfish Problem

You spend forty minutes teaching Claude your authentication system. You close the terminal. Open a new session. Claude has no idea. It is a stranger again.

Story 1 — Logging

During the build, Claude used console.log for debugging. Our team convention: structured logger with correlation IDs. We corrected it.

Next session — console.log again. Three sessions. Three corrections.

Story 2 — Naming

Claude generated database columns in camelCase. Our convention: snake_case. Corrected. Forgotten.

Corrected. Forgotten. The agent has no way to remember "how we do things here."

"LLMs don't have memories. They have context windows. Those windows are finite. And every session, the window starts empty."

Persistent Context57 / 81

Why Agents Forget

Five technical reasons every session starts from zero. Understanding the mechanism is the first step to fixing it.

Stateless Inference

Model weights are frozen. No real-time learning happens during conversation. Every response is computed from scratch.

Context Rebuilt Per Call

Full history is re-sent every API call. There is no persistent memory store — only the messages you see.

Context Window Overflow

Oldest messages are silently dropped when the window fills. Your early instructions are the first to disappear.

No Persistent Storage

Session ends, context evaporates. Nothing is saved between conversations unless you explicitly write it to disk.

Indexing Without Understanding

Search finds text, not meaning. Retrieval gives you chunks, not comprehension. The agent still needs to reason.
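The overflow behaviour can be pictured with a toy model that rebuilds the context on every call and keeps only the newest messages that fit. This is a simplification for illustration: character counts stand in for tokens, and real clients may summarize instead of dropping.

```python
def build_context(history, window_tokens, count_tokens=len):
    """Keep the newest messages that fit; the oldest fall out first."""
    kept, used = [], 0
    for message in reversed(history):      # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > window_tokens:
            break                          # everything older is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))            # restore chronological order
```

With a history of `["use snake_case", "fix login bug", "add chart"]` and a window of 25, the earliest message (the convention) is the one that disappears.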

The Insight

Skills solve the process problem — Claude follows your phases. But skills don't solve the conventions problem. We need a way to tell Claude: "These are the rules. Every session. Permanently."

Persistent Context58 / 81

Claude Code's Memory System

Claude Code doesn't have one memory system. It has four — each solving a different part of the problem.

📝

CLAUDE.md

Written by you. Instructions, rules, conventions. Loaded fully into context every session. Version-controllable. The foundation of persistent memory.

🤖

MEMORY.md

Written by Claude. Learnings and observations from working with you. First 200 lines only. Machine-local, not shared across devices.

📁

Rules

Conditional instructions scoped to file paths. API rules for API files. Frontend rules for frontend. The #1 mechanism for reducing always-loaded context.

⚡

Commands

Runtime tools — /init, /memory, /compact, /context, /cost. Manage memory actively. Compact before degradation.

→Four layers, one principle: critical rules in files, not chat history.

Persistent Context59 / 81

CLAUDE.md: Your Most Powerful Lever

Hierarchy — All Files Concatenate

Level	Location	Who Writes	Scope
Project	./CLAUDE.md	You	This repository
User	~/.claude/CLAUDE.md	You	All projects — global conventions
Rules	./.claude/rules/*.md	You	Per file-type, conditional
Local	./CLAUDE.local.md	You	Gitignored — personal only
Auto	~/.claude/projects/.../MEMORY.md	Claude	Machine-local only
Subdirectory	./src/api/CLAUDE.md	You	Lazy-loaded when Claude operates in that directory

@path import syntax — reference files without bloating CLAUDE.md

Use the auth patterns from @docs/auth-standards.md
See database conventions at @docs/db-standards.md

Subdirectory CLAUDE.md files load lazily — only when Claude reads files in that directory, keeping startup context lean.

✏️

The 80-Line Rule

Over 200 lines = reduced adherence. Anthropic targets ~60 lines. Models follow ~150–200 instructions; Claude's system prompt takes ~50. Every line beyond 120 competes with code context for attention.

Lost-in-the-middle effect: models over-attend to start and end. Put critical instructions at the TOP. Use @path imports to reference files without bloating CLAUDE.md.

→Loaded every session, automatically. No slash command. The single highest-leverage file in your repo.

Persistent Context67 / 81

Auto Memory: Claude Writes Its Own Notes

When you correct Claude, it detects patterns, checks if they're already known, and writes new entries to MEMORY.md.

How It Works

You correct Claude on a convention or decision.

Claude detects the pattern in your correction.

Checks if the pattern is already in MEMORY.md.

If new, writes a concise entry for future sessions.

Example MEMORY.md Entries

# Auto-generated by Claude
- "The project uses bun, not npm"
- "Always run typecheck before committing"
- "User prefers explicit return types"
- "Never use console.log — use the logger utility"
- "Use snake_case for database columns"

Key Differences from CLAUDE.md

CLAUDE.md — written by you, full file loaded, version-controllable. MEMORY.md — written by Claude, first 200 lines only, machine-local, not shared. Both concatenate — they don't override.
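The check-then-write step and the 200-line limit can be sketched as two small functions. This only illustrates the behaviour described on the slide; the function names are hypothetical and say nothing about how Claude Code implements auto memory internally.

```python
MAX_LOADED_LINES = 200  # per the slide: only the first 200 lines are loaded


def remember(memory_lines, entry):
    """Return the memory with the entry appended, unless it is already known."""
    normalized = entry.strip().lower()
    if any(line.strip().lower() == normalized for line in memory_lines):
        return memory_lines
    return memory_lines + [entry.strip()]


def loaded_into_context(memory_lines):
    """Only the head of the file reaches the next session."""
    return memory_lines[:MAX_LOADED_LINES]
```

The second function is why an unpruned MEMORY.md eventually stops helping: entries past line 200 are written but never read.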

Persistent Context68 / 81

Rules: Conditional Memory

Problem API code and frontend code need different standards. Loading all rules wastes context.

Two Types

WITH PATHS Only when editing matching files

---
name: api-standards
description: Standards for API and route files
paths: ["src/api/**/*.ts", "src/routes/**"]
---
# API Standards
- Always use async/await, never callbacks
- Validate all inputs at route boundaries

Validate with zod
Return 400/500 status codes
Include rate limit headers
Use TypeScript strict mode

The #1 mechanism for reducing always-loaded context.

WITHOUT PATHS Every session, universal

Use TypeScript strict mode
No console.log in production code
Write tests for all new features
Prefer explicit return types

Load only the rules relevant to the files being edited.

Rules without paths: load every session unconditionally — use sparingly as they consume context on every task.

Decision Rule

"Does this need to be true everywhere, or only in certain parts?" Everywhere → without paths. Certain parts → with paths. Path-scoped rules are the #1 context optimization.
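The decision rule amounts to matching the file being edited against each rule's `paths`. The sketch below uses a deliberately small glob translator and a hypothetical `universal` rule; Claude Code's actual matching may differ in details.

```python
import re


def glob_to_regex(pattern):
    """Translate a small glob subset: '**/' = any directories, '*' = one segment."""
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


RULES = [
    {"name": "api-standards", "paths": ["src/api/**/*.ts", "src/routes/**"]},
    {"name": "universal", "paths": None},  # no paths: loaded every session
]


def rules_for(file_path, rules=RULES):
    """Return the names of the rules that apply to the given file."""
    active = []
    for rule in rules:
        paths = rule["paths"]
        if paths is None or any(glob_to_regex(p).match(file_path) for p in paths):
            active.append(rule["name"])
    return active
```

Editing `src/ui/Button.tsx` loads only the universal rule, which is the context saving the slide describes.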

Persistent Context69 / 81

The Memory Commands

Five slash commands to manage memory actively, plus /powerup for built-in lessons. Don't wait for problems; manage context proactively.

Command	Purpose	When to Use
/init	Generate starter CLAUDE.md from codebase	Starting a new project — scans files, infers conventions
/memory	Edit memory files in system editor	Need to update CLAUDE.md or MEMORY.md manually
/compact	Summarize conversation history	Freeing context space before it degrades
/context	Visualize context usage as colored grid	Debugging what is loaded and what is not
/cost	Show token usage and cost	Monitoring spend and efficiency
/powerup	Interactive lessons built into Claude Code	Weekly skill refresh — learning by doing inside the tool

⚠️

The 60% Rule

Compact at 60%, not when warnings fire at 80–95%. By the time you see warnings, quality has already degraded. The difference between a quality summary and a degraded summary is permanent — you cannot recover lost context.

// .claude/settings.json: PreCompact hook
{
  "hooks": {
    "PreCompact": [{
      "hooks": [{
        "type": "command",
        "command": "echo \"Decisions: $(date)\" >> .claude/session-log.md && cat .claude/working-notes.md >> .claude/session-log.md"
      }]
    }]
  }
}

PreCompact fires before compaction summarizes your session. Use it to persist decisions that a summary might lose — architectural choices, rejected approaches, open questions.

Persistent Context70 / 81

Lost in the Middle

Research by Liu et al. (2023) shows language models over-attend to the start and end of context — the middle gets lost. This is not a bug. It is a property of how attention works.

Attention Pattern

START ▲ · MIDDLE ▼ · END ▲

High → Low → High attention

Symptoms You See

Claude duplicates code it wrote earlier in the session
Forgets start-of-session conventions and project structure
Hallucinates file paths and API signatures
Responses get slower as the session grows
Costs spike without more work being done

Three Implications

1. Critical rules go in CLAUDE.md (start of context), not conversation.

2. Compact BEFORE degradation — not after symptoms appear.

3. Must-remember rules belong in FILES, not chat history.

Context Assembly Order — why CLAUDE.md survives, chat history doesn't

System prompt (always loaded)
→ Output style
→ Git state
→ CLAUDE.md files ← YOUR RULES LIVE HERE
→ MCP tool definitions
→ Skill descriptions
→ Conversation history ← GROWS UNTIL COMPACTION

CLAUDE.md loads near the top of context — high attention zone.

Conversation history is at the bottom — the first thing compaction summarizes away.

Rules in CLAUDE.md survive. Rules in chat don't.

Persistent Context71 / 81

Best Practices: The Memory Playbook

Memory systems only work if you maintain them. Six practices that separate teams that get compounding value from those that get frustration.

Keep CLAUDE.md under 80 lines

Critical rules at the top. Every line beyond 120 competes with others for attention. Use @path imports for detail.

Confirm Good Decisions

Memory learns caution, not correctness. Confirm what works so Claude remembers it. Silence teaches nothing.

Use Path-Scoped Rules

API rules for API files. Frontend rules for frontend. The #1 context optimization. Load only what you need.

Compact at 60%

Quality summary vs. degraded summary. Do not wait for the 80% warning. The difference is permanent.

Quarterly Review

15 minutes every quarter. CLAUDE.md grows stale as your codebase evolves. What was true in January may be wrong in April.

Start with Official MCP Memory

Free. Local. Five minutes to set up. The best starting point for any team before building custom systems.

→"The teams that win aren't the ones with the most sophisticated memory system. They're the ones that maintain the system they have."

Persistent Context72 / 81

Memory Is the Differentiator

Process without memory is Groundhog Day. Memory turns repetition into compounding. Three pillars that make agentic engineering sustainable.

🏗️

Structure

CLAUDE.md + Rules + Skills = static foundation. Written by you. Version-controlled. Loaded every session.

🔄

Adaptation

MEMORY.md + MCP servers = dynamic learning. Written by Claude. Updated automatically. Learns from corrections.

🌐

Portability

AGENTS.md + file-based config = cross-tool future. Not locked to Claude Code. Works with any agent that reads files.

"Process without memory is Groundhog Day. Memory turns repetition into compounding."

Persistent Context · Reflection73 / 81

Persistent Context Reflection

What Did We Learn?

Agents have no native memory. Context windows are finite, stateless, and reset every session. Files are the only persistent layer.
CLAUDE.md is your most powerful lever. Keep it under 80 lines. Critical rules at the top. Use @path imports for detail.
Four memory systems work together. CLAUDE.md (you write), MEMORY.md (Claude writes), Rules (conditional), Commands (runtime management).
Rules scope instructions to file paths. The #1 context optimization — load only what is relevant to the files being edited.
Compact at 60%, not 80%. Quality summary vs. degraded summary. Set a PreCompact hook to protect critical context.
Maintain your memory system. Quarterly reviews, confirm good decisions, start with MCP memory. Maintenance beats sophistication.

→Three layers — CLAUDE.md always loaded, Rules conditionally loaded, Skills on demand. Structure substitutes for memory.

Configuration & Trust68 / 81

The Configuration Collision

You set TypeScript strict mode. Your colleague disables it locally. CI uses a managed policy. Who wins?

The Story

During the build, I had Claude configured for Auto mode — it acted freely. A colleague cloned the repo and Claude immediately ran rm -rf on a test directory without asking. Same project, different safety level.

The problem: there was no configuration hierarchy. No way to say “team standard overrides personal preferences, but enterprise policy overrides everything.”

Settings hierarchy

Solves: whose configuration wins when they conflict.

Permission modes

Solves: how much autonomy Claude should have per project.

Configuration & Trust69 / 81

The 5-Level Settings Hierarchy

Five levels of settings load on startup. Higher number wins.

LVL 5

Managed

Enterprise / OS level — cannot override.

LVL 4

CLI Arguments

--model, --effort — passed at command line.

LVL 3

Local

.claude/settings.local.json — gitignored, personal overrides.

LVL 2

Project

.claude/settings.json — committed, team standard.

LVL 1

User

~/.claude/settings.json — global defaults.

🔑Settings OVERRIDE — highest wins. CLAUDE.md files CONCATENATE — all applicable load together.
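The difference between "override" and "concatenate" can be sketched in a few lines. The level names follow the slide; the setting keys in the usage note are hypothetical, and real settings merging may treat nested values differently.

```python
# Lowest to highest precedence, following the slide's level numbers 1 to 5.
LEVELS = ["user", "project", "local", "cli", "managed"]


def resolve_settings(by_level):
    """Settings override: later (higher) levels win key by key."""
    resolved = {}
    for level in LEVELS:
        resolved.update(by_level.get(level, {}))
    return resolved


def assemble_instructions(claude_md_files):
    """CLAUDE.md files concatenate: every applicable file is kept."""
    return "\n\n".join(claude_md_files)
```

For example, a managed `{"model": "opus"}` replaces a user-level `{"model": "sonnet"}`, while two CLAUDE.md files both end up in context.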

Configuration & Trust70 / 81

Permission Modes: The Trust Gradient

The question How much autonomy should Claude have? Too much = dangerous. Too little = slow.

RECOMMENDED

Default

Asks permission

Learning, unfamiliar codebases.

EXPLORE

Plan

Read-only exploration

Architecture understanding.

ADVANCED

Auto

Acts freely

Trusted workflows with hooks.

Trust Gradient

New project → Default (asks) → Hooks solid → Auto (acts freely)

The missing layer: Sandboxing

Hooks catch specific bad actions you've anticipated. Sandboxing catches everything else — OS-level process isolation (macOS seatbelt / Linux bubblewrap).

{ "sandbox": { "enabled": true } }

Result: 84% fewer permission prompts — the practical unlock for Auto mode.

Deny rules — write these before enabling Auto

{
  "permissions": {
    "deny": [
      "Bash(rm -rf *)",
      "Bash(git push --force *)",
      "Bash(DROP *)"
    ]
  }
}

Deny rules are checked first. Full syntax on the next slide.

→Revised graduation: Default → hooks solid → sandbox on → deny rules set → Auto. Full permission rules syntax: next slide.

Config & Trust · Security70B / 81

Permission Rules: Allow & Deny Syntax

The question Auto mode without rules is trust without boundaries. Here's how to set them.

The permissions block — settings.json

{
  "permissions": {
    "allow": [
      "Bash(npm run *)",
      "Bash(git *)",
      "Read(**)",
      "Edit(./src/**)"
    ],
    "deny": [
      "Bash(rm -rf *)",
      "Bash(git push --force *)",
      "Bash(DROP *)"
    ]
  }
}

Rule Syntax

Tool → applies to all uses of that tool
Tool(specifier) → specific pattern only
** → any path · * → any segment

Read(**) → allow reading any file
Edit(./src/**) → only inside src/
Bash(npm run *) → any npm script
Bash(rm -rf *) → deny rm -rf

Evaluation Order

1. Deny rules checked first

2. Allow rules checked next

3. Prompt if neither matches

Deny wins over allow. Be explicit about what to block.
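The evaluation order can be sketched as a three-way decision over the rules from this slide. Pattern matching here uses Python's `fnmatchcase` as a stand-in; the real specifier matching in Claude Code may differ, so treat this as an illustration of the ordering only.

```python
from fnmatch import fnmatchcase

ALLOW = ["Bash(npm run *)", "Bash(git *)", "Read(**)", "Edit(./src/**)"]
DENY = ["Bash(rm -rf *)", "Bash(git push --force *)", "Bash(DROP *)"]


def matches(rule, tool, argument):
    """Check one 'Tool' or 'Tool(specifier)' rule against a tool call."""
    name, _, rest = rule.partition("(")
    if name != tool:
        return False
    if not rest:                      # bare "Tool": every use of that tool
        return True
    return fnmatchcase(argument, rest[:-1])  # drop the closing ")"


def evaluate(tool, argument):
    if any(matches(r, tool, argument) for r in DENY):
        return "deny"                 # 1. deny rules checked first
    if any(matches(r, tool, argument) for r in ALLOW):
        return "allow"                # 2. allow rules checked next
    return "ask"                      # 3. prompt if neither matches
```

Note that `git push --force origin main` matches both `Bash(git *)` and a deny rule, and the deny rule wins.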

→You don't have to trust everything to trust Auto mode. Deny the dangerous, allow the routine, prompt the rest.

Operational Discipline71 / 81

The Degrading Session

Every message fills the context window. Quality degrades silently.

The Story

45 minutes into the build, Claude started referencing files that didn't exist. Responses got slower. Costs spiked. We didn't notice until Claude generated a component that imported from a hallucinated path — from a previous conversation turn that was no longer relevant.

0 – 15 min

Sharp

Accurate, fast responses.

15 – 30 min

Drifting

Gradually losing earlier context. Costs rising.

30+ min

Degraded

Hallucinations, stale references, expensive errors.

→You need operational discipline. Not glamorous — but it separates sustainable use from expensive frustration.

Operational Discipline72 / 81

Session Hygiene: Keep Sessions Lean

Command	When to Use	What It Does
/clear	Between unrelated tasks	Clears conversation. Start fresh with empty context.
/context	When session feels slow	Shows what's loaded — skills, files, MCP tools. Diagnose bloat.
/cost	Regularly	Displays token usage and cost. Track spend.
/compact	Before major new tasks	Summarizes context into a checkpoint. Reduces token load.
--continue	After closing and reopening terminal	Resume the most recent session without starting fresh.
--resume <id>	When you need a specific past session	Resume by session name or ID.

Thinking Levels — match effort to complexity

level 1 · /fast → level 2 · think → level 3 · think hard → level 4 · think harder → level 5 · ultrathink

→Operational rhythm: Start session → /context → work → /cost → /compact → /clear for new task.

Operational Discipline · Reflection73 / 81

Operational Discipline Reflection

What Did We Learn?

Sessions degrade silently. Quality drops, costs spike, hallucinations increase — you don't notice until something breaks.
/clear between unrelated tasks. /compact at checkpoints. /context to diagnose. /cost to track spend.
Match thinking effort to complexity. /fast for execution. ultrathink for architecture. Everything in between for everything in between.
Session hygiene is operational discipline. Not glamorous, but it separates sustainable use from expensive frustration.

→That's the operating environment — five layers, from safety to operations. Let's bring it all together.

Synthesis74 / 81

The Complete Picture

Act 1

The Hook

Why discipline matters.

Act 2

The Build

The 5-phase process in action.

Act 3

The Environment

What makes the process reliable.

Safety

Hooks & Guardrails

Capability

MCP & Playwright

Instructions

CLAUDE.md & Rules

Configuration

Settings & Permissions

Operations

Session Hygiene

“Vague instructions produce vague code. Precise instructions produce precise code.”

“Structure substitutes for memory.”

“Skills = on-demand. Hooks = always active. CLAUDE.md = always loaded.”

→From “vibe coding” to disciplined engineering with AI agents. No black boxes. No magic spells.

Synthesis · Next Steps · 74B / 81

Where to Go From Here

This course gave you the framework, skills, and environment. The frontier moves weekly. Here's how to stay ahead of it.

Built-in Lessons

/powerup — interactive lessons inside Claude Code itself. Free. No new tab. Run it weekly.

Stay Current

Anthropic engineering blog. docs.anthropic.com/release-notes/claude-code. New hooks, new MCP servers, new permission features drop regularly.

Community

awesome-claude-code (21.6k GitHub stars) — curated skills, hooks, workflows. Faster than docs for real-world patterns.

Deep Dives

Headless mode & Claude Agent SDK → programmatic agents
CI/CD with GitHub Actions → automate the 5-phase pipeline
Git worktrees → parallel agent isolation
Plugin development → distribute your skills
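
As a starting point for the worktree item above, one checkout per task can be created with `git worktree add`. This is a minimal sketch that only builds the command. The `agent/<task>` branch naming and the `../worktrees` directory are assumptions for illustration and do not come from the course.

```python
from typing import List


def worktree_add_argv(task: str, base_dir: str = "../worktrees") -> List[str]:
    """Build a `git worktree add` command giving one task its own checkout.

    Assumptions: the branch name `agent/<task>` and the base directory
    are illustrative. The command creates a new branch (-b) and checks
    it out into a separate working directory.
    """
    return ["git", "worktree", "add", "-b", f"agent/{task}", f"{base_dir}/{task}"]
```

Each agent session can then be started inside its own directory, so parallel edits do not touch the same working tree.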

One Action Today

Run /powerup.
Check your CLAUDE.md line count. If it's over 80 lines, refactor.
Add one hook you haven't added yet.
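
The line-count check in the list above is easy to script. This is a minimal sketch, assuming CLAUDE.md sits in the current directory. The function names are hypothetical, and the 80-line limit is the threshold named on this slide.

```python
from pathlib import Path


def claude_md_line_count(path: str = "CLAUDE.md") -> int:
    """Count the lines in the given file."""
    return len(Path(path).read_text(encoding="utf-8").splitlines())


def needs_refactor(path: str = "CLAUDE.md", limit: int = 80) -> bool:
    """Return True when the file is over the line budget."""
    return claude_md_line_count(path) > limit
```

Running `needs_refactor()` from the project root gives a yes or no answer that could also be wired into a pre-commit check.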

→Stay curious. The engineers shipping fastest aren't using the newest tools — they're using the tools they understand deeply.

The Complete Course · Fin

Three Acts. From problem to mastery.

From vibe coding
to disciplined
engineering.

No black boxes. No magic spells. The framework, the skills, and the environment — yours to take into your team.

Created by

Foyzul Karim

linkedin.com/in/foyzul

[ course URL: TBD ]

[ github: TBD ]

[ community: TBD ]
