A three-act course on building real software with AI agents — the 5-phase framework, the skills that drive it, and the operating environment that keeps it safe.
AI coding agents can make development up to 10× faster. Without process, that speed produces broken code, lost context, and runaway cost.
A five-part progression: theory → practice → pain → cure → mastery.
Problem “Vibe coding” — asking AI to write code without process — produces unmaintainable, untested code that breaks in production.
| Persona | Their Problem | What They Gain |
|---|---|---|
| Software Engineers | AI writes code but they can't maintain it. | Disciplined framework + quality gates. |
| Team Leads | Team uses AI inconsistently. | Repeatable standards + review process. |
| Senior Developers | Need the full picture, not just code generation. | Architecture + context + optimization. |
| Architects & Leads | No standards exist for AI-assisted work. | Design patterns others can follow. |
Problem One-shot prompts fail on complex tasks. Every LLM call has limited context, no planning, and no verification.
Concept An Agent is an intelligent orchestrator that decomposes work and decides what to do next.
The LLM is the brain. The agent is the body.
From single-intent assistants to purpose-based orchestrators that coordinate parallel subagents.
Each phase has one owner and produces one deliverable. No phase is optional.
| # | Phase | Owner | Deliverable |
|---|---|---|---|
| 1 | Requirement Engineering | You | REQ document — what to build |
| 2 | System Architecture | You + Claude | ARCH doc — data models, APIs, modules |
| 3 | Task Generation | Claude | Task list with tests + scope + REQ trace |
| 4 | TDD Implementation | Claude | Working code + passing tests |
| 5 | Review & Merge | You + Claude | Approved PR — 16 parallel checks |
The same 5-phase pipeline — different entry points. The discipline never changes.
| Scenario | REQ | ARCH | TASKS | TDD | REVIEW | Note |
|---|---|---|---|---|---|---|
| Greenfield | 1 | 2 | 3 | 4 | 5 | Blank slate — all 5 phases. |
| New Feature | — | 2 | 3 | 4 | 5 | System exists — skip REQ. |
| Bugfix | RCA | — | 3 | 4 | 5 | Root-cause analysis first — understand why. |
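
The entry-point table above can be read as data: each scenario is an ordered list of steps, and the next step is the first one not yet completed. The sketch below is only an illustration of that reading; the scenario keys and the `next_step` helper are hypothetical names, not part of any tool.

```python
# Ordered steps per scenario, copied from the table above.
# "RCA" (root-cause analysis) takes the REQ slot for bugfixes.
SCENARIOS = {
    "greenfield": ["REQ", "ARCH", "TASKS", "TDD", "REVIEW"],
    "new_feature": ["ARCH", "TASKS", "TDD", "REVIEW"],
    "bugfix": ["RCA", "TASKS", "TDD", "REVIEW"],
}


def next_step(scenario, completed):
    """Return the first step of the scenario that is not completed, or None."""
    for step in SCENARIOS[scenario]:
        if step not in completed:
            return step
    return None


print(next_step("bugfix", []))               # first step for a bugfix
print(next_step("new_feature", ["ARCH"]))    # what follows architecture
```
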
Problem The 5-phase framework needs a tool — one that reads codebases, runs tests, and loads skills.
Concept Claude Code is Anthropic's official CLI — your terminal interface to agentic engineering.
Problem Skills, hooks, settings, rules — scattered. Without the map, you waste hours searching.
Problem Every session starts from zero. You re-explain conventions — every single time.
Solution Skills = markdown in .claude/skills/, loaded on demand via slash command.
Goal Create a skill Claude follows. YAML frontmatter + markdown body.
Problem Embedding bash in skills bloats context. Those tokens burn — again and again.
Pattern Extract logic into scripts and reference them from the skill, so the bash lives outside the skill text.
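A minimal sketch of what such a skill file might look like, generated from Python so the shape is explicit: YAML frontmatter, then a short markdown body that points at a script instead of embedding bash. The `name` and `description` fields, the `<skill-name>/SKILL.md` layout, and the script path `scripts/run_checks.sh` are assumptions made for illustration.

```python
from pathlib import Path


def render_skill(name, description, script):
    """Return skill text: YAML frontmatter followed by a markdown body."""
    frontmatter = f"---\nname: {name}\ndescription: {description}\n---\n"
    body = (
        f"# {name}\n\n"
        "1. Do not improvise shell commands for this task.\n"
        f"2. Run `bash {script}` and read its output.\n"
        "3. Stop and report if the script exits non-zero.\n"
    )
    return frontmatter + "\n" + body


def write_skill(project_root, name, description, script):
    """Write the skill under .claude/skills/<name>/SKILL.md (assumed layout)."""
    path = Path(project_root) / ".claude" / "skills" / name / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_skill(name, description, script), encoding="utf-8")
    return path


text = render_skill(
    "run-checks",
    "Run the project's lint and test checks before proposing a commit.",
    "scripts/run_checks.sh",  # hypothetical script path
)
print(text)
```

The body stays a few lines long no matter how large the script grows, which is the point of the pattern: the skill names the script, and the script's contents never need to sit in the skill text.
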
Problem Skills can be ignored or bloated. Four patterns improve accuracy and cut token use.
Four rules that determine whether Claude follows your process or improvises.
Two read-only agents that make architecture and exploration safer — shipped inside Claude Code, no setup required.
Problem Vague requirements compound into wrong architecture, wrong tasks, wrong code. Ambiguity is the #1 cause of AI project failure.
Concept A requirement is a contract. Precise, verifiable, readable by anyone — without being in the room.
Problem Four recurring failures in requirement gathering that kill projects before the first line of code.
Solution Six-phase Socratic flow + readiness checklist + Phase 1 Gate. Structured interview, structured output.
Problem Architecture designed without reading existing code produces designs that don't fit. Wrong patterns, wrong assumptions, wrong boundaries.
Concept Ground in reality before designing. Propose, stress-test, then walk the code.
Problem Four recurring failures in AI-assisted architecture that produce ungrounded, untraceable, unimplementable designs.
Solution 3-step context gathering + 7-phase flow + Change Footprint Walk + Phase 2 Gate.
Problem Vague to-do items produce vague code. Tasks without test plans produce untested code. Tasks detached from architecture produce wrong code.
Concept A task is a TDD-ready specification. Test plan first. Anchored on architecture. Sized for a single TDD cycle.
Problem Four recurring failures in task breakdown that turn architecture into chaos at implementation time.
Solution 5-step flow: Understand → Anchor → Draft Tests → Build Spec → Write to ARCH. Test plan before code.
| Footprint marker | Task file status | Carried into the task |
|---|---|---|
| + New | New | Carry pattern forward |
| ~ Modified | Modified | Carry “what changes” note |
| − Deleted | Modified | Diff shows deletion |
| ! Touched | Must NOT modify | Add regression-guard tests |

| Test source | Test type |
|---|---|
| REQ acceptance criteria | Behavior tests |
| REQ edge cases / failures | Error / edge tests |
| ARCH forward stress-test | Resilience tests |
| ARCH touched-but-not-changed | Regression-guard tests |
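
The mapping from REQ and ARCH material to test types can be sketched as a small function that drafts a test plan before any code exists. The `TaskInputs` structure and its field names are hypothetical; they only mirror the four sources listed above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TaskInputs:
    """Material a task draws on (field names are illustrative assumptions)."""
    acceptance_criteria: List[str] = field(default_factory=list)   # from REQ
    edge_cases: List[str] = field(default_factory=list)            # from REQ
    stress_tests: List[str] = field(default_factory=list)          # from ARCH
    touched_not_changed: List[str] = field(default_factory=list)   # from ARCH


def draft_test_plan(inputs: TaskInputs) -> List[Tuple[str, str]]:
    """Return (test type, subject) pairs, one per source item."""
    plan = []
    plan += [("behavior", item) for item in inputs.acceptance_criteria]
    plan += [("error/edge", item) for item in inputs.edge_cases]
    plan += [("resilience", item) for item in inputs.stress_tests]
    plan += [("regression-guard", item) for item in inputs.touched_not_changed]
    return plan


plan = draft_test_plan(TaskInputs(
    acceptance_criteria=["user can reset password"],
    edge_cases=["expired reset token"],
    touched_not_changed=["auth/session.py"],
))
for test_type, subject in plan:
    print(f"{test_type}: {subject}")
```
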
Problem Code written before tests produces untestable code. Batched tests hide which change broke what. Skipped refactoring accumulates debt.
Concept RED → GREEN → REFACTOR. One test at a time. Minimum code to pass.
Problem Four discipline failures that turn good task specs into bad code.
Solution Two modes. RED→GREEN→REFACTOR cycle. Before-you-start checklist. Footprint-respecting. Phase 4 Gate.
| Rule | Detail |
|---|---|
| Arrange-Act-Assert | One behavior per test |
| Import from prod path | Even if module doesn't exist yet |
| Mock boundaries | Not internals; external deps only |
| Verify failure reason | Missing function, not syntax error |
| Minimum to pass | No more, no less |
| Follow existing patterns | From Implementation Notes |
| Respect Files Expected | Only create/modify listed files |
| Must NOT Modify = sacred | Regression-guard tests verify |
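
A small sketch of one completed RED→GREEN step, using Python's `unittest`. The function `display_name` and the `/users/<id>` path are hypothetical. In a real RED step the test would be written first, imported from the production path, and seen to fail because `display_name` does not exist yet; here both halves sit in one block so it runs as-is.

```python
import unittest
from unittest.mock import Mock


# GREEN: the minimum production code that makes the test below pass.
def display_name(user_id, http_get):
    payload = http_get(f"/users/{user_id}")
    return payload["name"].strip()


class DisplayNameTest(unittest.TestCase):
    def test_strips_whitespace_from_remote_name(self):
        # Arrange: mock only the external boundary (the HTTP call).
        http_get = Mock(return_value={"name": "  Ada  "})

        # Act
        result = display_name(7, http_get)

        # Assert: one behavior per test.
        self.assertEqual(result, "Ada")
```

Saved as a test module, this would run with `python -m unittest`. The mock replaces the network call only; the string handling inside `display_name` is exercised for real.
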
Problem Rubber-stamp approvals. Review everything or nothing. No severity triage. Reviewers who fix instead of flag. Review theater.
Concept Triage first, review second. Human confirms scope. Sub-skills review in parallel. Read-only — flag, don't fix.
Problem Four failures that turn code review into approval theater — or avoidance.
Solution Triage → Dispatch parallel agents → Collect → Deduplicate → Compile report. Human confirms scope at every step.
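The triage, dispatch, collect, deduplicate, compile flow can be sketched with `asyncio`. Everything named here (`Finding`, `triage`, the two stub checks) is hypothetical and stands in for real review sub-skills; the human confirmation steps are left out to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str  # "high", "medium", or "low"
    message: str


def triage(changed_files):
    """Decide what is in scope; this stub simply skips markdown files."""
    return [f for f in changed_files if not f.endswith(".md")]


async def secrets_check(files):
    return [Finding(f, 1, "high", "possible hard-coded secret")
            for f in files if "config" in f]


async def style_check(files):
    return [Finding(f, 1, "low", "formatting differs from project style")
            for f in files]


async def review(changed_files, checks):
    scope = triage(changed_files)
    # Dispatch every check in parallel; checks only read and flag.
    batches = await asyncio.gather(*(check(scope) for check in checks))
    # Collect and deduplicate on (file, line, message).
    seen, unique = set(), []
    for finding in (f for batch in batches for f in batch):
        key = (finding.file, finding.line, finding.message)
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    # Compile the report, most severe first.
    order = {"high": 0, "medium": 1, "low": 2}
    return sorted(unique, key=lambda f: order[f.severity])


report = asyncio.run(review(
    ["app/config.py", "README.md"],
    [secrets_check, secrets_check, style_check],  # duplicate check on purpose
))
for finding in report:
    print(finding.severity, finding.file, finding.message)
```
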
The build looked smooth. It wasn't. Credentials nearly committed. UI checked by hand. Conventions forgotten every session. Context windows overflowing. Process works — but process without environment is fragile.
The build looked smooth. It wasn't. Claude nearly committed an API key, tried to delete the migrations directory, and shipped code that passed tests but didn't match formatting. Time to add the guardrails.
Problem Claude can make destructive changes — commit credentials, delete files, force-push. These happen even when Claude is “trying” to help.
Solution Hooks = shell commands that fire at lifecycle events. Claude doesn't control them. They run automatically.
| Event | Timing | Can Block? | Typical Use |
|---|---|---|---|
| PreToolUse | BEFORE tool runs | YES | Credential guard, destructive command check |
| PostToolUse | AFTER tool runs | No | Auto-format, logging |
| Notification | Claude needs input | No | Desktop alert — makes Auto mode practical |
| UserPromptSubmit | Before prompt processed | YES | Inject git branch, project state automatically; reject unwanted prompts |
| PreCompact | BEFORE compaction runs | No | Save critical context before summary |
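| Stop | Claude finishes responding | YES (can prevent stopping) | Require checks to pass before Claude ends its turn |
| SessionStart | Session startup | No | Informational only |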
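
A minimal sketch of a PreToolUse guard written in Python. It assumes the hook receives a JSON object on stdin with `tool_name` and `tool_input` fields, and that exiting with code 2 blocks the tool call and passes stderr back to Claude. The two regular expressions are deliberately narrow examples, not a complete credential or safety policy.

```python
import json
import re
import sys
from typing import Optional

# Illustrative patterns only: an AWS-style access key ID and a PEM private key header.
SECRET = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")
# Illustrative destructive commands: recursive force delete and force push.
DESTRUCTIVE = re.compile(r"\brm\s+-(?:rf|fr)\b|\bgit\s+push\b.*--force")


def violation(event: dict) -> Optional[str]:
    """Return a reason to block the tool call, or None to allow it."""
    tool_input = event.get("tool_input", {})
    if event.get("tool_name") == "Bash":
        command = tool_input.get("command", "")
        if DESTRUCTIVE.search(command):
            return f"Blocked destructive command: {command}"
    if SECRET.search(json.dumps(tool_input)):
        return "Blocked: tool input appears to contain a credential."
    return None


def main() -> int:
    event = json.load(sys.stdin)
    reason = violation(event)
    if reason:
        print(reason, file=sys.stderr)
        return 2  # assumed blocking exit code for PreToolUse hooks
    return 0

# When saved as a hook script, end the file with: sys.exit(main())
```

Because the decision lives in `violation`, it can be unit-tested without running Claude at all, for example `violation({"tool_name": "Bash", "tool_input": {"command": "rm -rf migrations"}})`.
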
Problem Code passed tests but didn't match project formatting standards. Manual formatting is tedious and inconsistent.
| Type | Speed | Use Case |
|---|---|---|
| command | Fast | Formatting, linting |
| prompt | Slow | Smart validation |
| agent | Variable | Complex workflows |
| http | Network | External integrations |
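
As an illustration of the `command` type, the sketch below builds a settings fragment that runs a formatter after edits. The nesting (`hooks` → event name → `matcher` plus a list of hook entries) reflects the settings layout as understood at the time of writing and should be checked against current documentation; the script path `.claude/hooks/format.sh` is hypothetical.

```python
import json


def format_hook_settings(command, matcher="Edit|Write"):
    """Return a settings fragment registering a PostToolUse command hook."""
    return {
        "hooks": {
            "PostToolUse": [
                {
                    "matcher": matcher,  # which tools trigger the hook
                    "hooks": [{"type": "command", "command": command}],
                }
            ]
        }
    }


settings = format_hook_settings(".claude/hooks/format.sh")
print(json.dumps(settings, indent=2))
```
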
Hooks block what you anticipate. Sandboxing blocks everything else.
We merged the code. But we checked the UI manually — opening the browser, clicking, watching the console. That's tedious — and Claude can't see what we see. Or can it?
Concept Model Context Protocol (MCP) connects Claude to external tools. Bash is powerful but limited — Claude can't browse or take screenshots. MCP changes that.
| Capability | Without MCP | With Playwright MCP |
|---|---|---|
| Read code | Reads files, “looks correct” | Navigates URL, sees actual UI |
| Screenshots | Not possible | Captures viewport automatically |
| Console errors | Manual check only | Reads errors programmatically |
| DOM data | Cannot extract | Extracts accessibility tree |
| Verification | Manual → slow | Automated → fast, reproducible |
| Server | Purpose | When to Use |
|---|---|---|
| Qdrant | Semantic search over docs | Search across project docs, ADRs, past decisions |
| Context7 | Up-to-date library docs | Latest API signatures when training data is stale |
| PostgreSQL | Database queries | Verify schema, run queries, check migrations |
| Figma | Design-to-code | Access design specs programmatically from mockups |
| Registry | modelcontextprotocol.io | Hundreds more servers. Add as workflow evolves. |
Playwright isn't just for verification — it's for visual iteration. Claude sees what you see.
Claude executes your process perfectly — then forgets everything next session. Your conventions, your patterns, your rules. Every session starts from zero.
You spend forty minutes teaching Claude your authentication system. You close the terminal. Open a new session. Claude has no idea. It is a stranger again.
Five technical reasons every session starts from zero. Understanding the mechanism is the first step to fixing it.
Claude Code doesn't have one memory system. It has four — each solving a different part of the problem.
| Level | Location | Who Writes | Scope |
|---|---|---|---|
| Project | ./CLAUDE.md | You | This repository |
| User | ~/.claude/CLAUDE.md | You | All projects — global conventions |
| Rules | ./.claude/rules/*.md | You | Per file-type, conditional |
| Local | ./CLAUDE.local.md | You | Gitignored — personal only |
| Auto | ~/.claude/projects/.../MEMORY.md | Claude | Machine-local only |
| Subdirectory | ./src/api/CLAUDE.md | You | Lazy-loaded when Claude operates in that directory |
When you correct Claude, it detects patterns, checks if they're already known, and writes new entries to MEMORY.md.
Problem API code and frontend code need different standards. Loading all rules wastes context.
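The idea of loading only the rules that apply to the file at hand can be sketched with glob matching. This is not Claude Code's loader; the rule file names and patterns are hypothetical, and `fnmatchcase` is used here because its `*` also matches across `/`.

```python
from fnmatch import fnmatchcase

# Hypothetical rule files and the paths each one applies to.
RULES = {
    "api-standards.md": ["src/api/*"],
    "frontend-standards.md": ["src/web/*", "*.tsx"],
    "general.md": ["*"],
}


def rules_for(path):
    """Return the rule files whose patterns match the given file path."""
    return [
        name
        for name, patterns in RULES.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


print(rules_for("src/api/v1/users.py"))
print(rules_for("src/web/App.tsx"))
```

An API file never pulls in the frontend standards, so the context holds only the rules that can affect the current edit.
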
Six slash commands to manage memory actively. Don't wait for problems; manage context proactively.
| Command | Purpose | When to Use |
|---|---|---|
| /init | Generate starter CLAUDE.md from codebase | Starting a new project — scans files, infers conventions |
| /memory | Edit memory files in system editor | Need to update CLAUDE.md or MEMORY.md manually |
| /compact | Summarize conversation history | Freeing context space before it degrades |
| /context | Visualize context usage as colored grid | Debugging what is loaded and what is not |
| /cost | Show token usage and cost | Monitoring spend and efficiency |
| /powerup | Interactive lessons built into Claude Code | Weekly skill refresh — learning by doing inside the tool |
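Research by Liu et al. (2023) found that language models use information at the start and end of a long context more reliably than information in the middle, so instructions buried mid-context are the ones most likely to be missed. This is not a bug to be patched. It appears to be a property of how current models handle long contexts.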
Memory systems only work if you maintain them. Six practices that separate teams that get compounding value from those that get frustration.
Process without memory is Groundhog Day. Memory turns repetition into compounding. Three pillars that make agentic engineering sustainable.
Whose settings win when they conflict? And how much autonomy should Claude have? Two questions, one hierarchy.
You set TypeScript strict mode. Your colleague disables it locally. CI uses a managed policy. Who wins?
Five levels of settings load on startup. Higher number wins.
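"Higher number wins" amounts to merging the levels in ascending order so that later values overwrite earlier ones. The level names and their order below are assumptions chosen to match the strict-mode example (a colleague's local override losing to a managed policy); the source does not list the five levels here.

```python
# Assumed levels, lowest precedence first. Values are illustrative.
LEVELS = [
    ("user", {"strict": True, "theme": "dark"}),
    ("project", {"strict": True}),
    ("local", {"strict": False}),   # colleague disables strict mode locally
    ("cli", {}),
    ("managed", {"strict": True}),  # managed policy, applied last
]


def resolve(levels):
    """Merge settings in order; return the result and which level set each key."""
    merged, source = {}, {}
    for name, settings in levels:
        for key, value in settings.items():
            merged[key] = value
            source[key] = name
    return merged, source


merged, source = resolve(LEVELS)
print(merged)
print(source)
```

Tracking `source` alongside the merged values answers the "who wins?" question directly: for each key it names the level whose value survived.
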
The question How much autonomy should Claude have? Too much = dangerous. Too little = slow.
The question Auto mode without rules is trust without boundaries. Here's how to set them.
Every message fills the context window. Quality degrades silently.
| Command or flag | When to Use | What It Does |
|---|---|---|
| /clear | Between unrelated tasks | Clears conversation. Start fresh with empty context. |
| /context | When session feels slow | Shows what's loaded — skills, files, MCP tools. Diagnose bloat. |
| /cost | Regularly | Displays token usage and cost. Track spend. |
| /compact | Before major new tasks | Summarizes context into a checkpoint. Reduces token load. |
| --continue | After closing and reopening terminal | Resume the most recent session without starting fresh. |
| --resume <id> | When you need a specific past session | Resume by session name or ID. |
This course gave you the framework, skills, and environment. The frontier moves weekly. Here's how to stay ahead of it.
No black boxes. No magic spells. The framework, the skills, and the environment — yours to take into your team.