
Claude Code Guide — Part 7: Plan, Control, Execute, Validate

Tags: claude-code, prompt-engineering, ai-workflows, ai-methodology, agents, hooks

Created: April 27, 2026 | Modified: April 27, 2026

The problem with one-shot prompting

You ask the model for the thing, you read what comes back, you spot two problems, you ask again, it fixes one and breaks the other, twenty minutes pass, and you are doing the orchestration in your head while typing. The model is doing the easy part — generating text. You are doing the hard part — holding the plan, checking the work, deciding what comes next, remembering what was already tried. That asymmetry is why one-shot prompting fails on real work.

The fix is not better prompts. It is a different shape. Reliable systems — the ones used by people who actually trust their output — separate four roles: someone names the work, someone routes it to the right doer, someone does it, and someone checks it against the original ask. When you collapse those roles into a single chat turn you get one-shot. When you keep them distinct you get something you can hand off, repeat, and improve. This article teaches that shape.

The four-phase method

Call them Plan, Control, Execute, Validate.

  • Plan is the role that turns a request into a structured intent — naming the outcome, the acceptance criteria, the constraints, and what is out of scope.
  • Control is the role that picks the right doer for the work and routes it there with only the context that doer needs.
  • Execute is the role that does the actual work against the named criteria.
  • Validate is the role that checks what was delivered against the original ask and returns one of three verdicts — pass, fail (delivery wrong), or revise (plan wrong).

flowchart LR
    A[Plan] --> B[Control]
    B --> C[Execute]
    C --> D[Validate]
    D -->|PASS| E[Done]
    D -.->|FAIL| B
    D -.->|REVISE| A

The dotted arrows are the revision loops, and they matter — the loop is what separates a method from a one-shot dressed up in more steps. The rest of this article shows what each phase looks like in Claude Code, and how to apply the same shape outside it.

Plan: turning a request into a structured intent

Planning names three things: the outcome, the acceptance criteria, and the boundaries. The outcome is what done looks like in your words, not a list of mechanics. The criteria are checkable claims someone could grade later. The boundaries are what is not in scope. Skip any of the three and the doer fills in their own version.

Claude Code gives you three native surfaces. Plan mode is the explicit pre-execution surface — you draft the plan in conversation with the model, and nothing gets written or executed until you exit. TodoWrite persists the plan as a visible task list that survives long sessions, compactions, and handoffs to subagents — the persistence surface for everything that comes after (Part 6 catalogues it alongside the rest of the tool inventory). CLAUDE.md is the standing brief every plan inherits context from — stack, conventions, gotchas — so you do not retype them on every turn. (If you have not authored CLAUDE.md yet, see Part 1.)

In your own setup the planning role does not need any of those primitives. It might be a paper checklist, a Markdown brief in your repo, a Python dict assembled before an API call, or two minutes of writing in a DM to a freelancer. The role is what matters.
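
As a concrete sketch of the dict version (field names are illustrative, chosen to line up with the script example later in this article):

# A brief as a plain Python dict, assembled before any API call.
# The three fields are exactly what planning names: outcome, criteria, boundaries.
brief = {
    "outcome": "the auth module issues and accepts the new token format",
    "criteria": [
        "all existing tests pass",
        "the legacy token continues to work for thirty days",
        "a new test exists for the dual-mode path",
    ],
    "out_of_scope": ["session store changes", "anything in the UI"],
}

If you cannot fill all three fields, the plan is not done; that is the check the dict makes explicit.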

Control: assigning the right doer

Control is a routing decision, not a doing decision. The controller reads the plan, picks the specialist, hands over only the slice of context that doer needs, and holds the criteria against which the work will be judged. Picking the wrong doer wastes the plan. Giving them the whole transcript instead of the relevant slice wastes context.

In Claude Code, the controller is you. You decide which subagent in .claude/agents/ to invoke via the Task tool, which model to ask (cheap for routine, strong for reasoning), and which slash command to fire. Permissions (Part 2) shape what each doer is allowed to do. The Skills you authored in Part 3 are part of the routing menu too — a task Skill is a doer with a known shape.

A subagent definition is the controller's standing description of one kind of doer:

---
name: brief-reviewer
description: Reviews a diff against the original brief and returns PASS / FAIL / REVISE with reasoning. Read-only.
tools: Read, Grep, Glob
model: sonnet
---

You are a reviewer. Your job is not to fix the work — it is to judge it
against the brief.

Inputs you will receive:
- The original brief (outcome, acceptance criteria, out-of-scope items)
- The diff or the produced artifacts to review

Return exactly one of three verdicts:
- PASS — every acceptance criterion is met, no out-of-scope changes
- FAIL — delivery missed one or more criteria; name which ones and how
- REVISE — the plan itself was wrong; name the contradiction or gap

Cite specific lines or paths in your reasoning. Do not paraphrase the
criteria — quote them.

Outside Claude Code, control is the same routing decision: which model your script calls, which teammate gets the brief, whether you handle the work yourself or delegate it.
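
In script form that routing decision can be a small, inspectable function. A minimal sketch, assuming the brief dict from the Plan section; the heuristic and the model identifiers are placeholders, not recommendations:

MODEL_CHEAP = "cheap-model-id"    # placeholder: your routine-work model
MODEL_STRONG = "strong-model-id"  # placeholder: your reasoning model

def control(brief):
    # Route on the verb in the outcome: design or investigation work goes
    # to the strong model, routine work to the cheap one.
    needs_reasoning = any(verb in brief["outcome"] for verb in ("design", "investigate"))
    return {
        "executor": "coder",
        "model": MODEL_STRONG if needs_reasoning else MODEL_CHEAP,
    }

The heuristic is beside the point; what matters is that routing is a separate, visible step instead of something buried inside one long prompt.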

Execute: doing the actual work

Execution is bounded work against named criteria. The doer reads the brief, does the thing, reports back. In Claude Code, when you call the Task tool with a subagent, that subagent runs in an isolated context window — it does not see your transcript or inherit your in-flight scratch work. The isolation is a feature: it forces a clean handoff and keeps the doer's output legible. Long-running execution can run in the background.

The thing to internalize is that the executor only sees what the controller hands over. If the brief is missing a constraint, the executor will not magically know it. Execution quality is bounded by control quality, which is bounded by plan quality. The cheap fix almost never lives at this phase; it is always upstream.

In a Python script wrapping the Claude API, execution is a single messages.create call with the plan-derived prompt. With a person, it is the actual work session. Same shape: bounded inputs, named criteria, one deliverable.
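
Sketched with the Anthropic Python SDK (the prompt layout is illustrative; the brief and doer dicts are the ones from the Plan and Control sketches):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def execute(doer, brief):
    # One bounded call: the executor sees the brief and nothing else.
    prompt = (
        f"Outcome: {brief['outcome']}\n"
        f"Acceptance criteria: {brief['criteria']}\n"
        f"Out of scope: {brief['out_of_scope']}\n"
        "Do the work and report a single deliverable."
    )
    response = client.messages.create(
        model=doer["model"],
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text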

Validate: PASS / FAIL / REVISE against the original ask

Validation is the part most amateur setups skip, and it is the difference between a method and a slow one-shot. After the work comes back, somebody checks it against the original ask — the criteria you wrote down before the work started, not the version the executor remembered.

Three verdicts. Pass means every criterion is met and nothing out of scope was touched. Pass does not mean perfect; if the criteria were soft, that is your problem to refine. Fail means the delivery missed the bar — the plan was reasonable, execution did not satisfy it. Fix: re-execute with feedback. Revise is the more interesting one: the plan itself was wrong, so the work may be flawlessly executed against an objective that should never have been set. Fix: re-plan, not re-execute. Distinguishing fail from revise saves cycles — re-executing a wrong plan never converges.

In Claude Code, validation can take several shapes: a reviewer subagent with read-only tools invoked after the executor returns; a hook firing on a lifecycle event (a Stop hook checking the diff before the session ends, a PostToolUse hook running lint after every edit); a test suite; or you with the original brief in front of you, reading the output cold. Pick the heaviest validation the work earns.

{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "bash -c 'if [ -f .claude/brief.md ]; then git diff --stat | head -20; echo; echo \"--- brief ---\"; cat .claude/brief.md; fi'"
          }
        ]
      }
    ]
  }
}

That hook prints the change summary and the brief side by side at the end of every session — the last thing you see before walking away is the diff next to what you asked for. It does not pass or fail the work; it forces a moment of validation before you ship. Heavier hooks call a reviewer subagent, run a test suite, or block on exit code. (Hooks come from Part 5 — Hooks; subagents come from Part 4 — Subagents. Validate leans on those two parts the most.)
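
A sketch of the block-on-exit-code variant, as a small Python script a Stop hook command could invoke. The exit-code-2 convention (block the action, feed stderr back to the model) is Claude Code's documented hook behavior; the test runner here is a placeholder:

#!/usr/bin/env python3
# Heavier Stop hook: refuse to let the session end while tests fail.
# Exit code 2 blocks the stop and feeds stderr back to Claude.
# "pytest -q" is a placeholder for whatever suite the work earns.
import subprocess
import sys

result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
if result.returncode != 0:
    sys.stderr.write("Tests failing; do not stop yet:\n" + result.stdout[-2000:])
    sys.exit(2)
sys.exit(0)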

If you skipped plan, you have nothing to validate against — you have degraded back to one-shot with extra steps.

The revision loop is the point

A system that cannot revise is a fancy one-shot. The dotted arrows in the diagram — fail back to control, revise back to plan — are what make this a method. Every reliable system has them: code review, hardware QA, editorial review on a manuscript. Removing the loop because it feels slow is the most common way teams accidentally break a working method.

In Claude Code, revision is built in: the validator subagent says revise, you re-enter plan mode with the feedback, you fix the contradiction, you re-run. With a person, revision is a follow-up brief naming the missing constraint. With a script, revision is a retry-with-context loop: append the validator's feedback, call the model again, validate again.

The counter-intuitive frame: cap the revisions, but do not remove them. Three to five cycles is usually enough to converge; beyond that, the request as written cannot be satisfied, so escalate to a human and rewrite from scratch. The cap stops you from grinding indefinitely on a brief that has a hidden contradiction. The loop itself stays — what changes after the cap is who holds it.

What it looks like end-to-end in Claude Code

The shape of a single session that uses every primitive — a sketch, not a prescription:

# 1. PLAN — enter plan mode, draft the brief in conversation with Claude
#    The model writes plan.md or scratchpad notes; you exit plan mode
#    only when the plan names outcome + criteria + out-of-scope.
# 2. PLAN — TodoWrite the steps so the plan persists across compaction
#    and so any subagent you spawn can read it.
# 3. CONTROL — pick the doer. For a refactor: a coding subagent.
#    For research: a read-only researcher. Fire it via the Task tool
#    with a clean handoff (the brief + the slice of context it needs,
#    not your whole transcript).
# 4. EXECUTE — the subagent runs in its own context, makes the edits,
#    returns a report. Long-running work can go to the background.
# 5. VALIDATE — invoke the reviewer subagent with the brief + the diff,
#    OR rely on the Stop hook to print the diff next to the brief
#    before you leave the session. Read the verdict honestly.
# 6. LOOP — on FAIL, re-route to control with feedback. On REVISE,
#    re-enter plan mode and rebuild the brief. On PASS, ship.

That is the entire method, in one Claude Code session. The session writes everything to disk along the way — TodoWrite state, subagent reports, hook output — so you can walk away and come back to the record. The primitives in Parts 1-6 are the surfaces that make each phase concrete. The shape is yours.

Building your own

Three minimal versions, in order of how close they sit to Claude Code:

Inside Claude Code. Write the brief in plan mode. TodoWrite the steps. Delegate the heavy steps to subagents via the Task tool. Validate with a reviewer subagent or a Stop hook. On fail or revise, loop back. The primitives are already there; what you practice is the discipline of separating the phases instead of collapsing them into one chat turn.

In a script. When you wrap the Claude API directly, the four phases become four function calls. Plan is a structured prompt that returns a JSON brief; control is a function that picks which downstream prompt to fire; execute is the model call that does the work; validate is a separate model call (or a deterministic check) that returns a verdict.

def run(request, max_revisions=5):
    brief = plan(request)                  # returns {outcome, criteria, out_of_scope}
    history = []
    for attempt in range(max_revisions):
        doer = control(brief)              # returns which executor + which model
        output = execute(doer, brief)      # returns the work
        verdict = validate(brief, output)  # returns {status, reasoning}
        history.append(verdict)
        if verdict["status"] == "PASS":
            return output
        if verdict["status"] == "FAIL":
            brief["feedback"] = verdict["reasoning"]  # re-execute with feedback
        elif verdict["status"] == "REVISE":
            brief = plan(request, prior=verdict)      # re-plan from the top
    return escalate_to_human(request, history)

Forty lines of real Python once you fill in the calls. The shape is what carries the method.
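
To make one of those calls concrete, here is what validate might look like as a separate model call (a sketch reusing the client from the Execute section; the model identifier is a placeholder, and the {status, reasoning} verdict shape matches what the loop above reads):

def validate(brief, output):
    # A reviewer call, separate from the executor: judge, do not fix.
    prompt = (
        "You are a reviewer. Judge the delivery against the brief; do not fix it.\n"
        f"Brief: {brief}\n"
        f"Delivery: {output}\n"
        "First line: exactly PASS, FAIL, or REVISE. Then your reasoning, "
        "quoting the criteria you checked."
    )
    response = client.messages.create(
        model="reviewer-model-id",  # placeholder identifier
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.content[0].text
    return {"status": text.split()[0].strip().upper(), "reasoning": text}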

With a person. Write the brief — outcome, criteria, out of scope — and hand it off. Pick the right person (control). They do the work (execute). Review against the original brief, not your in-the-moment taste (validate). On fail, hand back specific feedback. On revise, rewrite the brief and acknowledge the contradiction was yours. The four-phase shape is older than software.

Writing requests that work in any version of this method

Whether the doer is a subagent, a Python function, or a teammate, the same prompt-craft holds.

  • State the outcome, not the mechanics. "Refactor the auth module to use the new token format" beats "first read auth.py, then write new code, then update the tests." If you are typing steps, you are squeezing the planner out of the loop.
  • Name acceptance criteria up front. Three checkable claims. "All existing tests pass." "The legacy token continues to work for thirty days." "A new test exists for the dual-mode path." The validator can only check what you said.
  • Hand over the context the doer needs. File paths, prior decisions, existing conventions. If a constraint lives in someone's head, write it down before you delegate.
  • Be explicit about scope boundaries. What is not in scope is as load-bearing as what is. Without a boundary, doers expand work to fill the available context.
  • Pick the right verb. "Design" is different from "implement," which is different from "fix." The verb tells control which doer to route to. Mismatched verbs are the most common revision trigger.

These five points take two minutes per request. They save five revision cycles on the back end.
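
Put together, a request that passes all five checks might read like this (a made-up example assembled from the fragments above):

Refactor the auth module to use the new token format.

Acceptance criteria:
- All existing tests pass.
- The legacy token continues to work for thirty days.
- A new test exists for the dual-mode path.

Out of scope: the session store, anything in the UI.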

Common failure modes

Each failure below is named in method terms, not tool terms — the same failure shows up the same way in Claude Code, in scripts, and with people.

  • Vague request. Planning has nothing to lock onto. The plan is generic, the criteria aspirational, validation impossible. Fix: state the outcome and at least three checkable criteria.
  • Conflicting constraints. "Make it shorter but include more detail." "Match the existing voice but be more direct." Validation is structurally impossible — whichever side the executor picks, the other fails. Fix: pick one, or split the work.
  • Wrong verb. Saying "design" when you wanted code is a control-phase miss. Fix: read your verb and ask whether it matches the outcome you want.
  • Missing acceptance criteria. Validation has nothing to check against, so pass becomes meaningless and the system has degraded back to one-shot.
  • Treating revision as failure. When you remove the loop because it feels slow, you remove the part that made the method work. Cap the loop; do not remove it.

Closing

The four-phase shape is older than AI. It is how reliable systems get built. Claude Code gives you native surfaces for each phase — plan mode and TodoWrite for Plan, Skills and the Task tool for Control, the file/shell tools for Execute, reviewer subagents and Stop hooks for Validate — but the method is yours, and it travels wherever the work goes next.

The prompts in the sidebar are the templates I reach for when running this method live: an outcome-first brief, a plan-mode kickoff, a hand-off, a reviewer-subagent prompt, a debrief. Copy any of them, fill in the specifics, and run. If a part of the loop is unfamiliar — subagents, hooks, the standing brief, the tool inventory — Parts 1 through 6 are where those primitives are taught.