feat: implement new claude skills and workflow

Eric Gullickson
2026-01-03 11:02:30 -06:00
parent c443305007
commit 9f00797925
45 changed files with 10132 additions and 2174 deletions


@@ -0,0 +1,86 @@
# skills/planner/
## Overview
Planning skill with resources that must stay synced with agent prompts.
## Index
| File/Directory | Contents | Read When |
| ------------------------------------- | ---------------------------------------------- | -------------------------------------------- |
| `SKILL.md` | Planning workflow, phases | Using the planner skill |
| `scripts/planner.py` | Step-by-step planning orchestration | Debugging planner behavior |
| `resources/plan-format.md` | Plan template (injected by script) | Editing plan structure |
| `resources/temporal-contamination.md` | Detection heuristic for contaminated comments | Updating TW/QR temporal contamination logic |
| `resources/diff-format.md` | Unified diff spec for code changes | Updating Developer diff consumption logic |
| `resources/default-conventions.md` | Default structural conventions (4-tier system) | Updating QR RULE 2 or planner decision audit |
## Resource Sync Requirements
Resources are **authoritative sources**.
- **SKILL.md** references resources directly (main Claude can read files)
- **Agent prompts** embed resources 1:1 (sub-agents cannot access files
reliably)
### plan-format.md
Plan template injected by `scripts/planner.py` at planning phase completion.
**No agent sync required** - the script reads and outputs the format directly,
so editing this file takes effect immediately without updating any agent
prompts.
### temporal-contamination.md
Authoritative source for temporal contamination detection. Full content embedded
1:1.
| Synced To | Embedded Section |
| ---------------------------- | -------------------------- |
| `agents/technical-writer.md` | `<temporal_contamination>` |
| `agents/quality-reviewer.md` | `<temporal_contamination>` |
**When updating**: Modify `resources/temporal-contamination.md` first, then copy
content into both `<temporal_contamination>` sections.
### diff-format.md
Authoritative source for unified diff format. Full content embedded 1:1.
| Synced To | Embedded Section |
| --------------------- | ---------------- |
| `agents/developer.md` | `<diff_format>` |
**When updating**: Modify `resources/diff-format.md` first, then copy content
into `<diff_format>` section.
### default-conventions.md
Authoritative source for default structural conventions (four-tier decision
backing system). Embedded 1:1 in QR for RULE 2 enforcement; referenced by
planner.py for decision audit.
| Synced To | Embedded Section |
| ---------------------------- | ----------------------- |
| `agents/quality-reviewer.md` | `<default_conventions>` |
**When updating**: Modify `resources/default-conventions.md` first, then copy
full content verbatim into `<default_conventions>` section in QR.
## Sync Verification
After modifying a resource, verify sync:
```bash
# Check temporal-contamination.md references
grep -l "temporal.contamination\|four detection questions\|change-relative\|baseline reference" agents/*.md
# Check diff-format.md references
grep -l "context lines\|AUTHORITATIVE\|APPROXIMATE\|context anchor" agents/*.md
# Check default-conventions.md references
grep -l "default_conventions\|domain: god-object\|domain: test-organization" agents/*.md
```
If grep finds files not listed in sync tables above, update this document.
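The grep checks catch references; 1:1 embedding can be verified directly. A minimal sketch, assuming each agent prompt embeds the resource verbatim between XML-style tags as in the sync tables above (paths and tag names here are illustrative):

```python
import re
from pathlib import Path

def embedded_section(agent_path, tag):
    """Extract the text between <tag> and </tag> in an agent prompt."""
    text = Path(agent_path).read_text()
    match = re.search(rf"<{tag}>\n?(.*?)\n?</{tag}>", text, re.DOTALL)
    return match.group(1) if match else None

def is_synced(resource_path, agent_path, tag):
    """True when the agent's embedded section matches the resource 1:1."""
    resource = Path(resource_path).read_text().strip()
    embedded = embedded_section(agent_path, tag)
    return embedded is not None and embedded.strip() == resource

# Example pairing from the sync tables:
# is_synced("resources/diff-format.md", "agents/developer.md", "diff_format")
```

Run it once per row of the sync tables; `False` means the embedded copy has drifted from its resource.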


@@ -0,0 +1,80 @@
# Planner
LLM-generated plans have gaps. I have seen missing error handling, vague
acceptance criteria, and specs that nobody can implement. I built this skill with
two workflows -- planning and execution -- connected by quality gates that catch
these problems early.
## Planning Workflow
```
Planning ----+
| |
v |
QR -------+ [fail: restart planning]
|
v
TW -------+
| |
v |
QR-Docs ----+ [fail: restart TW]
|
v
APPROVED
```
| Step | Actions |
| ----------------------- | -------------------------------------------------------------------------- |
| Context & Scope | Confirm path, define scope, identify approaches, list constraints |
| Decision & Architecture | Evaluate approaches, select with reasoning, diagram, break into milestones |
| Refinement | Document risks, add uncertainty flags, specify paths and criteria |
| Final Verification | Verify completeness, check specs, write to file |
| QR-Completeness | Verify Decision Log complete, policy defaults confirmed, plan structure valid |
| QR-Code | Read codebase, verify diff context, apply RULE 0/1/2 to proposed code |
| Technical Writer | Scrub temporal comments, add WHY comments, enrich rationale |
| QR-Docs | Verify no temporal contamination, comments explain WHY not WHAT |
So, why all the feedback loops? QR-Completeness and QR-Code run before TW to
catch structural issues early. QR-Docs runs after TW to validate documentation
quality. Doc issues restart only TW; structure issues restart planning. The loop
runs until both pass.
## Execution Workflow
```
Plan --> Milestones --> QR --> Docs --> Retrospective
^ |
+- [fail] -+
* Reconciliation phase precedes Milestones when resuming partial work
```
After planning completes and context clears (`/clear`), execution proceeds:
| Step | Purpose |
| ---------------------- | --------------------------------------------------------------- |
| Execution Planning | Analyze plan, detect reconciliation signals, output strategy |
| Reconciliation | (conditional) Validate existing code against plan |
| Milestone Execution | Delegate to agents, run tests; repeat until all complete |
| Post-Implementation QR | Quality review of implemented code |
| Issue Resolution | (conditional) Present issues, collect decisions, delegate fixes |
| Documentation | Technical writer updates CLAUDE.md/README.md |
| Retrospective | Present execution summary |
I designed the coordinator to never write code directly -- it delegates to
developers. Separating coordination from implementation produces cleaner
results. The coordinator:
- Parallelizes independent work across up to 4 developers per milestone
- Runs quality review after all milestones complete
- Loops through issue resolution until QR passes
- Invokes technical writer only after QR passes
**Reconciliation** handles resume scenarios. When the user request contains
signals like "already implemented", "resume", or "partially complete", the
workflow validates existing code against plan requirements before executing
remaining milestones. Building on unverified code means rework.
**Issue Resolution** presents each QR finding individually with options (Fix /
Skip / Alternative). Fixes delegate to developers or technical writers, then QR
runs again. This cycle repeats until QR passes.
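The QR/issue-resolution cycle can be sketched as a loop. The callback names below are illustrative, not the coordinator's actual interface:

```python
def resolve_until_qr_passes(run_qr, present_options, delegate_fix):
    """Loop QR -> per-finding decision -> delegated fix until QR passes.

    run_qr() returns a list of findings (empty list = pass).
    present_options(finding) returns "fix", "skip", or an alternative spec.
    delegate_fix(finding, decision) hands the work to a developer/TW agent.
    """
    while True:
        findings = run_qr()
        if not findings:
            return  # QR passed; technical writer may now be invoked
        for finding in findings:
            decision = present_options(finding)
            if decision != "skip":
                delegate_fix(finding, decision)
```

Skipped findings are simply not delegated; everything else re-enters QR on the next pass.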


@@ -0,0 +1,59 @@
---
name: planner
description: Interactive planning and execution for complex tasks. Use when user asks to use or invoke planner skill.
---
# Planner Skill
Two-phase workflow: **planning** (create plans) and **execution** (implement
plans).
## Invocation Routing
| User Intent | Script | Invocation |
| ------------------------------------------- | ----------- | ---------------------------------------------------------------------------------- |
| "plan", "design", "architect", "break down" | planner.py | `python3 scripts/planner.py --step-number 1 --total-steps 4 --thoughts "..."` |
| "review plan" (after plan written) | planner.py | `python3 scripts/planner.py --phase review --step-number 1 --total-steps 2 ...` |
| "execute", "implement", "run plan" | executor.py | `python3 scripts/executor.py --plan-file PATH --step-number 1 --total-steps 7 ...` |
Scripts inject step-specific guidance via JIT prompt injection. Invoke the
script and follow its REQUIRED ACTIONS output.
## When to Use
Use when task has:
- Multiple milestones with dependencies
- Architectural decisions requiring documentation
- Complexity benefiting from forced reflection pauses
Skip when task is:
- Single-step with obvious implementation
- Quick fix or minor change
- Already well-specified by user
## Resources
| Resource | Contents | Read When |
| ------------------------------------- | ------------------------------------------ | ----------------------------------------------- |
| `resources/diff-format.md` | Unified diff specification for plans | Writing code changes in milestones |
| `resources/temporal-contamination.md` | Comment hygiene detection heuristics | Writing comments in code snippets |
| `resources/default-conventions.md` | Priority hierarchy, structural conventions | Making decisions without explicit user guidance |
| `resources/plan-format.md` | Plan template structure | Completing planning phase (injected by script) |
**Resource loading rule**: Scripts will prompt you to read specific resources at
decision points. When prompted, read the full resource before proceeding.
## Workflow Summary
**Planning phase**: Steps 1-N explore context, evaluate approaches, refine
milestones. Final step writes plan to file. Review phase (TW scrub -> QR
validation) follows.
**Execution phase**: 7 steps -- analyze plan, reconcile existing code, delegate
milestones to agents, QR validation, issue resolution, documentation,
retrospective.
All procedural details are injected by the scripts. Invoke the appropriate
script and follow its output.


@@ -0,0 +1,156 @@
# Default Conventions
These conventions apply when project documentation does not specify otherwise.
## MotoVaultPro Project Conventions
**Naming**:
- Database columns: snake_case (`user_id`, `created_at`)
- TypeScript types: camelCase (`userId`, `createdAt`)
- API responses: camelCase
- Files: kebab-case (`vehicle-repository.ts`)
**Architecture**:
- Feature capsules: `backend/src/features/{feature}/`
- Repository pattern with mapRow() for case conversion
- Single-tenant, user-scoped data
**Frontend**:
- Mobile + desktop validation required (320px, 768px, 1920px)
- Touch targets >= 44px
- No hover-only interactions
**Development**:
- Local node development (`npm install`, `npm run dev`, `npm test`)
- CI/CD pipeline validates containers and integration tests
- Plans stored in Gitea Issue comments
---
## Priority Hierarchy
Higher tiers override lower. Cite backing source when auditing.
| Tier | Source | Action |
| ---- | --------------- | -------------------------------- |
| 1 | user-specified | Explicit user instruction: apply |
| 2 | doc-derived | CLAUDE.md / project docs: apply |
| 3 | default-derived | This document: apply |
| 4 | assumption | No backing: CONFIRM WITH USER |
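The hierarchy lends itself to a simple resolver. A sketch using the tier names from the table (the function itself is illustrative):

```python
TIER_ORDER = ["user-specified", "doc-derived", "default-derived", "assumption"]

def resolve_backing(candidates):
    """Pick the highest-tier backing for a decision.

    candidates: dict mapping tier name -> the guidance found at that tier.
    Returns (tier, guidance, needs_confirmation); tier 4 (assumption)
    means CONFIRM WITH USER before applying.
    """
    for tier in TIER_ORDER:
        if tier in candidates:
            return tier, candidates[tier], tier == "assumption"
    raise ValueError("no backing recorded for this decision")
```

Higher tiers win even when lower tiers disagree; only an assumption-backed decision triggers confirmation.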
## Severity Levels
| Level | Meaning | Action |
| ---------- | -------------------------------- | --------------- |
| SHOULD_FIX | Likely to cause maintenance debt | Flag for fixing |
| SUGGESTION | Improvement opportunity | Note if time |
---
## Structural Conventions
<default-conventions domain="god-object">
**God Object**: >15 public methods OR >10 dependencies OR mixed concerns (networking + UI + data)
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="god-function">
**God Function**: >50 lines OR multiple abstraction levels OR >3 nesting levels
Severity: SHOULD_FIX
Exception: Inherently sequential algorithms or state machines
</default-conventions>
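For Python code the god-function thresholds are mechanically checkable. A sketch using the stdlib `ast` module, with the >50-line and >3-level numbers taken from the convention above; the nesting count is an approximation that treats `if`/`for`/`while`/`with`/`try` as nesting levels:

```python
import ast

def god_function_findings(source, max_lines=50, max_nesting=3):
    """Flag functions exceeding the god-function thresholds."""
    nesting_nodes = (ast.If, ast.For, ast.While, ast.With, ast.Try)

    def depth(node, current=0):
        # Deepest chain of nesting constructs under this node
        child_depths = [
            depth(child, current + isinstance(child, nesting_nodes))
            for child in ast.iter_child_nodes(node)
        ]
        return max(child_depths, default=current)

    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines = node.end_lineno - node.lineno + 1
            if lines > max_lines or depth(node) > max_nesting:
                findings.append((node.name, "SHOULD_FIX"))
    return findings
```

The exception for sequential algorithms and state machines still requires human judgment; the checker only surfaces candidates.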
<default-conventions domain="duplicate-logic">
**Duplicate Logic**: Copy-pasted blocks, repeated error handling, parallel near-identical functions
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="dead-code">
**Dead Code**: No callers, impossible branches, unread variables, unused imports
Severity: SUGGESTION
</default-conventions>
<default-conventions domain="inconsistent-error-handling">
**Inconsistent Error Handling**: Mixed exceptions/error codes, inconsistent types, swallowed errors
Severity: SUGGESTION
Exception: Project specifies different handling per error category
</default-conventions>
---
## File Organization Conventions
<default-conventions domain="test-organization">
**Test Organization**: Extend existing test files; create new only when:
- Distinct module boundary OR >500 lines OR different fixtures required
Severity: SHOULD_FIX (for unnecessary fragmentation)
</default-conventions>
<default-conventions domain="file-creation">
**File Creation**: Prefer extending existing files; create new only when:
- Clear module boundary OR >300-500 lines OR distinct responsibility
Severity: SUGGESTION
</default-conventions>
---
## Testing Conventions
<default-conventions domain="testing">
**Principle**: Test behavior, not implementation. Fast feedback.
**Test Type Hierarchy** (preference order):
1. **Integration tests** (highest value)
- Test end-user verifiable behavior
- Use real systems/dependencies (e.g., testcontainers)
- Verify component interaction at boundaries
- This is where the real value lies
2. **Property-based / generative tests** (preferred)
- Cover wide input space with invariant assertions
- Catch edge cases humans miss
- Use for functions with clear input/output contracts
3. **Unit tests** (use sparingly)
- Only for highly complex or critical logic
- Risk: maintenance liability, brittleness to refactoring
- Prefer integration tests that cover same behavior
**Test Placement**: Tests are part of implementation milestones, not separate
milestones. A milestone is not complete until its tests pass. This creates fast
feedback during development.
**DO**:
- Integration tests with real dependencies (testcontainers, etc.)
- Property-based tests for invariant-rich functions
- Parameterized fixtures over duplicate test bodies
- Test behavior observable by end users
**DON'T**:
- Test external library/dependency behavior (out of scope)
- Unit test simple code (maintenance liability exceeds value)
- Mock owned dependencies (use real implementations)
- Test implementation details that may change
- One-test-per-variant when parametrization applies
Severity: SHOULD_FIX (violations), SUGGESTION (missed opportunities)
</default-conventions>
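A property-style test needs no framework. A stdlib-only sketch that asserts invariants over random inputs (a dedicated library such as Hypothesis adds failure shrinking; this sketch only samples). The `merge_intervals` function is a stand-in subject, not project code:

```python
import random

def merge_intervals(intervals):
    """Coalesce overlapping [start, end] intervals (the subject under test)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlap: extend
        else:
            merged.append([start, end])
    return merged

def check_invariants(trials=200):
    rng = random.Random(42)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        intervals = [sorted(rng.sample(range(100), 2))
                     for _ in range(rng.randint(0, 10))]
        merged = merge_intervals(intervals)
        # Invariant: output is sorted with no overlapping intervals
        assert all(a[1] < b[0] for a, b in zip(merged, merged[1:]))
        # Invariant: every input interval remains covered by some output interval
        for start, end in intervals:
            assert any(m[0] <= start and end <= m[1] for m in merged)
```

The invariants describe behavior an end user could observe; nothing here couples the test to the implementation's internals.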
---
## Modernization Conventions
<default-conventions domain="version-constraints">
**Version Constraint Violation**: Features unavailable in project's documented target version
Requires: Documented target version
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="modernization">
**Modernization Opportunity**: Legacy APIs, verbose patterns, manual stdlib reimplementations
Severity: SUGGESTION
Exception: Project requires legacy pattern
</default-conventions>


@@ -0,0 +1,201 @@
# Unified Diff Format for Plan Code Changes
This document is the authoritative specification for code changes in implementation plans.
## Purpose
Unified diff format encodes both **location** and **content** in a single structure. This eliminates the need for location directives in comments (e.g., "insert at line 42") and provides reliable anchoring even when line numbers drift.
## Anatomy
```diff
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -123,6 +123,15 @@ def existing_function(ctx):
# Context lines (unchanged) serve as location anchors
existing_code()
+ # NEW: Comments explain WHY - transcribed verbatim by Developer
+ # Guard against race condition when messages arrive out-of-order
+ new_code()
# More context to anchor the insertion point
more_existing_code()
```
## Components
| Component | Authority | Purpose |
| ------------------------------------------ | ------------------------- | ---------------------------------------------------------- |
| File path (`--- a/path/to/file.py`) | **AUTHORITATIVE** | Exact target file |
| Line numbers (`@@ -123,6 +123,15 @@`) | **APPROXIMATE** | May drift as earlier milestones modify the file |
| Function context (`@@ ... @@ def func():`) | **SCOPE HINT** | Function/method containing the change |
| Context lines (unchanged) | **AUTHORITATIVE ANCHORS** | Developer matches these patterns to locate insertion point |
| `+` lines | **NEW CODE** | Code to add, including WHY comments |
| `-` lines | **REMOVED CODE** | Code to delete |
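The `@@` line splits cleanly into the approximate positions and the scope hint. A sketch with one regex, assuming the count fields are present (git can omit them for single-line hunks):

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+),(\d+) \+(\d+),(\d+) @@(?: (.*))?$")

def parse_hunk_header(line):
    """Split an @@ line into approximate positions and the scope hint."""
    m = HUNK_HEADER.match(line)
    if not m:
        raise ValueError(f"not a hunk header: {line!r}")
    return {
        "old_start": int(m.group(1)),  # APPROXIMATE: may drift
        "old_count": int(m.group(2)),
        "new_start": int(m.group(3)),
        "new_count": int(m.group(4)),
        "scope_hint": m.group(5),      # e.g. "def existing_function(ctx):"
    }
```

Only the scope hint and the context lines should be trusted for placement; the numeric fields are hints.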
## Two-Layer Location Strategy
Code changes use two complementary layers for location:
1. **Prose scope hint** (optional): Natural language describing conceptual location
2. **Diff with context**: Precise insertion point via context line matching
### Layer 1: Prose Scope Hints
For complex changes, add a prose description before the diff block:
````markdown
Add validation after input sanitization in `UserService.validate()`:
```diff
@@ -123,6 +123,15 @@ def validate(self, user):
sanitized = sanitize(user.input)
+ # Validate format before proceeding
+ if not is_valid_format(sanitized):
+ raise ValidationError("Invalid format")
+
return process(sanitized)
```
````
The prose tells Developer **where conceptually** (which method, what operation precedes it). The diff tells Developer **where exactly** (context lines to match).
**When to use prose hints:**
- Changes to large files (>300 lines)
- Multiple changes to the same file in one milestone
- Complex nested structures where function context alone is ambiguous
- When the surrounding code logic matters for understanding placement
**When prose is optional:**
- Small files with obvious structure
- Single change with unique context lines
- Function context in @@ line provides sufficient scope
### Layer 2: Function Context in @@ Line
The `@@` line can include function/method context after the line numbers:
```diff
@@ -123,6 +123,15 @@ def validate(self, user):
```
This follows standard unified diff format (git generates this automatically). It tells Developer which function contains the change, aiding navigation even when line numbers drift.
## Why Context Lines Matter
When a plan has multiple milestones that modify the same file, earlier milestones shift line numbers. The `@@ -123` in Milestone 3 may no longer be accurate after Milestones 1 and 2 execute.
**Context lines solve this**: Developer searches for the unchanged context patterns in the actual file. These patterns are stable anchors that survive line number drift.
Include 2-3 context lines before and after changes for reliable matching.
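Context-anchor matching can be sketched in a few lines: search for the context block, and use the approximate line number only to disambiguate multiple matches (names here are illustrative):

```python
def find_anchor(file_lines, context_lines, approx_start):
    """Locate a context block in the file, preferring matches near the hint.

    Returns the 0-based index where the context block begins, or None.
    """
    span = len(context_lines)
    matches = [
        i for i in range(len(file_lines) - span + 1)
        if file_lines[i:i + span] == context_lines
    ]
    if not matches:
        return None  # anchors absent: plan context is stale, flag for review
    # Line numbers are APPROXIMATE; the nearest match wins when drift occurred
    return min(matches, key=lambda i: abs(i - approx_start))
```

A `None` result is itself a useful signal: the plan's context lines no longer exist in the file and the milestone needs review before execution.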
## Comment Placement
Comments in `+` lines explain **WHY**, not **WHAT**. These comments:
- Are transcribed verbatim by Developer
- Source rationale from Planning Context (Decision Log, Rejected Alternatives)
- Use concrete terms without hidden baselines
- Must pass temporal contamination review (see `temporal-contamination.md`)
**Important**: Comments written during planning often contain temporal contamination -- change-relative language, baseline references, or location directives. @agent-technical-writer reviews and fixes these before @agent-developer transcribes them.
<example type="CORRECT" category="why_comment">
```diff
+ # Polling chosen over webhooks: 30% webhook delivery failures in third-party API
+ # WebSocket rejected to preserve stateless architecture
+ updates = poll_api(interval=30)
```
Explains WHY this approach was chosen.
</example>
<example type="INCORRECT" category="what_comment">
```diff
+ # Poll the API every 30 seconds
+ updates = poll_api(interval=30)
```
Restates WHAT the code does - redundant with the code itself.
</example>
<example type="INCORRECT" category="hidden_baseline">
```diff
+ # Generous timeout for slow networks
+ REQUEST_TIMEOUT = 60
```
"Generous" compared to what? Hidden baseline provides no actionable information.
</example>
<example type="CORRECT" category="concrete_justification">
```diff
+ # 60s accommodates 95th percentile upstream response times
+ REQUEST_TIMEOUT = 60
```
Concrete justification that explains why this specific value.
</example>
## Location Directives: Forbidden
The diff structure handles location. Location directives in comments are redundant and error-prone.
<example type="INCORRECT" category="location_directive">
```python
# Insert this BEFORE the retry loop (line 716)
# Timestamp guard: prevent older data from overwriting newer
get_ctx, get_cancel = context.with_timeout(ctx, 500)
```
Location directive leaked into comment - line numbers become stale.
</example>
<example type="CORRECT" category="location_directive">
```diff
@@ -714,6 +714,10 @@ def put(self, ctx, tags):
for tag in tags:
subject = tag.subject
+ # Timestamp guard: prevent older data from overwriting newer
+ # due to network delays, retries, or concurrent writes
+ get_ctx, get_cancel = context.with_timeout(ctx, 500)
# Retry loop for Put operations
for attempt in range(max_retries):
```
Context lines (`for tag in tags`, `# Retry loop`) are stable anchors that survive line number drift.
</example>
## When to Use Diff Format
<diff_format_decision>
| Code Characteristic | Use Diff? | Boundary Test |
| --------------------------------------- | --------- | ---------------------------------------- |
| Conditionals, loops, error handling, state machines | YES | Has branching logic |
| Multiple insertions same file | YES | >1 change location |
| Deletions or replacements | YES | Removing/changing existing code |
| Pure assignment/return (CRUD, getters) | NO | Single statement, no branching |
| Boilerplate from template | NO | Developer can generate from pattern name |
The boundary test: "Does Developer need to see exact placement and context to implement correctly?"
- YES -> diff format
- NO (can implement from description alone) -> prose sufficient
</diff_format_decision>
## Validation Checklist
Before finalizing code changes in a plan:
- [ ] File path is exact (not "auth files" but `src/auth/handler.py`)
- [ ] Context lines exist in target file (validate patterns match actual code)
- [ ] Comments explain WHY, not WHAT
- [ ] No location directives in comments
- [ ] No hidden baselines (test: "[adjective] compared to what?")
- [ ] 2-3 context lines for reliable anchoring


@@ -0,0 +1,250 @@
# Plan Format
Write your plan using this structure:
```markdown
# [Plan Title]
## Overview
[Problem statement, chosen approach, and key decisions in 1-2 paragraphs]
## Planning Context
This section is consumed VERBATIM by downstream agents (Technical Writer,
Quality Reviewer). Quality matters: vague entries here produce poor annotations
and missed risks.
### Decision Log
| Decision | Reasoning Chain |
| ------------------ | ------------------------------------------------------------ |
| [What you decided] | [Multi-step reasoning: premise -> implication -> conclusion] |
Each rationale must contain at least 2 reasoning steps. Single-step rationales
are insufficient.
INSUFFICIENT: "Polling over webhooks | Webhooks are unreliable"

SUFFICIENT: "Polling over webhooks | Third-party API has 30% webhook delivery
failure in testing -> unreliable delivery would require fallback polling anyway
-> simpler to use polling as primary mechanism"

INSUFFICIENT: "500ms timeout | Matches upstream latency"

SUFFICIENT: "500ms timeout | Upstream 95th percentile is 450ms -> 500ms covers
95% of requests without timeout -> remaining 5% should fail fast rather than
queue"
Include BOTH architectural decisions AND implementation-level micro-decisions:
- Architectural: "Event sourcing over CRUD | Need audit trail + replay
capability -> CRUD would require separate audit log -> event sourcing provides
both natively"
- Implementation: "Mutex over channel | Single-writer case -> channel
coordination adds complexity without benefit -> mutex is simpler with
equivalent safety"
Technical Writer sources ALL code comments from this table. If a micro-decision
isn't here, TW cannot document it.
### Rejected Alternatives
| Alternative | Why Rejected |
| -------------------- | ------------------------------------------------------------------- |
| [Approach not taken] | [Concrete reason: performance, complexity, doesn't fit constraints] |
Technical Writer uses this to add "why not X" context to code comments.
### Constraints & Assumptions
- [Technical: API limits, language version, existing patterns to follow]
- [Organizational: timeline, team expertise, approval requirements]
- [Dependencies: external services, libraries, data formats]
- [Default conventions applied: cite any `<default-conventions domain="...">`
used]
### Known Risks
| Risk | Mitigation | Anchor |
| --------------- | --------------------------------------------- | ------------------------------------------ |
| [Specific risk] | [Concrete mitigation or "Accepted: [reason]"] | [file:L###-L### if claiming code behavior] |
**Anchor requirement**: If mitigation claims existing code behavior ("no change
needed", "already handles X"), cite the file:line + brief excerpt that proves
the claim. Skip anchors for hypothetical risks or external unknowns.
Quality Reviewer excludes these from findings but will challenge unverified
behavioral claims.
## Invisible Knowledge
This section captures knowledge NOT deducible from reading the code alone.
Technical Writer uses this for README.md documentation during
post-implementation.
**The test**: Would a new team member understand this from reading the source
files? If no, it belongs here.
**Categories** (not exhaustive -- apply the principle):
1. **Architectural decisions**: Component relationships, data flow, module
boundaries
2. **Business rules**: Domain constraints that shape implementation choices
3. **System invariants**: Properties that must hold but are not enforced by
types/compiler
4. **Historical context**: Why alternatives were rejected (links to Decision
Log)
5. **Performance characteristics**: Non-obvious efficiency properties or
requirements
6. **Tradeoffs**: Costs and benefits of chosen approaches
### Architecture
```
[ASCII diagram showing component relationships]
Example:
  User Request
       |
       v
  +----------+     +-------+
  |   Auth   |---->| Cache |
  +----------+     +-------+
       |
       v
  +----------+     +------+
  | Handler  |---->|  DB  |
  +----------+     +------+
```
### Data Flow
```
[How data moves through the system - inputs, transformations, outputs]
Example:
  HTTP Request --> Validate --> Transform --> Store --> Response
                                                           |
                                                           v
                                                      Log (async)
```
### Why This Structure
[Reasoning behind module organization that isn't obvious from file names]
- Why these boundaries exist
- What would break if reorganized differently
### Invariants
[Rules that must be maintained but aren't enforced by code]
- Ordering requirements
- State consistency rules
- Implicit contracts between components
### Tradeoffs
[Key decisions with their costs and benefits]
- What was sacrificed for what gain
- Performance vs. readability choices
- Consistency vs. flexibility choices
## Milestones
### Milestone 1: [Name]
**Files**: [exact paths - e.g., src/auth/handler.py, not "auth files"]
**Flags** (if applicable): [needs TW rationale, needs error handling review, needs conformance check]
**Requirements**:
- [Specific: "Add retry with exponential backoff", not "improve error handling"]
**Acceptance Criteria**:
- [Testable: "Returns 429 after 3 failed attempts" - QR can verify pass/fail]
- [Avoid vague: "Works correctly" or "Handles errors properly"]
**Tests** (milestone not complete until tests pass):
- **Test files**: [exact paths, e.g., tests/test_retry.py]
- **Test type**: [integration | property-based | unit] - see default-conventions
- **Backing**: [user-specified | doc-derived | default-derived]
- **Scenarios**:
- Normal: [e.g., "successful retry after transient failure"]
- Edge: [e.g., "max retries exhausted", "zero delay"]
- Error: [e.g., "non-retryable error returns immediately"]
Skip tests when: user explicitly stated no tests, OR milestone is documentation-only,
OR project docs prohibit tests for this component. State skip reason explicitly.
**Code Changes** (for non-trivial logic, use unified diff format):
See `resources/diff-format.md` for specification.
```diff
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -123,6 +123,15 @@ def existing_function(ctx):
# Context lines (unchanged) serve as location anchors
existing_code()
+ # WHY comment explaining rationale - transcribed verbatim by Developer
+ new_code()
# More context to anchor the insertion point
more_existing_code()
```
### Milestone N: ...
### Milestone [Last]: Documentation
**Files**:
- `path/to/CLAUDE.md` (index updates)
- `path/to/README.md` (if Invisible Knowledge section has content)
**Requirements**:
- Update CLAUDE.md index entries for all new/modified files
- Each entry has WHAT (contents) and WHEN (task triggers)
- If plan's Invisible Knowledge section is non-empty:
- Create/update README.md with architecture diagrams from plan
- Include tradeoffs, invariants, "why this structure" content
- Verify diagrams match actual implementation
**Acceptance Criteria**:
- CLAUDE.md enables LLM to locate relevant code for debugging/modification tasks
- README.md captures knowledge not discoverable from reading source files
- Architecture diagrams in README.md match plan's Invisible Knowledge section
**Source Material**: `## Invisible Knowledge` section of this plan
### Cross-Milestone Integration Tests
When integration tests require components from multiple milestones:
1. Place integration tests in the LAST milestone that provides a required
component
2. List dependencies explicitly in that milestone's **Tests** section
3. Integration test milestone is not complete until all dependencies are
implemented
Example:
- M1: Auth handler (property tests for auth logic)
- M2: Database layer (property tests for queries)
- M3: API endpoint (integration tests covering M1 + M2 + M3 with testcontainers)
The integration tests in M3 verify the full flow that end users would exercise,
using real dependencies. This creates fast feedback as soon as all components
exist.
## Milestone Dependencies (if applicable)
```
M1 ---> M2
\
--> M3 --> M4
```
Independent milestones can execute in parallel during /plan-execution.
```
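Grouping independent milestones into parallel batches is a topological pass over the dependency graph. A sketch over dependency sets, using the example graph from the template above:

```python
def parallel_batches(dependencies):
    """Group milestones into batches; each batch can execute concurrently.

    dependencies: dict mapping milestone -> set of milestones it depends on.
    Raises on cycles (a plan with circular milestones cannot execute).
    """
    remaining = {m: set(deps) for m, deps in dependencies.items()}
    batches = []
    while remaining:
        ready = sorted(m for m, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle in milestone dependencies")
        batches.append(ready)
        for m in ready:
            del remaining[m]
        for deps in remaining.values():
            deps.difference_update(ready)
    return batches
```

For the M1 -> M2 / M1 -> M3 -> M4 example, this yields [M1], then [M2, M3] in parallel, then [M4].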


@@ -0,0 +1,135 @@
# Temporal Contamination in Code Comments
This document defines terminology for identifying comments that leak information
about code history, change processes, or planning artifacts. Both
@agent-technical-writer and @agent-quality-reviewer reference this
specification.
## The Core Principle
> **Timeless Present Rule**: Comments must be written from the perspective of a
> reader encountering the code for the first time, with no knowledge of what
> came before or how it got here. The code simply _is_.
**Why this matters**: Change-narrative comments are an LLM artifact -- a
category error, not merely a style issue. The change process is ephemeral and
irrelevant to the code's ongoing existence. Humans writing comments naturally
describe what code IS, not what they DID to create it. Referencing the change
that created a comment is fundamentally confused about what belongs in
documentation.
Think of it this way: a novel's narrator never describes the author's typing
process. Similarly, code comments should never describe the developer's editing
process. The code simply exists; the path to its existence is invisible.
In a plan, this means comments are written _as if the plan was already
executed_.
## Detection Heuristic
Evaluate each comment against these five questions. Signal words are examples --
extrapolate to semantically similar constructs.
### 1. Does it describe an action taken rather than what exists?
**Category**: Change-relative
| Contaminated | Timeless Present |
| -------------------------------------- | ----------------------------------------------------------- |
| `// Added mutex to fix race condition` | `// Mutex serializes cache access from concurrent requests` |
| `// New validation for the edge case` | `// Rejects negative values (downstream assumes unsigned)` |
| `// Changed to use batch API` | `// Batch API reduces round-trips from N to 1` |
Signal words (non-exhaustive): "Added", "Replaced", "Now uses", "Changed to",
"New", "Updated", "Refactored"
### 2. Does it compare to something not in the code?
**Category**: Baseline reference
| Contaminated | Timeless Present |
| ------------------------------------------------- | ------------------------------------------------------------------- |
| `// Replaces per-tag logging with summary` | `// Single summary line; per-tag logging would produce 1500+ lines` |
| `// Unlike the old approach, this is thread-safe` | `// Thread-safe: each goroutine gets independent state` |
| `// Previously handled in caller` | `// Encapsulated here; caller should not manage lifecycle` |
Signal words (non-exhaustive): "Instead of", "Rather than", "Previously",
"Replaces", "Unlike the old", "No longer"
### 3. Does it describe where to put code rather than what code does?
**Category**: Location directive
| Contaminated | Timeless Present |
| ----------------------------- | --------------------------------------------- |
| `// After the SendAsync call` | _(delete -- diff structure encodes location)_ |
| `// Insert before validation` | _(delete -- diff structure encodes location)_ |
| `// Add this at line 425` | _(delete -- diff structure encodes location)_ |
Signal words (non-exhaustive): "After", "Before", "Insert", "At line", "Here:",
"Below", "Above"
**Action**: Always delete. Location is encoded in diff structure, not comments.
### 4. Does it describe intent rather than behavior?
**Category**: Planning artifact
| Contaminated | Timeless Present |
| -------------------------------------- | -------------------------------------------------------- |
| `// TODO: add retry logic later` | _(delete, or implement retry now)_ |
| `// Will be extended for batch mode` | _(delete -- do not document hypothetical futures)_ |
| `// Temporary workaround until API v2` | `// API v1 lacks filtering; client-side filter required` |
Signal words (non-exhaustive): "Will", "TODO", "Planned", "Eventually", "For
future", "Temporary", "Workaround until"
**Action**: Delete, implement the feature, or reframe as current constraint.
### 5. Does it describe the author's choice rather than code behavior?
**Category**: Intent leakage
| Contaminated | Timeless Present |
| ------------------------------------------ | ---------------------------------------------------- |
| `// Intentionally placed after validation` | `// Runs after validation completes` |
| `// Deliberately using mutex over channel` | `// Mutex serializes access (single-writer pattern)` |
| `// Chose polling for reliability` | `// Polling: 30% webhook delivery failures observed` |
| `// We decided to cache at this layer` | `// Cache here: reduces DB round-trips for hot path` |
Signal words (non-exhaustive): "intentionally", "deliberately", "chose",
"decided", "on purpose", "by design", "we opted"
**Action**: Extract the technical justification; discard the decision narrative.
The reader doesn't need to know someone "decided" -- they need to know WHY this
approach works.
**The test**: Can you delete the intent word and the comment still makes sense?
If yes, delete the intent word. If no, reframe around the technical reason.
---
**Catch-all**: If a comment only makes sense to someone who knows the code's
history, it is temporally contaminated -- even if it does not match any category
above.
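As a rough aid (not a substitute for semantic judgment), the signal words above can be collected into a pre-filter that flags comments for review. A minimal sketch in Python; the pattern list is illustrative, not exhaustive:

```python
import re

# Signal-word pre-filter: a match means "inspect this comment",
# not "contaminated" -- keyword matching alone cannot decide.
SIGNAL_PATTERNS = [
    r"\b(added|replaced|changed to|refactored|updated)\b",  # change-relative
    r"\b(instead of|rather than|previously|no longer)\b",   # baseline reference
    r"\b(insert before|at line)\b",                         # location directive
    r"\b(todo|will be|temporary|workaround until)\b",       # planning artifact
    r"\b(intentionally|deliberately|we decided|chose)\b",   # intent leakage
]


def flag_comment(comment: str) -> bool:
    """Return True if the comment matches any temporal signal pattern."""
    text = comment.lower()
    return any(re.search(pattern, text) for pattern in SIGNAL_PATTERNS)
```

A hit only queues the comment for the five-question evaluation; clean comments that happen to contain a signal word still require the semantic judgment described above.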
## Subtle Cases
Same word, different verdict -- demonstrates that detection requires semantic
judgment, not keyword matching.
| Comment | Verdict | Reasoning |
| -------------------------------------- | ------------ | ------------------------------------------------ |
| `// Now handles edge cases properly` | Contaminated | "properly" implies it was improper before |
| `// Now blocks until connection ready` | Clean | "now" describes runtime moment, not code history |
| `// Fixed the null pointer issue` | Contaminated | Describes a fix, not behavior |
| `// Returns null when key not found` | Clean | Describes behavior |
## The Transformation Pattern
> **Extract the technical justification, discard the change narrative.**
1. What useful info is buried? (problem, behavior)
2. Reframe as timeless present
Example: "Added mutex to fix race" -> "Mutex serializes concurrent access"
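To make the pattern concrete, here is a minimal sketch (hypothetical cache class) whose comments follow the Timeless Present Rule -- each states what the code is or why, never the edit that produced it:

```python
import threading


class Cache:
    """Thread-safe in-memory cache; each method holds the lock for its full body."""

    def __init__(self):
        # Mutex serializes cache access from concurrent requests
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        # Returns default when key is absent (callers treat misses as non-fatal)
        with self._lock:
            return self._data.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
```

Contaminated versions of the same comments would read "Added mutex to fix race condition" or "Fixed the null pointer issue"; the timeless forms above remain accurate through any number of future edits.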
@@ -0,0 +1,682 @@
#!/usr/bin/env python3
"""
Plan Executor - Execute approved plans through delegation.
Seven-phase execution workflow with JIT prompt injection:
Step 1: Execution Planning (analyze plan, detect reconciliation)
Step 2: Reconciliation (conditional, validate existing code)
Step 3: Milestone Execution (delegate to agents, run tests)
Step 4: Post-Implementation QR (quality review)
Step 5: QR Issue Resolution (conditional, fix issues)
Step 6: Documentation (TW pass)
Step 7: Retrospective (present summary)
Usage:
python3 executor.py --plan-file PATH --step-number 1 --total-steps 7 --thoughts "..."
"""
import argparse
import re
import sys
def detect_reconciliation_signals(thoughts: str) -> bool:
"""Check if user's thoughts contain reconciliation triggers."""
triggers = [
r"\balready\s+(implemented|done|complete)",
r"\bpartially\s+complete",
r"\bhalfway\s+done",
r"\bresume\b",
r"\bcontinue\s+from\b",
r"\bpick\s+up\s+where\b",
r"\bcheck\s+what'?s\s+done\b",
r"\bverify\s+existing\b",
r"\bprior\s+work\b",
]
thoughts_lower = thoughts.lower()
return any(re.search(pattern, thoughts_lower) for pattern in triggers)
def get_step_1_guidance(plan_file: str, thoughts: str) -> dict:
"""Step 1: Execution Planning - analyze plan, detect reconciliation."""
reconciliation_detected = detect_reconciliation_signals(thoughts)
actions = [
"EXECUTION PLANNING",
"",
f"Plan file: {plan_file}",
"",
"Read the plan file and analyze:",
" 1. Count milestones and their dependencies",
" 2. Identify file targets per milestone",
" 3. Determine parallelization opportunities",
" 4. Set up TodoWrite tracking for all milestones",
"",
"<execution_rules>",
"",
"RULE 0 (ABSOLUTE): Delegate ALL code work to specialized agents",
"",
"Your role: coordinate, validate, orchestrate. Agents implement code.",
"",
"Delegation routing:",
" - New function needed -> @agent-developer",
" - Bug to fix -> @agent-debugger (diagnose) then @agent-developer (fix)",
" - Any source file modification -> @agent-developer",
" - Documentation files -> @agent-technical-writer",
"",
"Exception (trivial only): Fixes under 5 lines where delegation overhead",
"exceeds fix complexity (missing import, typo correction).",
"",
"---",
"",
"RULE 1: Execution Protocol",
"",
"Before ANY phase:",
" 1. Use TodoWrite to track all plan phases",
" 2. Analyze dependencies to identify parallelizable work",
" 3. Delegate implementation to specialized agents",
" 4. Validate each increment before proceeding",
"",
"You plan HOW to execute (parallelization, sequencing). You do NOT plan",
"WHAT to execute -- that's the plan's job.",
"",
"---",
"",
"RULE 1.5: Model Selection",
"",
"Agent defaults (sonnet) are calibrated for quality. Adjust upward only.",
"",
" | Action | Allowed | Rationale |",
" |----------------------|---------|----------------------------------|",
" | Upgrade to opus | YES | Challenging tasks need reasoning |",
" | Use default (sonnet) | YES | Baseline for all delegations |",
" | Keep at sonnet+ | ALWAYS | Maintains quality baseline |",
"",
"</execution_rules>",
"",
"<dependency_analysis>",
"",
"Parallelizable when ALL conditions met:",
" - Different target files",
" - No data dependencies",
" - No shared state (globals, configs, resources)",
"",
"Sequential when ANY condition true:",
" - Same file modified by multiple tasks",
" - Task B imports or depends on Task A's output",
" - Shared database tables or external resources",
"",
"Before delegating ANY batch:",
" 1. List tasks with their target files",
" 2. Identify file dependencies (same file = sequential)",
" 3. Identify data dependencies (imports = sequential)",
" 4. Group independent tasks into parallel batches",
" 5. Separate batches with sync points",
"",
"</dependency_analysis>",
"",
"<milestone_type_detection>",
"",
"Before delegating ANY milestone, identify its type from file extensions:",
"",
" | Milestone Type | Recognition Signal | Delegate To |",
" |----------------|--------------------------------|-------------------------|",
" | Documentation | ALL files are *.md or *.rst | @agent-technical-writer |",
" | Code | ANY file is source code | @agent-developer |",
"",
"Mixed milestones: Split delegation -- @agent-developer first (code),",
"then @agent-technical-writer (docs) after code completes.",
"",
"</milestone_type_detection>",
"",
"<delegation_format>",
"",
"EVERY delegation MUST use this structure:",
"",
" <delegation>",
" <agent>@agent-[developer|debugger|technical-writer|quality-reviewer]</agent>",
" <mode>[For TW/QR: plan-scrub|post-implementation|plan-review|reconciliation]</mode>",
" <plan_source>[Absolute path to plan file]</plan_source>",
" <milestone>[Milestone number and name]</milestone>",
" <files>[Exact file paths from milestone]</files>",
" <task>[Specific task description]</task>",
" <acceptance_criteria>",
" - [Criterion 1 from plan]",
" - [Criterion 2 from plan]",
" </acceptance_criteria>",
" </delegation>",
"",
"For parallel delegations, wrap multiple blocks:",
"",
" <parallel_batch>",
" <rationale>[Why these can run in parallel]</rationale>",
" <sync_point>[Command to run after all complete]</sync_point>",
" <delegation>...</delegation>",
" <delegation>...</delegation>",
" </parallel_batch>",
"",
"Agent limits:",
" - @agent-developer: Maximum 4 parallel",
" - @agent-debugger: Maximum 2 parallel",
" - @agent-quality-reviewer: ALWAYS sequential",
" - @agent-technical-writer: Can parallel across independent modules",
"",
"</delegation_format>",
]
if reconciliation_detected:
next_step = (
"RECONCILIATION SIGNALS DETECTED in your thoughts.\n\n"
"Invoke step 2 to validate existing code against plan requirements:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 2 '
'--total-steps 7 --thoughts "Starting reconciliation..."'
)
else:
next_step = (
"No reconciliation signals detected. Proceed to milestone execution.\n\n"
"Invoke step 3 to begin delegating milestones:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 3 '
'--total-steps 7 --thoughts "Analyzed plan: N milestones, '
'parallel batches: [describe], starting execution..."'
)
return {
"actions": actions,
"next": next_step,
}
def get_step_2_guidance(plan_file: str) -> dict:
"""Step 2: Reconciliation - validate existing code against plan."""
return {
"actions": [
"RECONCILIATION PHASE",
"",
f"Plan file: {plan_file}",
"",
"Validate existing code against plan requirements BEFORE executing.",
"",
"<reconciliation_protocol>",
"",
"Delegate to @agent-quality-reviewer for each milestone:",
"",
" Task for @agent-quality-reviewer:",
" Mode: reconciliation",
" Plan Source: [plan_file.md]",
" Milestone: [N]",
"",
" Check if the acceptance criteria for Milestone [N] are ALREADY",
" satisfied in the current codebase. Validate REQUIREMENTS, not just",
" code presence.",
"",
" Return: SATISFIED | NOT_SATISFIED | PARTIALLY_SATISFIED",
"",
"---",
"",
"Execution based on reconciliation result:",
"",
" | Result | Action |",
" |---------------------|-------------------------------------------|",
" | SATISFIED | Skip execution, record as already complete|",
" | NOT_SATISFIED | Execute milestone normally |",
" | PARTIALLY_SATISFIED | Execute only the missing parts |",
"",
"---",
"",
"Why requirements-based (not diff-based):",
"",
"Checking if code from the diff exists misses critical cases:",
" - Code added but incorrect (doesn't meet acceptance criteria)",
" - Code added but incomplete (partial implementation)",
" - Requirements met by different code than planned (valid alternative)",
"",
"Checking acceptance criteria catches all of these.",
"",
"</reconciliation_protocol>",
],
"next": (
"After collecting reconciliation results for all milestones, "
"invoke step 3:\n\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 3 '
"--total-steps 7 --thoughts \"Reconciliation complete: "
'M1: SATISFIED, M2: NOT_SATISFIED, ..."'
),
}
def get_step_3_guidance(plan_file: str) -> dict:
"""Step 3: Milestone Execution - delegate to agents, run tests."""
return {
"actions": [
"MILESTONE EXECUTION",
"",
f"Plan file: {plan_file}",
"",
"Execute milestones through delegation. Parallelize independent work.",
"",
"<diff_compliance_validation>",
"",
"BEFORE delegating each milestone with code changes:",
" 1. Read resources/diff-format.md if not already in context",
" 2. Verify plan's diffs meet specification:",
" - Context lines are VERBATIM from actual files (not placeholders)",
" - WHY comments explain rationale (not WHAT code does)",
" - No location directives in comments",
"",
"AFTER @agent-developer completes, verify:",
" - Context lines from plan were found in target file",
" - WHY comments were transcribed verbatim to code",
" - No location directives remain in implemented code",
" - No temporal contamination leaked (change-relative language)",
"",
"If Developer reports context lines not found, check drift table below.",
"",
"</diff_compliance_validation>",
"",
"<error_handling>",
"",
"Error classification:",
"",
" | Severity | Signals | Action |",
" |----------|----------------------------------|-------------------------|",
" | Critical | Segfault, data corruption | STOP, @agent-debugger |",
" | High | Test failures, missing deps | @agent-debugger |",
" | Medium | Type errors, lint failures | Auto-fix, then debugger |",
" | Low | Warnings, style issues | Note and continue |",
"",
"Escalation triggers -- STOP and report when:",
" - Fix would change fundamental approach",
" - Three attempted solutions failed",
" - Performance or safety characteristics affected",
" - Confidence < 80%",
"",
"Context anchor mismatch protocol:",
"",
"When @agent-developer reports context lines don't match actual code:",
"",
" | Mismatch Type | Action |",
" |-----------------------------|--------------------------------|",
" | Whitespace/formatting only | Proceed with normalized match |",
" | Minor variable rename | Proceed, note in execution log |",
" | Code restructured | Proceed, note deviation |",
" | Context lines not found | STOP - escalate to planner |",
" | Logic fundamentally changed | STOP - escalate to planner |",
"",
"</error_handling>",
"",
"<acceptance_testing>",
"",
"Run after each milestone:",
"",
" # Python",
" pytest --strict-markers --strict-config",
" mypy --strict",
"",
" # JavaScript/TypeScript",
" tsc --strict --noImplicitAny",
" eslint --max-warnings=0",
"",
" # Go",
" go test -race -cover -vet=all",
"",
"Pass criteria: 100% tests pass, zero linter warnings.",
"",
"Self-consistency check (for milestones with >3 files):",
" 1. Developer's implementation notes claim: [what was implemented]",
" 2. Test results demonstrate: [what behavior was verified]",
" 3. Acceptance criteria state: [what was required]",
"",
"All three must align. Discrepancy = investigate before proceeding.",
"",
"</acceptance_testing>",
],
"next": (
"CONTINUE in step 3 until ALL milestones complete:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 3 '
'--total-steps 7 --thoughts "Completed M1, M2. Executing M3..."'
"\n\n"
"When ALL milestones are complete, invoke step 4 for quality review:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 4 '
'--total-steps 7 --thoughts "All milestones complete. '
'Modified files: [list]. Ready for QR."'
),
}
def get_step_4_guidance(plan_file: str) -> dict:
"""Step 4: Post-Implementation QR - quality review."""
return {
"actions": [
"POST-IMPLEMENTATION QUALITY REVIEW",
"",
f"Plan file: {plan_file}",
"",
"Delegate to @agent-quality-reviewer for comprehensive review.",
"",
"<qr_delegation>",
"",
" Task for @agent-quality-reviewer:",
" Mode: post-implementation",
" Plan Source: [plan_file.md]",
" Files Modified: [list]",
" Reconciled Milestones: [list milestones that were SATISFIED]",
"",
" Priority order for findings:",
" 1. Issues in reconciled milestones (bypassed execution validation)",
" 2. Issues in newly implemented milestones",
" 3. Cross-cutting issues",
"",
" Checklist:",
" - Every requirement implemented",
" - No unauthorized deviations",
" - Edge cases handled",
" - Performance requirements met",
"",
"</qr_delegation>",
"",
"Expected output: PASS or issues list sorted by severity.",
],
"next": (
"After QR completes:\n\n"
"If QR returns ISSUES -> invoke step 5:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 5 '
'--total-steps 7 --thoughts "QR found N issues: [summary]"'
"\n\n"
"If QR returns PASS -> invoke step 6:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 6 '
'--total-steps 7 --thoughts "QR passed. Proceeding to documentation."'
),
}
def get_step_5_guidance(plan_file: str) -> dict:
"""Step 5: QR Issue Resolution - present issues, collect decisions, fix."""
return {
"actions": [
"QR ISSUE RESOLUTION",
"",
f"Plan file: {plan_file}",
"",
"Present issues to user, collect decisions, delegate fixes.",
"",
"<issue_resolution_protocol>",
"",
"Phase 1: Collect Decisions",
"",
"Sort findings by severity (critical -> high -> medium -> low).",
"For EACH issue, present:",
"",
" ## Issue [N] of [Total] ([severity])",
"",
" **Category**: [production-reliability | project-conformance | structural-quality]",
" **File**: [affected file path]",
" **Location**: [function/line if applicable]",
"",
" **Problem**:",
" [Clear description of what is wrong and why it matters]",
"",
" **Evidence**:",
" [Specific code/behavior that demonstrates the issue]",
"",
"Then use AskUserQuestion with options:",
" - **Fix**: Delegate to @agent-developer to resolve",
" - **Skip**: Accept the issue as-is",
" - **Alternative**: User provides different approach",
"",
"Repeat for each issue. Do NOT execute any fixes during this phase.",
"",
"---",
"",
"Phase 2: Execute Decisions",
"",
"After ALL decisions are collected:",
"",
" 1. Summarize the decisions",
" 2. Execute fixes:",
" - 'Fix' decisions: Delegate to @agent-developer",
" - 'Skip' decisions: Record in retrospective as accepted risk",
" - 'Alternative' decisions: Apply user's specified approach",
" 3. Parallelize where possible (different files, no dependencies)",
"",
"</issue_resolution_protocol>",
],
"next": (
"After ALL fixes are applied, return to step 4 for re-validation:\n\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 4 '
'--total-steps 7 --thoughts "Applied fixes for issues X, Y, Z. '
'Re-running QR."'
"\n\n"
"This creates a validation loop until QR passes."
),
}
def get_step_6_guidance(plan_file: str) -> dict:
"""Step 6: Documentation - TW pass for CLAUDE.md, README.md."""
return {
"actions": [
"POST-IMPLEMENTATION DOCUMENTATION",
"",
f"Plan file: {plan_file}",
"",
"Delegate to @agent-technical-writer for documentation updates.",
"",
"<tw_delegation>",
"",
"Skip condition: If ALL milestones contained only documentation files",
"(*.md/*.rst), TW already handled this during milestone execution.",
"Proceed directly to step 7.",
"",
"For code-primary plans:",
"",
" Task for @agent-technical-writer:",
" Mode: post-implementation",
" Plan Source: [plan_file.md]",
" Files Modified: [list]",
"",
" Requirements:",
" - Create/update CLAUDE.md index entries",
" - Create README.md if architectural complexity warrants",
" - Add module-level docstrings where missing",
" - Verify transcribed comments are accurate",
"",
"</tw_delegation>",
"",
"<final_checklist>",
"",
"Execution is NOT complete until:",
" - [ ] All todos completed",
" - [ ] Quality review passed (no unresolved issues)",
" - [ ] Documentation delegated for ALL modified files",
" - [ ] Documentation tasks completed",
" - [ ] Self-consistency checks passed for complex milestones",
"",
"</final_checklist>",
],
"next": (
"After documentation is complete, invoke step 7 for retrospective:\n\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 7 '
'--total-steps 7 --thoughts "Documentation complete. '
'Generating retrospective."'
),
}
def get_step_7_guidance(plan_file: str) -> dict:
"""Step 7: Retrospective - present execution summary."""
return {
"actions": [
"EXECUTION RETROSPECTIVE",
"",
f"Plan file: {plan_file}",
"",
"Generate and PRESENT the retrospective to the user.",
"Do NOT write to a file -- present it directly so the user sees it.",
"",
"<retrospective_format>",
"",
"================================================================================",
"EXECUTION RETROSPECTIVE",
"================================================================================",
"",
"Plan: [plan file path]",
"Status: COMPLETED | BLOCKED | ABORTED",
"",
"## Milestone Outcomes",
"",
"| Milestone | Status | Notes |",
"| ---------- | -------------------- | ---------------------------------- |",
"| 1: [name] | EXECUTED | - |",
"| 2: [name] | SKIPPED (RECONCILED) | Already satisfied before execution |",
"| 3: [name] | BLOCKED | [reason] |",
"",
"## Reconciliation Summary",
"",
"If reconciliation was run:",
" - Milestones already complete: [count]",
" - Milestones executed: [count]",
" - Milestones with partial work detected: [count]",
"",
"If reconciliation was skipped:",
' - "Reconciliation skipped (no prior work indicated)"',
"",
"## Plan Accuracy Issues",
"",
"[List any problems with the plan discovered during execution]",
" - [file] Context anchor drift: expected X, found Y",
" - Milestone [N] requirements were ambiguous: [what]",
" - Missing dependency: [what was assumed but didn't exist]",
"",
'If none: "No plan accuracy issues encountered."',
"",
"## Deviations from Plan",
"",
"| Deviation | Category | Approved By |",
"| -------------- | --------------- | ---------------- |",
"| [what changed] | Trivial / Minor | [who or 'auto'] |",
"",
'If none: "No deviations from plan."',
"",
"## Quality Review Summary",
"",
" - Production reliability: [count] issues",
" - Project conformance: [count] issues",
" - Structural quality: [count] suggestions",
"",
"## Feedback for Future Plans",
"",
"[Actionable improvements based on execution experience]",
" - [ ] [specific suggestion]",
" - [ ] [specific suggestion]",
"",
"================================================================================",
"",
"</retrospective_format>",
],
"next": "EXECUTION COMPLETE.\n\nPresent the retrospective to the user.",
}
def get_step_guidance(step_number: int, plan_file: str, thoughts: str) -> dict:
"""Route to appropriate step guidance."""
if step_number == 1:
return get_step_1_guidance(plan_file, thoughts)
elif step_number == 2:
return get_step_2_guidance(plan_file)
elif step_number == 3:
return get_step_3_guidance(plan_file)
elif step_number == 4:
return get_step_4_guidance(plan_file)
elif step_number == 5:
return get_step_5_guidance(plan_file)
elif step_number == 6:
return get_step_6_guidance(plan_file)
elif step_number == 7:
return get_step_7_guidance(plan_file)
else:
return {
"actions": [f"Unknown step {step_number}. Valid steps are 1-7."],
"next": "Re-invoke with a valid step number.",
}
def main():
parser = argparse.ArgumentParser(
description="Plan Executor - Execute approved plans through delegation",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Start execution
python3 executor.py --plan-file plans/auth.md --step-number 1 --total-steps 7 \\
--thoughts "Execute the auth implementation plan"
# Continue milestone execution
python3 executor.py --plan-file plans/auth.md --step-number 3 --total-steps 7 \\
--thoughts "Completed M1, M2. Executing M3..."
# After QR finds issues
python3 executor.py --plan-file plans/auth.md --step-number 5 --total-steps 7 \\
--thoughts "QR found 2 issues: missing error handling, incorrect return type"
""",
)
parser.add_argument(
"--plan-file", type=str, required=True, help="Path to the plan file to execute"
)
parser.add_argument("--step-number", type=int, required=True, help="Current step (1-7)")
parser.add_argument(
"--total-steps", type=int, required=True, help="Total steps (always 7)"
)
parser.add_argument(
"--thoughts", type=str, required=True, help="Your current thinking and status"
)
args = parser.parse_args()
if args.step_number < 1 or args.step_number > 7:
print("Error: step-number must be between 1 and 7", file=sys.stderr)
sys.exit(1)
if args.total_steps != 7:
print("Warning: total-steps should be 7 for executor", file=sys.stderr)
guidance = get_step_guidance(args.step_number, args.plan_file, args.thoughts)
is_complete = args.step_number >= 7
step_names = {
1: "Execution Planning",
2: "Reconciliation",
3: "Milestone Execution",
4: "Post-Implementation QR",
5: "QR Issue Resolution",
6: "Documentation",
7: "Retrospective",
}
print("=" * 80)
print(
f"EXECUTOR - Step {args.step_number} of 7: {step_names.get(args.step_number, 'Unknown')}"
)
print("=" * 80)
print()
print(f"STATUS: {'execution_complete' if is_complete else 'in_progress'}")
print()
print("YOUR THOUGHTS:")
print(args.thoughts)
print()
if guidance["actions"]:
print("GUIDANCE:")
print()
for action in guidance["actions"]:
print(action)
print()
print("NEXT:")
print(guidance["next"])
print()
print("=" * 80)
if __name__ == "__main__":
main()
File diff suppressed because it is too large.