feat: implement new claude skills and workflow

Eric Gullickson
2026-01-03 11:02:30 -06:00
parent c443305007
commit 9f00797925
45 changed files with 10132 additions and 2174 deletions


@@ -0,0 +1,86 @@
# skills/planner/
## Overview
Planning skill with resources that must stay synced with agent prompts.
## Index
| File/Directory | Contents | Read When |
| ------------------------------------- | ---------------------------------------------- | -------------------------------------------- |
| `SKILL.md` | Planning workflow, phases | Using the planner skill |
| `scripts/planner.py` | Step-by-step planning orchestration | Debugging planner behavior |
| `resources/plan-format.md` | Plan template (injected by script) | Editing plan structure |
| `resources/temporal-contamination.md` | Detection heuristic for contaminated comments | Updating TW/QR temporal contamination logic |
| `resources/diff-format.md` | Unified diff spec for code changes | Updating Developer diff consumption logic |
| `resources/default-conventions.md` | Default structural conventions (4-tier system) | Updating QR RULE 2 or planner decision audit |
## Resource Sync Requirements
Resources are **authoritative sources**.
- **SKILL.md** references resources directly (main Claude can read files)
- **Agent prompts** embed resources 1:1 (sub-agents cannot access files
reliably)
### plan-format.md
Plan template injected by `scripts/planner.py` at planning phase completion.
**No agent sync required** - the script reads and outputs the format directly,
so editing this file takes effect immediately without updating any agent
prompts.
### temporal-contamination.md
Authoritative source for temporal contamination detection. Full content embedded
1:1.
| Synced To | Embedded Section |
| ---------------------------- | -------------------------- |
| `agents/technical-writer.md` | `<temporal_contamination>` |
| `agents/quality-reviewer.md` | `<temporal_contamination>` |
**When updating**: Modify `resources/temporal-contamination.md` first, then copy
content into both `<temporal_contamination>` sections.
### diff-format.md
Authoritative source for unified diff format. Full content embedded 1:1.
| Synced To | Embedded Section |
| --------------------- | ---------------- |
| `agents/developer.md` | `<diff_format>` |
**When updating**: Modify `resources/diff-format.md` first, then copy content
into `<diff_format>` section.
### default-conventions.md
Authoritative source for default structural conventions (four-tier decision
backing system). Embedded 1:1 in QR for RULE 2 enforcement; referenced by
planner.py for decision audit.
| Synced To | Embedded Section |
| ---------------------------- | ----------------------- |
| `agents/quality-reviewer.md` | `<default_conventions>` |
**When updating**: Modify `resources/default-conventions.md` first, then copy
full content verbatim into `<default_conventions>` section in QR.
## Sync Verification
After modifying a resource, verify sync:
```bash
# Check temporal-contamination.md references
grep -l "temporal.contamination\|four detection questions\|change-relative\|baseline reference" agents/*.md
# Check diff-format.md references
grep -l "context lines\|AUTHORITATIVE\|APPROXIMATE\|context anchor" agents/*.md
# Check default-conventions.md references
grep -l "default_conventions\|domain: god-object\|domain: test-organization" agents/*.md
```
If grep finds files not listed in sync tables above, update this document.
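The grep checks catch references; 1:1 embedding can be verified directly. A minimal sketch, assuming each agent prompt embeds the resource verbatim between XML-style tags as in the sync tables above (paths and tag names here are illustrative):

```python
import re
from pathlib import Path

def embedded_section(agent_path, tag):
    """Extract the text between <tag> and </tag> in an agent prompt."""
    text = Path(agent_path).read_text()
    match = re.search(rf"<{tag}>\n?(.*?)\n?</{tag}>", text, re.DOTALL)
    return match.group(1) if match else None

def is_synced(resource_path, agent_path, tag):
    """True when the agent's embedded section matches the resource 1:1."""
    resource = Path(resource_path).read_text().strip()
    embedded = embedded_section(agent_path, tag)
    return embedded is not None and embedded.strip() == resource

# Example pairing from the sync tables:
# is_synced("resources/diff-format.md", "agents/developer.md", "diff_format")
```

Run it once per row of the sync tables; `False` means the embedded copy has drifted from its resource.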


@@ -0,0 +1,80 @@
# Planner
LLM-generated plans have gaps. I have seen missing error handling, vague
acceptance criteria, and specs that nobody can implement. I built this skill with
two workflows -- planning and execution -- connected by quality gates that catch
these problems early.
## Planning Workflow
```
Planning ----+
| |
v |
QR -------+ [fail: restart planning]
|
v
TW -------+
| |
v |
QR-Docs ----+ [fail: restart TW]
|
v
APPROVED
```
| Step | Actions |
| ----------------------- | -------------------------------------------------------------------------- |
| Context & Scope | Confirm path, define scope, identify approaches, list constraints |
| Decision & Architecture | Evaluate approaches, select with reasoning, diagram, break into milestones |
| Refinement | Document risks, add uncertainty flags, specify paths and criteria |
| Final Verification | Verify completeness, check specs, write to file |
| QR-Completeness | Verify Decision Log complete, policy defaults confirmed, plan structure valid |
| QR-Code | Read codebase, verify diff context, apply RULE 0/1/2 to proposed code |
| Technical Writer | Scrub temporal comments, add WHY comments, enrich rationale |
| QR-Docs | Verify no temporal contamination, comments explain WHY not WHAT |
So, why all the feedback loops? QR-Completeness and QR-Code run before TW to
catch structural issues early. QR-Docs runs after TW to validate documentation
quality. Doc issues restart only TW; structure issues restart planning. The loop
runs until both pass.
## Execution Workflow
```
Plan --> Milestones --> QR --> Docs --> Retrospective
^ |
+- [fail] -+
* Reconciliation phase precedes Milestones when resuming partial work
```
After planning completes and context clears (`/clear`), execution proceeds:
| Step | Purpose |
| ---------------------- | --------------------------------------------------------------- |
| Execution Planning | Analyze plan, detect reconciliation signals, output strategy |
| Reconciliation | (conditional) Validate existing code against plan |
| Milestone Execution | Delegate to agents, run tests; repeat until all complete |
| Post-Implementation QR | Quality review of implemented code |
| Issue Resolution | (conditional) Present issues, collect decisions, delegate fixes |
| Documentation | Technical writer updates CLAUDE.md/README.md |
| Retrospective | Present execution summary |
I designed the coordinator to never write code directly -- it delegates to
developers. Separating coordination from implementation produces cleaner
results. The coordinator:
- Parallelizes independent work across up to 4 developers per milestone
- Runs quality review after all milestones complete
- Loops through issue resolution until QR passes
- Invokes technical writer only after QR passes
**Reconciliation** handles resume scenarios. When the user request contains
signals like "already implemented", "resume", or "partially complete", the
workflow validates existing code against plan requirements before executing
remaining milestones. Building on unverified code means rework.
**Issue Resolution** presents each QR finding individually with options (Fix /
Skip / Alternative). Fixes delegate to developers or technical writers, then QR
runs again. This cycle repeats until QR passes.
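The QR/issue-resolution cycle can be sketched as a loop. The callback names below are illustrative, not the coordinator's actual interface:

```python
def resolve_until_qr_passes(run_qr, present_options, delegate_fix):
    """Loop QR -> per-finding decision -> delegated fix until QR passes.

    run_qr() returns a list of findings (empty list = pass).
    present_options(finding) returns "fix", "skip", or an alternative spec.
    delegate_fix(finding, decision) hands the work to a developer/TW agent.
    """
    while True:
        findings = run_qr()
        if not findings:
            return  # QR passed; technical writer may now be invoked
        for finding in findings:
            decision = present_options(finding)
            if decision != "skip":
                delegate_fix(finding, decision)
```

Skipped findings are simply not delegated; everything else re-enters QR on the next pass.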


@@ -0,0 +1,59 @@
---
name: planner
description: Interactive planning and execution for complex tasks. Use when user asks to use or invoke planner skill.
---
# Planner Skill
Two-phase workflow: **planning** (create plans) and **execution** (implement
plans).
## Invocation Routing
| User Intent | Script | Invocation |
| ------------------------------------------- | ----------- | ---------------------------------------------------------------------------------- |
| "plan", "design", "architect", "break down" | planner.py | `python3 scripts/planner.py --step-number 1 --total-steps 4 --thoughts "..."` |
| "review plan" (after plan written) | planner.py | `python3 scripts/planner.py --phase review --step-number 1 --total-steps 2 ...` |
| "execute", "implement", "run plan" | executor.py | `python3 scripts/executor.py --plan-file PATH --step-number 1 --total-steps 7 ...` |
Scripts inject step-specific guidance via JIT prompt injection. Invoke the
script and follow its REQUIRED ACTIONS output.
## When to Use
Use when task has:
- Multiple milestones with dependencies
- Architectural decisions requiring documentation
- Complexity benefiting from forced reflection pauses
Skip when task is:
- Single-step with obvious implementation
- Quick fix or minor change
- Already well-specified by user
## Resources
| Resource | Contents | Read When |
| ------------------------------------- | ------------------------------------------ | ----------------------------------------------- |
| `resources/diff-format.md` | Unified diff specification for plans | Writing code changes in milestones |
| `resources/temporal-contamination.md` | Comment hygiene detection heuristics | Writing comments in code snippets |
| `resources/default-conventions.md` | Priority hierarchy, structural conventions | Making decisions without explicit user guidance |
| `resources/plan-format.md` | Plan template structure | Completing planning phase (injected by script) |
**Resource loading rule**: Scripts will prompt you to read specific resources at
decision points. When prompted, read the full resource before proceeding.
## Workflow Summary
**Planning phase**: Steps 1-N explore context, evaluate approaches, refine
milestones. Final step writes plan to file. Review phase (TW scrub -> QR
validation) follows.
**Execution phase**: 7 steps -- analyze plan, reconcile existing code, delegate
milestones to agents, QR validation, issue resolution, documentation,
retrospective.
All procedural details are injected by the scripts. Invoke the appropriate
script and follow its output.


@@ -0,0 +1,156 @@
# Default Conventions
These conventions apply when project documentation does not specify otherwise.
## MotoVaultPro Project Conventions
**Naming**:
- Database columns: snake_case (`user_id`, `created_at`)
- TypeScript types: camelCase (`userId`, `createdAt`)
- API responses: camelCase
- Files: kebab-case (`vehicle-repository.ts`)
**Architecture**:
- Feature capsules: `backend/src/features/{feature}/`
- Repository pattern with mapRow() for case conversion
- Single-tenant, user-scoped data
**Frontend**:
- Mobile + desktop validation required (320px, 768px, 1920px)
- Touch targets >= 44px
- No hover-only interactions
**Development**:
- Local node development (`npm install`, `npm run dev`, `npm test`)
- CI/CD pipeline validates containers and integration tests
- Plans stored in Gitea Issue comments
---
## Priority Hierarchy
Higher tiers override lower. Cite backing source when auditing.
| Tier | Source | Action |
| ---- | --------------- | -------------------------------- |
| 1 | user-specified | Explicit user instruction: apply |
| 2 | doc-derived | CLAUDE.md / project docs: apply |
| 3 | default-derived | This document: apply |
| 4 | assumption | No backing: CONFIRM WITH USER |
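The hierarchy lends itself to a simple resolver. A sketch using the tier names from the table (the function itself is illustrative):

```python
TIER_ORDER = ["user-specified", "doc-derived", "default-derived", "assumption"]

def resolve_backing(candidates):
    """Pick the highest-tier backing for a decision.

    candidates: dict mapping tier name -> the guidance found at that tier.
    Returns (tier, guidance, needs_confirmation); tier 4 (assumption)
    means CONFIRM WITH USER before applying.
    """
    for tier in TIER_ORDER:
        if tier in candidates:
            return tier, candidates[tier], tier == "assumption"
    raise ValueError("no backing recorded for this decision")
```

Higher tiers win even when lower tiers disagree; only an assumption-backed decision triggers confirmation.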
## Severity Levels
| Level | Meaning | Action |
| ---------- | -------------------------------- | --------------- |
| SHOULD_FIX | Likely to cause maintenance debt | Flag for fixing |
| SUGGESTION | Improvement opportunity | Note if time |
---
## Structural Conventions
<default-conventions domain="god-object">
**God Object**: >15 public methods OR >10 dependencies OR mixed concerns (networking + UI + data)
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="god-function">
**God Function**: >50 lines OR multiple abstraction levels OR >3 nesting levels
Severity: SHOULD_FIX
Exception: Inherently sequential algorithms or state machines
</default-conventions>
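For Python code the god-function thresholds are mechanically checkable. A sketch using the stdlib `ast` module, with the >50-line and >3-level numbers taken from the convention above; the nesting count is an approximation that treats `if`/`for`/`while`/`with`/`try` as nesting levels:

```python
import ast

def god_function_findings(source, max_lines=50, max_nesting=3):
    """Flag functions exceeding the god-function thresholds."""
    nesting_nodes = (ast.If, ast.For, ast.While, ast.With, ast.Try)

    def depth(node, current=0):
        # Deepest chain of nesting constructs under this node
        child_depths = [
            depth(child, current + isinstance(child, nesting_nodes))
            for child in ast.iter_child_nodes(node)
        ]
        return max(child_depths, default=current)

    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines = node.end_lineno - node.lineno + 1
            if lines > max_lines or depth(node) > max_nesting:
                findings.append((node.name, "SHOULD_FIX"))
    return findings
```

The exception for sequential algorithms and state machines still requires human judgment; the checker only surfaces candidates.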
<default-conventions domain="duplicate-logic">
**Duplicate Logic**: Copy-pasted blocks, repeated error handling, parallel near-identical functions
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="dead-code">
**Dead Code**: No callers, impossible branches, unread variables, unused imports
Severity: SUGGESTION
</default-conventions>
<default-conventions domain="inconsistent-error-handling">
**Inconsistent Error Handling**: Mixed exceptions/error codes, inconsistent types, swallowed errors
Severity: SUGGESTION
Exception: Project specifies different handling per error category
</default-conventions>
---
## File Organization Conventions
<default-conventions domain="test-organization">
**Test Organization**: Extend existing test files; create new only when:
- Distinct module boundary OR >500 lines OR different fixtures required
Severity: SHOULD_FIX (for unnecessary fragmentation)
</default-conventions>
<default-conventions domain="file-creation">
**File Creation**: Prefer extending existing files; create new only when:
- Clear module boundary OR >300-500 lines OR distinct responsibility
Severity: SUGGESTION
</default-conventions>
---
## Testing Conventions
<default-conventions domain="testing">
**Principle**: Test behavior, not implementation. Fast feedback.
**Test Type Hierarchy** (preference order):
1. **Integration tests** (highest value)
- Test end-user verifiable behavior
- Use real systems/dependencies (e.g., testcontainers)
- Verify component interaction at boundaries
- This is where the real value lies
2. **Property-based / generative tests** (preferred)
- Cover wide input space with invariant assertions
- Catch edge cases humans miss
- Use for functions with clear input/output contracts
3. **Unit tests** (use sparingly)
- Only for highly complex or critical logic
- Risk: maintenance liability, brittleness to refactoring
- Prefer integration tests that cover same behavior
**Test Placement**: Tests are part of implementation milestones, not separate
milestones. A milestone is not complete until its tests pass. This creates fast
feedback during development.
**DO**:
- Integration tests with real dependencies (testcontainers, etc.)
- Property-based tests for invariant-rich functions
- Parameterized fixtures over duplicate test bodies
- Test behavior observable by end users
**DON'T**:
- Test external library/dependency behavior (out of scope)
- Unit test simple code (maintenance liability exceeds value)
- Mock owned dependencies (use real implementations)
- Test implementation details that may change
- One-test-per-variant when parametrization applies
Severity: SHOULD_FIX (violations), SUGGESTION (missed opportunities)
</default-conventions>
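A property-style test needs no framework. A stdlib-only sketch that asserts invariants over random inputs (a dedicated library such as Hypothesis adds failure shrinking; this sketch only samples). The `merge_intervals` function is a stand-in subject, not project code:

```python
import random

def merge_intervals(intervals):
    """Coalesce overlapping [start, end] intervals (the subject under test)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlap: extend
        else:
            merged.append([start, end])
    return merged

def check_invariants(trials=200):
    rng = random.Random(42)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        intervals = [sorted(rng.sample(range(100), 2))
                     for _ in range(rng.randint(0, 10))]
        merged = merge_intervals(intervals)
        # Invariant: output is sorted with no overlapping intervals
        assert all(a[1] < b[0] for a, b in zip(merged, merged[1:]))
        # Invariant: every input interval remains covered by some output interval
        for start, end in intervals:
            assert any(m[0] <= start and end <= m[1] for m in merged)
```

The invariants describe behavior an end user could observe; nothing here couples the test to the implementation's internals.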
---
## Modernization Conventions
<default-conventions domain="version-constraints">
**Version Constraint Violation**: Features unavailable in project's documented target version
Requires: Documented target version
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="modernization">
**Modernization Opportunity**: Legacy APIs, verbose patterns, manual stdlib reimplementations
Severity: SUGGESTION
Exception: Project requires legacy pattern
</default-conventions>


@@ -0,0 +1,201 @@
# Unified Diff Format for Plan Code Changes
This document is the authoritative specification for code changes in implementation plans.
## Purpose
Unified diff format encodes both **location** and **content** in a single structure. This eliminates the need for location directives in comments (e.g., "insert at line 42") and provides reliable anchoring even when line numbers drift.
## Anatomy
```diff
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -123,6 +123,15 @@ def existing_function(ctx):
# Context lines (unchanged) serve as location anchors
existing_code()
+ # NEW: Comments explain WHY - transcribed verbatim by Developer
+ # Guard against race condition when messages arrive out-of-order
+ new_code()
# More context to anchor the insertion point
more_existing_code()
```
## Components
| Component | Authority | Purpose |
| ------------------------------------------ | ------------------------- | ---------------------------------------------------------- |
| File path (`--- a/path/to/file.py`) | **AUTHORITATIVE** | Exact target file |
| Line numbers (`@@ -123,6 +123,15 @@`) | **APPROXIMATE** | May drift as earlier milestones modify the file |
| Function context (`@@ ... @@ def func():`) | **SCOPE HINT** | Function/method containing the change |
| Context lines (unchanged) | **AUTHORITATIVE ANCHORS** | Developer matches these patterns to locate insertion point |
| `+` lines | **NEW CODE** | Code to add, including WHY comments |
| `-` lines | **REMOVED CODE** | Code to delete |
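The `@@` line splits cleanly into the approximate positions and the scope hint. A sketch with one regex, assuming the count fields are present (git can omit them for single-line hunks):

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+),(\d+) \+(\d+),(\d+) @@(?: (.*))?$")

def parse_hunk_header(line):
    """Split an @@ line into approximate positions and the scope hint."""
    m = HUNK_HEADER.match(line)
    if not m:
        raise ValueError(f"not a hunk header: {line!r}")
    return {
        "old_start": int(m.group(1)),  # APPROXIMATE: may drift
        "old_count": int(m.group(2)),
        "new_start": int(m.group(3)),
        "new_count": int(m.group(4)),
        "scope_hint": m.group(5),      # e.g. "def existing_function(ctx):"
    }
```

Only the scope hint and the context lines should be trusted for placement; the numeric fields are hints.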
## Two-Layer Location Strategy
Code changes use two complementary layers for location:
1. **Prose scope hint** (optional): Natural language describing conceptual location
2. **Diff with context**: Precise insertion point via context line matching
### Layer 1: Prose Scope Hints
For complex changes, add a prose description before the diff block:
````markdown
Add validation after input sanitization in `UserService.validate()`:
```diff
@@ -123,6 +123,15 @@ def validate(self, user):
sanitized = sanitize(user.input)
+ # Validate format before proceeding
+ if not is_valid_format(sanitized):
+ raise ValidationError("Invalid format")
+
return process(sanitized)
```
````
The prose tells Developer **where conceptually** (which method, what operation precedes it). The diff tells Developer **where exactly** (context lines to match).
**When to use prose hints:**
- Changes to large files (>300 lines)
- Multiple changes to the same file in one milestone
- Complex nested structures where function context alone is ambiguous
- When the surrounding code logic matters for understanding placement
**When prose is optional:**
- Small files with obvious structure
- Single change with unique context lines
- Function context in @@ line provides sufficient scope
### Layer 2: Function Context in @@ Line
The `@@` line can include function/method context after the line numbers:
```diff
@@ -123,6 +123,15 @@ def validate(self, user):
```
This follows standard unified diff format (git generates this automatically). It tells Developer which function contains the change, aiding navigation even when line numbers drift.
## Why Context Lines Matter
When a plan has multiple milestones that modify the same file, earlier milestones shift line numbers. The `@@ -123` in Milestone 3 may no longer be accurate after Milestones 1 and 2 execute.
**Context lines solve this**: Developer searches for the unchanged context patterns in the actual file. These patterns are stable anchors that survive line number drift.
Include 2-3 context lines before and after changes for reliable matching.
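Context-anchor matching can be sketched in a few lines: search for the context block, and use the approximate line number only to disambiguate multiple matches (names here are illustrative):

```python
def find_anchor(file_lines, context_lines, approx_start):
    """Locate a context block in the file, preferring matches near the hint.

    Returns the 0-based index where the context block begins, or None.
    """
    span = len(context_lines)
    matches = [
        i for i in range(len(file_lines) - span + 1)
        if file_lines[i:i + span] == context_lines
    ]
    if not matches:
        return None  # anchors absent: plan context is stale, flag for review
    # Line numbers are APPROXIMATE; the nearest match wins when drift occurred
    return min(matches, key=lambda i: abs(i - approx_start))
```

A `None` result is itself a useful signal: the plan's context lines no longer exist in the file and the milestone needs review before execution.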
## Comment Placement
Comments in `+` lines explain **WHY**, not **WHAT**. These comments:
- Are transcribed verbatim by Developer
- Source rationale from Planning Context (Decision Log, Rejected Alternatives)
- Use concrete terms without hidden baselines
- Must pass temporal contamination review (see `temporal-contamination.md`)
**Important**: Comments written during planning often contain temporal contamination -- change-relative language, baseline references, or location directives. @agent-technical-writer reviews and fixes these before @agent-developer transcribes them.
<example type="CORRECT" category="why_comment">
```diff
+ # Polling chosen over webhooks: 30% webhook delivery failures in third-party API
+ # WebSocket rejected to preserve stateless architecture
+ updates = poll_api(interval=30)
```
Explains WHY this approach was chosen.
</example>
<example type="INCORRECT" category="what_comment">
```diff
+ # Poll the API every 30 seconds
+ updates = poll_api(interval=30)
```
Restates WHAT the code does - redundant with the code itself.
</example>
<example type="INCORRECT" category="hidden_baseline">
```diff
+ # Generous timeout for slow networks
+ REQUEST_TIMEOUT = 60
```
"Generous" compared to what? Hidden baseline provides no actionable information.
</example>
<example type="CORRECT" category="concrete_justification">
```diff
+ # 60s accommodates 95th percentile upstream response times
+ REQUEST_TIMEOUT = 60
```
Concrete justification that explains why this specific value.
</example>
## Location Directives: Forbidden
The diff structure handles location. Location directives in comments are redundant and error-prone.
<example type="INCORRECT" category="location_directive">
```python
# Insert this BEFORE the retry loop (line 716)
# Timestamp guard: prevent older data from overwriting newer
get_ctx, get_cancel = context.with_timeout(ctx, 500)
```
Location directive leaked into comment - line numbers become stale.
</example>
<example type="CORRECT" category="location_directive">
```diff
@@ -714,6 +714,10 @@ def put(self, ctx, tags):
for tag in tags:
subject = tag.subject
+ # Timestamp guard: prevent older data from overwriting newer
+ # due to network delays, retries, or concurrent writes
+ get_ctx, get_cancel = context.with_timeout(ctx, 500)
# Retry loop for Put operations
for attempt in range(max_retries):
```
Context lines (`for tag in tags`, `# Retry loop`) are stable anchors that survive line number drift.
</example>
## When to Use Diff Format
<diff_format_decision>
| Code Characteristic | Use Diff? | Boundary Test |
| --------------------------------------- | --------- | ---------------------------------------- |
| Conditionals, loops, error handling, state machines | YES | Has branching logic |
| Multiple insertions same file | YES | >1 change location |
| Deletions or replacements | YES | Removing/changing existing code |
| Pure assignment/return (CRUD, getters) | NO | Single statement, no branching |
| Boilerplate from template | NO | Developer can generate from pattern name |
The boundary test: "Does Developer need to see exact placement and context to implement correctly?"
- YES -> diff format
- NO (can implement from description alone) -> prose sufficient
</diff_format_decision>
## Validation Checklist
Before finalizing code changes in a plan:
- [ ] File path is exact (not "auth files" but `src/auth/handler.py`)
- [ ] Context lines exist in target file (validate patterns match actual code)
- [ ] Comments explain WHY, not WHAT
- [ ] No location directives in comments
- [ ] No hidden baselines (test: "[adjective] compared to what?")
- [ ] 2-3 context lines for reliable anchoring


@@ -0,0 +1,250 @@
# Plan Format
Write your plan using this structure:
```markdown
# [Plan Title]
## Overview
[Problem statement, chosen approach, and key decisions in 1-2 paragraphs]
## Planning Context
This section is consumed VERBATIM by downstream agents (Technical Writer,
Quality Reviewer). Quality matters: vague entries here produce poor annotations
and missed risks.
### Decision Log
| Decision | Reasoning Chain |
| ------------------ | ------------------------------------------------------------ |
| [What you decided] | [Multi-step reasoning: premise -> implication -> conclusion] |
Each rationale must contain at least 2 reasoning steps. Single-step rationales
are insufficient.
INSUFFICIENT: "Polling over webhooks | Webhooks are unreliable"

SUFFICIENT: "Polling over webhooks | Third-party API has 30% webhook delivery
failure in testing -> unreliable delivery would require fallback polling anyway
-> simpler to use polling as primary mechanism"

INSUFFICIENT: "500ms timeout | Matches upstream latency"

SUFFICIENT: "500ms timeout | Upstream 95th percentile is 450ms -> 500ms covers
95% of requests without timeout -> remaining 5% should fail fast rather than
queue"
Include BOTH architectural decisions AND implementation-level micro-decisions:
- Architectural: "Event sourcing over CRUD | Need audit trail + replay
capability -> CRUD would require separate audit log -> event sourcing provides
both natively"
- Implementation: "Mutex over channel | Single-writer case -> channel
coordination adds complexity without benefit -> mutex is simpler with
equivalent safety"
Technical Writer sources ALL code comments from this table. If a micro-decision
isn't here, TW cannot document it.
### Rejected Alternatives
| Alternative | Why Rejected |
| -------------------- | ------------------------------------------------------------------- |
| [Approach not taken] | [Concrete reason: performance, complexity, doesn't fit constraints] |
Technical Writer uses this to add "why not X" context to code comments.
### Constraints & Assumptions
- [Technical: API limits, language version, existing patterns to follow]
- [Organizational: timeline, team expertise, approval requirements]
- [Dependencies: external services, libraries, data formats]
- [Default conventions applied: cite any `<default-conventions domain="...">`
used]
### Known Risks
| Risk | Mitigation | Anchor |
| --------------- | --------------------------------------------- | ------------------------------------------ |
| [Specific risk] | [Concrete mitigation or "Accepted: [reason]"] | [file:L###-L### if claiming code behavior] |
**Anchor requirement**: If mitigation claims existing code behavior ("no change
needed", "already handles X"), cite the file:line + brief excerpt that proves
the claim. Skip anchors for hypothetical risks or external unknowns.
Quality Reviewer excludes these from findings but will challenge unverified
behavioral claims.
## Invisible Knowledge
This section captures knowledge NOT deducible from reading the code alone.
Technical Writer uses this for README.md documentation during
post-implementation.
**The test**: Would a new team member understand this from reading the source
files? If no, it belongs here.
**Categories** (not exhaustive -- apply the principle):
1. **Architectural decisions**: Component relationships, data flow, module
boundaries
2. **Business rules**: Domain constraints that shape implementation choices
3. **System invariants**: Properties that must hold but are not enforced by
types/compiler
4. **Historical context**: Why alternatives were rejected (links to Decision
Log)
5. **Performance characteristics**: Non-obvious efficiency properties or
requirements
6. **Tradeoffs**: Costs and benefits of chosen approaches
### Architecture
```
[ASCII diagram showing component relationships]
Example:
  User Request
       |
       v
  +----------+     +-------+
  |   Auth   |---->| Cache |
  +----------+     +-------+
       |
       v
  +----------+     +------+
  | Handler  |---->|  DB  |
  +----------+     +------+
```
### Data Flow
```
[How data moves through the system - inputs, transformations, outputs]
Example:
  HTTP Request --> Validate --> Transform --> Store --> Response
                                                           |
                                                           v
                                                      Log (async)
```
### Why This Structure
[Reasoning behind module organization that isn't obvious from file names]
- Why these boundaries exist
- What would break if reorganized differently
### Invariants
[Rules that must be maintained but aren't enforced by code]
- Ordering requirements
- State consistency rules
- Implicit contracts between components
### Tradeoffs
[Key decisions with their costs and benefits]
- What was sacrificed for what gain
- Performance vs. readability choices
- Consistency vs. flexibility choices
## Milestones
### Milestone 1: [Name]
**Files**: [exact paths - e.g., src/auth/handler.py, not "auth files"]
**Flags** (if applicable): [needs TW rationale, needs error handling review, needs conformance check]
**Requirements**:
- [Specific: "Add retry with exponential backoff", not "improve error handling"]
**Acceptance Criteria**:
- [Testable: "Returns 429 after 3 failed attempts" - QR can verify pass/fail]
- [Avoid vague: "Works correctly" or "Handles errors properly"]
**Tests** (milestone not complete until tests pass):
- **Test files**: [exact paths, e.g., tests/test_retry.py]
- **Test type**: [integration | property-based | unit] - see default-conventions
- **Backing**: [user-specified | doc-derived | default-derived]
- **Scenarios**:
- Normal: [e.g., "successful retry after transient failure"]
- Edge: [e.g., "max retries exhausted", "zero delay"]
- Error: [e.g., "non-retryable error returns immediately"]
Skip tests when: user explicitly stated no tests, OR milestone is documentation-only,
OR project docs prohibit tests for this component. State skip reason explicitly.
**Code Changes** (for non-trivial logic, use unified diff format):
See `resources/diff-format.md` for specification.
```diff
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -123,6 +123,15 @@ def existing_function(ctx):
# Context lines (unchanged) serve as location anchors
existing_code()
+ # WHY comment explaining rationale - transcribed verbatim by Developer
+ new_code()
# More context to anchor the insertion point
more_existing_code()
```
### Milestone N: ...
### Milestone [Last]: Documentation
**Files**:
- `path/to/CLAUDE.md` (index updates)
- `path/to/README.md` (if Invisible Knowledge section has content)
**Requirements**:
- Update CLAUDE.md index entries for all new/modified files
- Each entry has WHAT (contents) and WHEN (task triggers)
- If plan's Invisible Knowledge section is non-empty:
- Create/update README.md with architecture diagrams from plan
- Include tradeoffs, invariants, "why this structure" content
- Verify diagrams match actual implementation
**Acceptance Criteria**:
- CLAUDE.md enables LLM to locate relevant code for debugging/modification tasks
- README.md captures knowledge not discoverable from reading source files
- Architecture diagrams in README.md match plan's Invisible Knowledge section
**Source Material**: `## Invisible Knowledge` section of this plan
### Cross-Milestone Integration Tests
When integration tests require components from multiple milestones:
1. Place integration tests in the LAST milestone that provides a required
component
2. List dependencies explicitly in that milestone's **Tests** section
3. Integration test milestone is not complete until all dependencies are
implemented
Example:
- M1: Auth handler (property tests for auth logic)
- M2: Database layer (property tests for queries)
- M3: API endpoint (integration tests covering M1 + M2 + M3 with testcontainers)
The integration tests in M3 verify the full flow that end users would exercise,
using real dependencies. This creates fast feedback as soon as all components
exist.
## Milestone Dependencies (if applicable)
```
M1 ---> M2
\
--> M3 --> M4
```
Independent milestones can execute in parallel during /plan-execution.
```
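Grouping independent milestones into parallel batches is a topological pass over the dependency graph. A sketch over dependency sets, using the example graph from the template above:

```python
def parallel_batches(dependencies):
    """Group milestones into batches; each batch can execute concurrently.

    dependencies: dict mapping milestone -> set of milestones it depends on.
    Raises on cycles (a plan with circular milestones cannot execute).
    """
    remaining = {m: set(deps) for m, deps in dependencies.items()}
    batches = []
    while remaining:
        ready = sorted(m for m, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle in milestone dependencies")
        batches.append(ready)
        for m in ready:
            del remaining[m]
        for deps in remaining.values():
            deps.difference_update(ready)
    return batches
```

For the M1 -> M2 / M1 -> M3 -> M4 example, this yields [M1], then [M2, M3] in parallel, then [M4].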


@@ -0,0 +1,135 @@
# Temporal Contamination in Code Comments
This document defines terminology for identifying comments that leak information
about code history, change processes, or planning artifacts. Both
@agent-technical-writer and @agent-quality-reviewer reference this
specification.
## The Core Principle
> **Timeless Present Rule**: Comments must be written from the perspective of a
> reader encountering the code for the first time, with no knowledge of what
> came before or how it got here. The code simply _is_.
**Why this matters**: Change-narrative comments are an LLM artifact -- a
category error, not merely a style issue. The change process is ephemeral and
irrelevant to the code's ongoing existence. Humans writing comments naturally
describe what code IS, not what they DID to create it. Referencing the change
that created a comment is fundamentally confused about what belongs in
documentation.
Think of it this way: a novel's narrator never describes the author's typing
process. Similarly, code comments should never describe the developer's editing
process. The code simply exists; the path to its existence is invisible.
In a plan, this means comments are written _as if the plan was already
executed_.
## Detection Heuristic
Evaluate each comment against these five questions. Signal words are examples --
extrapolate to semantically similar constructs.
### 1. Does it describe an action taken rather than what exists?
**Category**: Change-relative
| Contaminated | Timeless Present |
| -------------------------------------- | ----------------------------------------------------------- |
| `// Added mutex to fix race condition` | `// Mutex serializes cache access from concurrent requests` |
| `// New validation for the edge case` | `// Rejects negative values (downstream assumes unsigned)` |
| `// Changed to use batch API` | `// Batch API reduces round-trips from N to 1` |
Signal words (non-exhaustive): "Added", "Replaced", "Now uses", "Changed to",
"New", "Updated", "Refactored"
### 2. Does it compare to something not in the code?
**Category**: Baseline reference
| Contaminated | Timeless Present |
| ------------------------------------------------- | ------------------------------------------------------------------- |
| `// Replaces per-tag logging with summary` | `// Single summary line; per-tag logging would produce 1500+ lines` |
| `// Unlike the old approach, this is thread-safe` | `// Thread-safe: each goroutine gets independent state` |
| `// Previously handled in caller` | `// Encapsulated here; caller should not manage lifecycle` |
Signal words (non-exhaustive): "Instead of", "Rather than", "Previously",
"Replaces", "Unlike the old", "No longer"
### 3. Does it describe where to put code rather than what code does?
**Category**: Location directive
| Contaminated | Timeless Present |
| ----------------------------- | --------------------------------------------- |
| `// After the SendAsync call` | _(delete -- diff structure encodes location)_ |
| `// Insert before validation` | _(delete -- diff structure encodes location)_ |
| `// Add this at line 425` | _(delete -- diff structure encodes location)_ |
Signal words (non-exhaustive): "After", "Before", "Insert", "At line", "Here:",
"Below", "Above"
**Action**: Always delete. Location is encoded in diff structure, not comments.
### 4. Does it describe intent rather than behavior?
**Category**: Planning artifact
| Contaminated | Timeless Present |
| -------------------------------------- | -------------------------------------------------------- |
| `// TODO: add retry logic later` | _(delete, or implement retry now)_ |
| `// Will be extended for batch mode` | _(delete -- do not document hypothetical futures)_ |
| `// Temporary workaround until API v2` | `// API v1 lacks filtering; client-side filter required` |
Signal words (non-exhaustive): "Will", "TODO", "Planned", "Eventually", "For
future", "Temporary", "Workaround until"
**Action**: Delete, implement the feature, or reframe as current constraint.
### 5. Does it describe the author's choice rather than code behavior?
**Category**: Intent leakage
| Contaminated | Timeless Present |
| ------------------------------------------ | ---------------------------------------------------- |
| `// Intentionally placed after validation` | `// Runs after validation completes` |
| `// Deliberately using mutex over channel` | `// Mutex serializes access (single-writer pattern)` |
| `// Chose polling for reliability` | `// Polling: 30% webhook delivery failures observed` |
| `// We decided to cache at this layer` | `// Cache here: reduces DB round-trips for hot path` |
Signal words (non-exhaustive): "intentionally", "deliberately", "chose",
"decided", "on purpose", "by design", "we opted"
**Action**: Extract the technical justification; discard the decision narrative.
The reader doesn't need to know someone "decided" -- they need to know WHY this
approach works.
**The test**: Can you delete the intent word and the comment still makes sense?
If yes, delete the intent word. If no, reframe around the technical reason.
---
**Catch-all**: If a comment only makes sense to someone who knows the code's
history, it is temporally contaminated -- even if it does not match any category
above.
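As a rough aid (not a substitute for semantic judgment), the signal words above can be collected into a pre-filter that flags comments for review. A minimal sketch in Python; the pattern list is illustrative, not exhaustive:

```python
import re

# Signal-word pre-filter: a match means "inspect this comment",
# not "contaminated" -- keyword matching alone cannot decide.
SIGNAL_PATTERNS = [
    r"\b(added|replaced|changed to|refactored|updated)\b",  # change-relative
    r"\b(instead of|rather than|previously|no longer)\b",   # baseline reference
    r"\b(insert before|at line)\b",                         # location directive
    r"\b(todo|will be|temporary|workaround until)\b",       # planning artifact
    r"\b(intentionally|deliberately|we decided|chose)\b",   # intent leakage
]


def flag_comment(comment: str) -> bool:
    """Return True if the comment matches any temporal signal pattern."""
    text = comment.lower()
    return any(re.search(pattern, text) for pattern in SIGNAL_PATTERNS)
```

A hit only queues the comment for the five-question evaluation; clean comments that happen to contain a signal word still require the semantic judgment described above.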
## Subtle Cases
Same word, different verdict -- demonstrates that detection requires semantic
judgment, not keyword matching.
| Comment | Verdict | Reasoning |
| -------------------------------------- | ------------ | ------------------------------------------------ |
| `// Now handles edge cases properly` | Contaminated | "properly" implies it was improper before |
| `// Now blocks until connection ready` | Clean | "now" describes runtime moment, not code history |
| `// Fixed the null pointer issue` | Contaminated | Describes a fix, not behavior |
| `// Returns null when key not found` | Clean | Describes behavior |
## The Transformation Pattern
> **Extract the technical justification, discard the change narrative.**
1. What useful info is buried? (problem, behavior)
2. Reframe as timeless present
Example: "Added mutex to fix race" -> "Mutex serializes concurrent access"
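To make the pattern concrete, here is a minimal sketch (hypothetical cache class) whose comments follow the Timeless Present Rule -- each states what the code is or why, never the edit that produced it:

```python
import threading


class Cache:
    """Thread-safe in-memory cache; each method holds the lock for its full body."""

    def __init__(self):
        # Mutex serializes cache access from concurrent requests
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        # Returns default when key is absent (callers treat misses as non-fatal)
        with self._lock:
            return self._data.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
```

Contaminated versions of the same comments would read "Added mutex to fix race condition" or "Fixed the null pointer issue"; the timeless forms above remain accurate through any number of future edits.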
@@ -0,0 +1,682 @@
#!/usr/bin/env python3
"""
Plan Executor - Execute approved plans through delegation.
Seven-phase execution workflow with JIT prompt injection:
Step 1: Execution Planning (analyze plan, detect reconciliation)
Step 2: Reconciliation (conditional, validate existing code)
Step 3: Milestone Execution (delegate to agents, run tests)
Step 4: Post-Implementation QR (quality review)
Step 5: QR Issue Resolution (conditional, fix issues)
Step 6: Documentation (TW pass)
Step 7: Retrospective (present summary)
Usage:
python3 executor.py --plan-file PATH --step-number 1 --total-steps 7 --thoughts "..."
"""
import argparse
import re
import sys
def detect_reconciliation_signals(thoughts: str) -> bool:
"""Check if user's thoughts contain reconciliation triggers."""
triggers = [
r"\balready\s+(implemented|done|complete)",
r"\bpartially\s+complete",
r"\bhalfway\s+done",
r"\bresume\b",
r"\bcontinue\s+from\b",
r"\bpick\s+up\s+where\b",
r"\bcheck\s+what'?s\s+done\b",
r"\bverify\s+existing\b",
r"\bprior\s+work\b",
]
thoughts_lower = thoughts.lower()
return any(re.search(pattern, thoughts_lower) for pattern in triggers)
def get_step_1_guidance(plan_file: str, thoughts: str) -> dict:
"""Step 1: Execution Planning - analyze plan, detect reconciliation."""
reconciliation_detected = detect_reconciliation_signals(thoughts)
actions = [
"EXECUTION PLANNING",
"",
f"Plan file: {plan_file}",
"",
"Read the plan file and analyze:",
" 1. Count milestones and their dependencies",
" 2. Identify file targets per milestone",
" 3. Determine parallelization opportunities",
" 4. Set up TodoWrite tracking for all milestones",
"",
"<execution_rules>",
"",
"RULE 0 (ABSOLUTE): Delegate ALL code work to specialized agents",
"",
"Your role: coordinate, validate, orchestrate. Agents implement code.",
"",
"Delegation routing:",
" - New function needed -> @agent-developer",
" - Bug to fix -> @agent-debugger (diagnose) then @agent-developer (fix)",
" - Any source file modification -> @agent-developer",
" - Documentation files -> @agent-technical-writer",
"",
"Exception (trivial only): Fixes under 5 lines where delegation overhead",
"exceeds fix complexity (missing import, typo correction).",
"",
"---",
"",
"RULE 1: Execution Protocol",
"",
"Before ANY phase:",
" 1. Use TodoWrite to track all plan phases",
" 2. Analyze dependencies to identify parallelizable work",
" 3. Delegate implementation to specialized agents",
" 4. Validate each increment before proceeding",
"",
"You plan HOW to execute (parallelization, sequencing). You do NOT plan",
"WHAT to execute -- that's the plan's job.",
"",
"---",
"",
"RULE 1.5: Model Selection",
"",
"Agent defaults (sonnet) are calibrated for quality. Adjust upward only.",
"",
" | Action | Allowed | Rationale |",
" |----------------------|---------|----------------------------------|",
" | Upgrade to opus | YES | Challenging tasks need reasoning |",
" | Use default (sonnet) | YES | Baseline for all delegations |",
" | Keep at sonnet+ | ALWAYS | Maintains quality baseline |",
"",
"</execution_rules>",
"",
"<dependency_analysis>",
"",
"Parallelizable when ALL conditions met:",
" - Different target files",
" - No data dependencies",
" - No shared state (globals, configs, resources)",
"",
"Sequential when ANY condition true:",
" - Same file modified by multiple tasks",
" - Task B imports or depends on Task A's output",
" - Shared database tables or external resources",
"",
"Before delegating ANY batch:",
" 1. List tasks with their target files",
" 2. Identify file dependencies (same file = sequential)",
" 3. Identify data dependencies (imports = sequential)",
" 4. Group independent tasks into parallel batches",
" 5. Separate batches with sync points",
"",
"</dependency_analysis>",
"",
"<milestone_type_detection>",
"",
"Before delegating ANY milestone, identify its type from file extensions:",
"",
" | Milestone Type | Recognition Signal | Delegate To |",
" |----------------|--------------------------------|-------------------------|",
" | Documentation | ALL files are *.md or *.rst | @agent-technical-writer |",
" | Code | ANY file is source code | @agent-developer |",
"",
"Mixed milestones: Split delegation -- @agent-developer first (code),",
"then @agent-technical-writer (docs) after code completes.",
"",
"</milestone_type_detection>",
"",
"<delegation_format>",
"",
"EVERY delegation MUST use this structure:",
"",
" <delegation>",
" <agent>@agent-[developer|debugger|technical-writer|quality-reviewer]</agent>",
" <mode>[For TW/QR: plan-scrub|post-implementation|plan-review|reconciliation]</mode>",
" <plan_source>[Absolute path to plan file]</plan_source>",
" <milestone>[Milestone number and name]</milestone>",
" <files>[Exact file paths from milestone]</files>",
" <task>[Specific task description]</task>",
" <acceptance_criteria>",
" - [Criterion 1 from plan]",
" - [Criterion 2 from plan]",
" </acceptance_criteria>",
" </delegation>",
"",
"For parallel delegations, wrap multiple blocks:",
"",
" <parallel_batch>",
" <rationale>[Why these can run in parallel]</rationale>",
" <sync_point>[Command to run after all complete]</sync_point>",
" <delegation>...</delegation>",
" <delegation>...</delegation>",
" </parallel_batch>",
"",
"Agent limits:",
" - @agent-developer: Maximum 4 parallel",
" - @agent-debugger: Maximum 2 parallel",
" - @agent-quality-reviewer: ALWAYS sequential",
" - @agent-technical-writer: Can parallel across independent modules",
"",
"</delegation_format>",
]
if reconciliation_detected:
next_step = (
"RECONCILIATION SIGNALS DETECTED in your thoughts.\n\n"
"Invoke step 2 to validate existing code against plan requirements:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 2 '
'--total-steps 7 --thoughts "Starting reconciliation..."'
)
else:
next_step = (
"No reconciliation signals detected. Proceed to milestone execution.\n\n"
"Invoke step 3 to begin delegating milestones:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 3 '
'--total-steps 7 --thoughts "Analyzed plan: N milestones, '
'parallel batches: [describe], starting execution..."'
)
return {
"actions": actions,
"next": next_step,
}
def get_step_2_guidance(plan_file: str) -> dict:
"""Step 2: Reconciliation - validate existing code against plan."""
return {
"actions": [
"RECONCILIATION PHASE",
"",
f"Plan file: {plan_file}",
"",
"Validate existing code against plan requirements BEFORE executing.",
"",
"<reconciliation_protocol>",
"",
"Delegate to @agent-quality-reviewer for each milestone:",
"",
" Task for @agent-quality-reviewer:",
" Mode: reconciliation",
" Plan Source: [plan_file.md]",
" Milestone: [N]",
"",
" Check if the acceptance criteria for Milestone [N] are ALREADY",
" satisfied in the current codebase. Validate REQUIREMENTS, not just",
" code presence.",
"",
" Return: SATISFIED | NOT_SATISFIED | PARTIALLY_SATISFIED",
"",
"---",
"",
"Execution based on reconciliation result:",
"",
" | Result | Action |",
" |---------------------|-------------------------------------------|",
" | SATISFIED | Skip execution, record as already complete|",
" | NOT_SATISFIED | Execute milestone normally |",
" | PARTIALLY_SATISFIED | Execute only the missing parts |",
"",
"---",
"",
"Why requirements-based (not diff-based):",
"",
"Checking if code from the diff exists misses critical cases:",
" - Code added but incorrect (doesn't meet acceptance criteria)",
" - Code added but incomplete (partial implementation)",
" - Requirements met by different code than planned (valid alternative)",
"",
"Checking acceptance criteria catches all of these.",
"",
"</reconciliation_protocol>",
],
"next": (
"After collecting reconciliation results for all milestones, "
"invoke step 3:\n\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 3 '
"--total-steps 7 --thoughts \"Reconciliation complete: "
'M1: SATISFIED, M2: NOT_SATISFIED, ..."'
),
}
def get_step_3_guidance(plan_file: str) -> dict:
"""Step 3: Milestone Execution - delegate to agents, run tests."""
return {
"actions": [
"MILESTONE EXECUTION",
"",
f"Plan file: {plan_file}",
"",
"Execute milestones through delegation. Parallelize independent work.",
"",
"<diff_compliance_validation>",
"",
"BEFORE delegating each milestone with code changes:",
" 1. Read resources/diff-format.md if not already in context",
" 2. Verify plan's diffs meet specification:",
" - Context lines are VERBATIM from actual files (not placeholders)",
" - WHY comments explain rationale (not WHAT code does)",
" - No location directives in comments",
"",
"AFTER @agent-developer completes, verify:",
" - Context lines from plan were found in target file",
" - WHY comments were transcribed verbatim to code",
" - No location directives remain in implemented code",
" - No temporal contamination leaked (change-relative language)",
"",
"If Developer reports context lines not found, check drift table below.",
"",
"</diff_compliance_validation>",
"",
"<error_handling>",
"",
"Error classification:",
"",
" | Severity | Signals | Action |",
" |----------|----------------------------------|-------------------------|",
" | Critical | Segfault, data corruption | STOP, @agent-debugger |",
" | High | Test failures, missing deps | @agent-debugger |",
" | Medium | Type errors, lint failures | Auto-fix, then debugger |",
" | Low | Warnings, style issues | Note and continue |",
"",
"Escalation triggers -- STOP and report when:",
" - Fix would change fundamental approach",
" - Three attempted solutions failed",
" - Performance or safety characteristics affected",
" - Confidence < 80%",
"",
"Context anchor mismatch protocol:",
"",
"When @agent-developer reports context lines don't match actual code:",
"",
" | Mismatch Type | Action |",
" |-----------------------------|--------------------------------|",
" | Whitespace/formatting only | Proceed with normalized match |",
" | Minor variable rename | Proceed, note in execution log |",
" | Code restructured | Proceed, note deviation |",
" | Context lines not found | STOP - escalate to planner |",
" | Logic fundamentally changed | STOP - escalate to planner |",
"",
"</error_handling>",
"",
"<acceptance_testing>",
"",
"Run after each milestone:",
"",
" # Python",
" pytest --strict-markers --strict-config",
" mypy --strict",
"",
" # JavaScript/TypeScript",
" tsc --strict --noImplicitAny",
" eslint --max-warnings=0",
"",
" # Go",
" go test -race -cover -vet=all",
"",
"Pass criteria: 100% tests pass, zero linter warnings.",
"",
"Self-consistency check (for milestones with >3 files):",
" 1. Developer's implementation notes claim: [what was implemented]",
" 2. Test results demonstrate: [what behavior was verified]",
" 3. Acceptance criteria state: [what was required]",
"",
"All three must align. Discrepancy = investigate before proceeding.",
"",
"</acceptance_testing>",
],
"next": (
"CONTINUE in step 3 until ALL milestones complete:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 3 '
'--total-steps 7 --thoughts "Completed M1, M2. Executing M3..."'
"\n\n"
"When ALL milestones are complete, invoke step 4 for quality review:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 4 '
'--total-steps 7 --thoughts "All milestones complete. '
'Modified files: [list]. Ready for QR."'
),
}
def get_step_4_guidance(plan_file: str) -> dict:
"""Step 4: Post-Implementation QR - quality review."""
return {
"actions": [
"POST-IMPLEMENTATION QUALITY REVIEW",
"",
f"Plan file: {plan_file}",
"",
"Delegate to @agent-quality-reviewer for comprehensive review.",
"",
"<qr_delegation>",
"",
" Task for @agent-quality-reviewer:",
" Mode: post-implementation",
" Plan Source: [plan_file.md]",
" Files Modified: [list]",
" Reconciled Milestones: [list milestones that were SATISFIED]",
"",
" Priority order for findings:",
" 1. Issues in reconciled milestones (bypassed execution validation)",
" 2. Issues in newly implemented milestones",
" 3. Cross-cutting issues",
"",
" Checklist:",
" - Every requirement implemented",
" - No unauthorized deviations",
" - Edge cases handled",
" - Performance requirements met",
"",
"</qr_delegation>",
"",
"Expected output: PASS or issues list sorted by severity.",
],
"next": (
"After QR completes:\n\n"
"If QR returns ISSUES -> invoke step 5:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 5 '
'--total-steps 7 --thoughts "QR found N issues: [summary]"'
"\n\n"
"If QR returns PASS -> invoke step 6:\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 6 '
'--total-steps 7 --thoughts "QR passed. Proceeding to documentation."'
),
}
def get_step_5_guidance(plan_file: str) -> dict:
"""Step 5: QR Issue Resolution - present issues, collect decisions, fix."""
return {
"actions": [
"QR ISSUE RESOLUTION",
"",
f"Plan file: {plan_file}",
"",
"Present issues to user, collect decisions, delegate fixes.",
"",
"<issue_resolution_protocol>",
"",
"Phase 1: Collect Decisions",
"",
"Sort findings by severity (critical -> high -> medium -> low).",
"For EACH issue, present:",
"",
" ## Issue [N] of [Total] ([severity])",
"",
" **Category**: [production-reliability | project-conformance | structural-quality]",
" **File**: [affected file path]",
" **Location**: [function/line if applicable]",
"",
" **Problem**:",
" [Clear description of what is wrong and why it matters]",
"",
" **Evidence**:",
" [Specific code/behavior that demonstrates the issue]",
"",
"Then use AskUserQuestion with options:",
" - **Fix**: Delegate to @agent-developer to resolve",
" - **Skip**: Accept the issue as-is",
" - **Alternative**: User provides different approach",
"",
"Repeat for each issue. Do NOT execute any fixes during this phase.",
"",
"---",
"",
"Phase 2: Execute Decisions",
"",
"After ALL decisions are collected:",
"",
" 1. Summarize the decisions",
" 2. Execute fixes:",
" - 'Fix' decisions: Delegate to @agent-developer",
" - 'Skip' decisions: Record in retrospective as accepted risk",
" - 'Alternative' decisions: Apply user's specified approach",
" 3. Parallelize where possible (different files, no dependencies)",
"",
"</issue_resolution_protocol>",
],
"next": (
"After ALL fixes are applied, return to step 4 for re-validation:\n\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 4 '
'--total-steps 7 --thoughts "Applied fixes for issues X, Y, Z. '
'Re-running QR."'
"\n\n"
"This creates a validation loop until QR passes."
),
}
def get_step_6_guidance(plan_file: str) -> dict:
"""Step 6: Documentation - TW pass for CLAUDE.md, README.md."""
return {
"actions": [
"POST-IMPLEMENTATION DOCUMENTATION",
"",
f"Plan file: {plan_file}",
"",
"Delegate to @agent-technical-writer for documentation updates.",
"",
"<tw_delegation>",
"",
"Skip condition: If ALL milestones contained only documentation files",
"(*.md/*.rst), TW already handled this during milestone execution.",
"Proceed directly to step 7.",
"",
"For code-primary plans:",
"",
" Task for @agent-technical-writer:",
" Mode: post-implementation",
" Plan Source: [plan_file.md]",
" Files Modified: [list]",
"",
" Requirements:",
" - Create/update CLAUDE.md index entries",
" - Create README.md if architectural complexity warrants",
" - Add module-level docstrings where missing",
" - Verify transcribed comments are accurate",
"",
"</tw_delegation>",
"",
"<final_checklist>",
"",
"Execution is NOT complete until:",
" - [ ] All todos completed",
" - [ ] Quality review passed (no unresolved issues)",
" - [ ] Documentation delegated for ALL modified files",
" - [ ] Documentation tasks completed",
" - [ ] Self-consistency checks passed for complex milestones",
"",
"</final_checklist>",
],
"next": (
"After documentation is complete, invoke step 7 for retrospective:\n\n"
f' python3 executor.py --plan-file "{plan_file}" --step-number 7 '
'--total-steps 7 --thoughts "Documentation complete. '
'Generating retrospective."'
),
}
def get_step_7_guidance(plan_file: str) -> dict:
"""Step 7: Retrospective - present execution summary."""
return {
"actions": [
"EXECUTION RETROSPECTIVE",
"",
f"Plan file: {plan_file}",
"",
"Generate and PRESENT the retrospective to the user.",
"Do NOT write to a file -- present it directly so the user sees it.",
"",
"<retrospective_format>",
"",
"================================================================================",
"EXECUTION RETROSPECTIVE",
"================================================================================",
"",
"Plan: [plan file path]",
"Status: COMPLETED | BLOCKED | ABORTED",
"",
"## Milestone Outcomes",
"",
"| Milestone | Status | Notes |",
"| ---------- | -------------------- | ---------------------------------- |",
"| 1: [name] | EXECUTED | - |",
"| 2: [name] | SKIPPED (RECONCILED) | Already satisfied before execution |",
"| 3: [name] | BLOCKED | [reason] |",
"",
"## Reconciliation Summary",
"",
"If reconciliation was run:",
" - Milestones already complete: [count]",
" - Milestones executed: [count]",
" - Milestones with partial work detected: [count]",
"",
"If reconciliation was skipped:",
' - "Reconciliation skipped (no prior work indicated)"',
"",
"## Plan Accuracy Issues",
"",
"[List any problems with the plan discovered during execution]",
" - [file] Context anchor drift: expected X, found Y",
" - Milestone [N] requirements were ambiguous: [what]",
" - Missing dependency: [what was assumed but didn't exist]",
"",
'If none: "No plan accuracy issues encountered."',
"",
"## Deviations from Plan",
"",
"| Deviation | Category | Approved By |",
"| -------------- | --------------- | ---------------- |",
"| [what changed] | Trivial / Minor | [who or 'auto'] |",
"",
'If none: "No deviations from plan."',
"",
"## Quality Review Summary",
"",
" - Production reliability: [count] issues",
" - Project conformance: [count] issues",
" - Structural quality: [count] suggestions",
"",
"## Feedback for Future Plans",
"",
"[Actionable improvements based on execution experience]",
" - [ ] [specific suggestion]",
" - [ ] [specific suggestion]",
"",
"================================================================================",
"",
"</retrospective_format>",
],
"next": "EXECUTION COMPLETE.\n\nPresent the retrospective to the user.",
}
def get_step_guidance(step_number: int, plan_file: str, thoughts: str) -> dict:
"""Route to appropriate step guidance."""
if step_number == 1:
return get_step_1_guidance(plan_file, thoughts)
elif step_number == 2:
return get_step_2_guidance(plan_file)
elif step_number == 3:
return get_step_3_guidance(plan_file)
elif step_number == 4:
return get_step_4_guidance(plan_file)
elif step_number == 5:
return get_step_5_guidance(plan_file)
elif step_number == 6:
return get_step_6_guidance(plan_file)
elif step_number == 7:
return get_step_7_guidance(plan_file)
else:
return {
"actions": [f"Unknown step {step_number}. Valid steps are 1-7."],
"next": "Re-invoke with a valid step number.",
}
def main():
parser = argparse.ArgumentParser(
description="Plan Executor - Execute approved plans through delegation",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Start execution
python3 executor.py --plan-file plans/auth.md --step-number 1 --total-steps 7 \\
--thoughts "Execute the auth implementation plan"
# Continue milestone execution
python3 executor.py --plan-file plans/auth.md --step-number 3 --total-steps 7 \\
--thoughts "Completed M1, M2. Executing M3..."
# After QR finds issues
python3 executor.py --plan-file plans/auth.md --step-number 5 --total-steps 7 \\
--thoughts "QR found 2 issues: missing error handling, incorrect return type"
""",
)
parser.add_argument(
"--plan-file", type=str, required=True, help="Path to the plan file to execute"
)
parser.add_argument("--step-number", type=int, required=True, help="Current step (1-7)")
parser.add_argument(
"--total-steps", type=int, required=True, help="Total steps (always 7)"
)
parser.add_argument(
"--thoughts", type=str, required=True, help="Your current thinking and status"
)
args = parser.parse_args()
if args.step_number < 1 or args.step_number > 7:
print("Error: step-number must be between 1 and 7", file=sys.stderr)
sys.exit(1)
if args.total_steps != 7:
print("Warning: total-steps should be 7 for executor", file=sys.stderr)
guidance = get_step_guidance(args.step_number, args.plan_file, args.thoughts)
is_complete = args.step_number >= 7
step_names = {
1: "Execution Planning",
2: "Reconciliation",
3: "Milestone Execution",
4: "Post-Implementation QR",
5: "QR Issue Resolution",
6: "Documentation",
7: "Retrospective",
}
print("=" * 80)
print(
f"EXECUTOR - Step {args.step_number} of 7: {step_names.get(args.step_number, 'Unknown')}"
)
print("=" * 80)
print()
print(f"STATUS: {'execution_complete' if is_complete else 'in_progress'}")
print()
print("YOUR THOUGHTS:")
print(args.thoughts)
print()
if guidance["actions"]:
print("GUIDANCE:")
print()
for action in guidance["actions"]:
print(action)
print()
print("NEXT:")
print(guidance["next"])
print()
print("=" * 80)
if __name__ == "__main__":
main()
File diff suppressed because it is too large.