feat: implement new claude skills and workflow
All checks were successful
Deploy to Staging / Build Images (push) Successful in 23s
Deploy to Staging / Deploy to Staging (push) Successful in 36s
Deploy to Staging / Verify Staging (push) Successful in 6s
Deploy to Staging / Notify Staging Ready (push) Successful in 6s
Deploy to Staging / Notify Staging Failure (push) Has been skipped

This commit is contained in:
Eric Gullickson
2026-01-03 11:02:30 -06:00
parent c443305007
commit 9f00797925
45 changed files with 10132 additions and 2174 deletions

View File

@@ -0,0 +1,156 @@
# Default Conventions
These conventions apply when project documentation does not specify otherwise.
## MotoVaultPro Project Conventions
**Naming**:
- Database columns: snake_case (`user_id`, `created_at`)
- TypeScript types: camelCase (`userId`, `createdAt`)
- API responses: camelCase
- Files: kebab-case (`vehicle-repository.ts`)
**Architecture**:
- Feature capsules: `backend/src/features/{feature}/`
- Repository pattern with mapRow() for case conversion
- Single-tenant, user-scoped data
**Frontend**:
- Mobile + desktop validation required (320px, 768px, 1920px)
- Touch targets >= 44px
- No hover-only interactions
**Development**:
- Local node development (`npm install`, `npm run dev`, `npm test`)
- CI/CD pipeline validates containers and integration tests
- Plans stored in Gitea Issue comments
---
## Priority Hierarchy
Higher tiers override lower. Cite backing source when auditing.
| Tier | Source | Action |
| ---- | --------------- | -------------------------------- |
| 1 | user-specified | Explicit user instruction: apply |
| 2 | doc-derived | CLAUDE.md / project docs: apply |
| 3 | default-derived | This document: apply |
| 4 | assumption | No backing: CONFIRM WITH USER |
## Severity Levels
| Level | Meaning | Action |
| ---------- | -------------------------------- | --------------- |
| SHOULD_FIX | Likely to cause maintenance debt | Flag for fixing |
| SUGGESTION | Improvement opportunity | Note if time |
---
## Structural Conventions
<default-conventions domain="god-object">
**God Object**: >15 public methods OR >10 dependencies OR mixed concerns (networking + UI + data)
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="god-function">
**God Function**: >50 lines OR multiple abstraction levels OR >3 nesting levels
Severity: SHOULD_FIX
Exception: Inherently sequential algorithms or state machines
</default-conventions>
<default-conventions domain="duplicate-logic">
**Duplicate Logic**: Copy-pasted blocks, repeated error handling, parallel near-identical functions
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="dead-code">
**Dead Code**: No callers, impossible branches, unread variables, unused imports
Severity: SUGGESTION
</default-conventions>
<default-conventions domain="inconsistent-error-handling">
**Inconsistent Error Handling**: Mixed exceptions/error codes, inconsistent types, swallowed errors
Severity: SUGGESTION
Exception: Project specifies different handling per error category
</default-conventions>
---
## File Organization Conventions
<default-conventions domain="test-organization">
**Test Organization**: Extend existing test files; create new only when:
- Distinct module boundary OR >500 lines OR different fixtures required
Severity: SHOULD_FIX (for unnecessary fragmentation)
</default-conventions>
<default-conventions domain="file-creation">
**File Creation**: Prefer extending existing files; create new only when:
- Clear module boundary OR >300-500 lines OR distinct responsibility
Severity: SUGGESTION
</default-conventions>
---
## Testing Conventions
<default-conventions domain="testing">
**Principle**: Test behavior, not implementation. Fast feedback.
**Test Type Hierarchy** (preference order):
1. **Integration tests** (highest value)
- Test end-user verifiable behavior
- Use real systems/dependencies (e.g., testcontainers)
- Verify component interaction at boundaries
- This is where the real value lies
2. **Property-based / generative tests** (preferred)
- Cover wide input space with invariant assertions
- Catch edge cases humans miss
- Use for functions with clear input/output contracts
3. **Unit tests** (use sparingly)
- Only for highly complex or critical logic
- Risk: maintenance liability, brittleness to refactoring
- Prefer integration tests that cover same behavior
**Test Placement**: Tests are part of implementation milestones, not separate
milestones. A milestone is not complete until its tests pass. This creates fast
feedback during development.
**DO**:
- Integration tests with real dependencies (testcontainers, etc.)
- Property-based tests for invariant-rich functions
- Parameterized fixtures over duplicate test bodies
- Test behavior observable by end users
**DON'T**:
- Test external library/dependency behavior (out of scope)
- Unit test simple code (maintenance liability exceeds value)
- Mock owned dependencies (use real implementations)
- Test implementation details that may change
- One-test-per-variant when parametrization applies
Severity: SHOULD_FIX (violations), SUGGESTION (missed opportunities)
</default-conventions>
---
## Modernization Conventions
<default-conventions domain="version-constraints">
**Version Constraint Violation**: Features unavailable in project's documented target version
Requires: Documented target version
Severity: SHOULD_FIX
</default-conventions>
<default-conventions domain="modernization">
**Modernization Opportunity**: Legacy APIs, verbose patterns, manual stdlib reimplementations
Severity: SUGGESTION
Exception: Project requires legacy pattern
</default-conventions>

View File

@@ -0,0 +1,201 @@
# Unified Diff Format for Plan Code Changes
This document is the authoritative specification for code changes in implementation plans.
## Purpose
Unified diff format encodes both **location** and **content** in a single structure. This eliminates the need for location directives in comments (e.g., "insert at line 42") and provides reliable anchoring even when line numbers drift.
## Anatomy
```diff
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -123,6 +123,15 @@ def existing_function(ctx):
# Context lines (unchanged) serve as location anchors
existing_code()
+ # NEW: Comments explain WHY - transcribed verbatim by Developer
+ # Guard against race condition when messages arrive out-of-order
+ new_code()
# More context to anchor the insertion point
more_existing_code()
```
## Components
| Component | Authority | Purpose |
| ------------------------------------------ | ------------------------- | ---------------------------------------------------------- |
| File path (`--- a/path/to/file.py`) | **AUTHORITATIVE** | Exact target file |
| Line numbers (`@@ -123,6 +123,15 @@`) | **APPROXIMATE** | May drift as earlier milestones modify the file |
| Function context (`@@ ... @@ def func():`) | **SCOPE HINT** | Function/method containing the change |
| Context lines (unchanged) | **AUTHORITATIVE ANCHORS** | Developer matches these patterns to locate insertion point |
| `+` lines | **NEW CODE** | Code to add, including WHY comments |
| `-` lines | **REMOVED CODE** | Code to delete |
## Two-Layer Location Strategy
Code changes use two complementary layers for location:
1. **Prose scope hint** (optional): Natural language describing conceptual location
2. **Diff with context**: Precise insertion point via context line matching
### Layer 1: Prose Scope Hints
For complex changes, add a prose description before the diff block:
````markdown
Add validation after input sanitization in `UserService.validate()`:
```diff
@@ -123,6 +123,15 @@ def validate(self, user):
sanitized = sanitize(user.input)
+ # Validate format before proceeding
+ if not is_valid_format(sanitized):
+ raise ValidationError("Invalid format")
+
return process(sanitized)
`` `
```
````
The prose tells Developer **where conceptually** (which method, what operation precedes it). The diff tells Developer **where exactly** (context lines to match).
**When to use prose hints:**
- Changes to large files (>300 lines)
- Multiple changes to the same file in one milestone
- Complex nested structures where function context alone is ambiguous
- When the surrounding code logic matters for understanding placement
**When prose is optional:**
- Small files with obvious structure
- Single change with unique context lines
- Function context in @@ line provides sufficient scope
### Layer 2: Function Context in @@ Line
The `@@` line can include function/method context after the line numbers:
```diff
@@ -123,6 +123,15 @@ def validate(self, user):
```
This follows standard unified diff format (git generates this automatically). It tells Developer which function contains the change, aiding navigation even when line numbers drift.
## Why Context Lines Matter
When a plan has multiple milestones that modify the same file, earlier milestones shift line numbers. The `@@ -123` in Milestone 3 may no longer be accurate after Milestones 1 and 2 execute.
**Context lines solve this**: Developer searches for the unchanged context patterns in the actual file. These patterns are stable anchors that survive line number drift.
Include 2-3 context lines before and after changes for reliable matching.
## Comment Placement
Comments in `+` lines explain **WHY**, not **WHAT**. These comments:
- Are transcribed verbatim by Developer
- Source rationale from Planning Context (Decision Log, Rejected Alternatives)
- Use concrete terms without hidden baselines
- Must pass temporal contamination review (see `temporal-contamination.md`)
**Important**: Comments written during planning often contain temporal contamination -- change-relative language, baseline references, or location directives. @agent-technical-writer reviews and fixes these before @agent-developer transcribes them.
<example type="CORRECT" category="why_comment">
```diff
+ # Polling chosen over webhooks: 30% webhook delivery failures in third-party API
+ # WebSocket rejected to preserve stateless architecture
+ updates = poll_api(interval=30)
```
Explains WHY this approach was chosen.
</example>
<example type="INCORRECT" category="what_comment">
```diff
+ # Poll the API every 30 seconds
+ updates = poll_api(interval=30)
```
Restates WHAT the code does - redundant with the code itself.
</example>
<example type="INCORRECT" category="hidden_baseline">
```diff
+ # Generous timeout for slow networks
+ REQUEST_TIMEOUT = 60
```
"Generous" compared to what? Hidden baseline provides no actionable information.
</example>
<example type="CORRECT" category="concrete_justification">
```diff
+ # 60s accommodates 95th percentile upstream response times
+ REQUEST_TIMEOUT = 60
```
Concrete justification that explains why this specific value.
</example>
## Location Directives: Forbidden
The diff structure handles location. Location directives in comments are redundant and error-prone.
<example type="INCORRECT" category="location_directive">
```python
# Insert this BEFORE the retry loop (line 716)
# Timestamp guard: prevent older data from overwriting newer
get_ctx, get_cancel = context.with_timeout(ctx, 500)
```
Location directive leaked into comment - line numbers become stale.
</example>
<example type="CORRECT" category="location_directive">
```diff
@@ -714,6 +714,10 @@ def put(self, ctx, tags):
for tag in tags:
subject = tag.subject
- # Timestamp guard: prevent older data from overwriting newer
- # due to network delays, retries, or concurrent writes
- get_ctx, get_cancel = context.with_timeout(ctx, 500)
# Retry loop for Put operations
for attempt in range(max_retries):
```
Context lines (`for tag in tags`, `# Retry loop`) are stable anchors that survive line number drift.
</example>
## When to Use Diff Format
<diff_format_decision>
| Code Characteristic | Use Diff? | Boundary Test |
| --------------------------------------- | --------- | ---------------------------------------- |
| Conditionals, loops, error handling, | YES | Has branching logic |
| state machines | | |
| Multiple insertions same file | YES | >1 change location |
| Deletions or replacements | YES | Removing/changing existing code |
| Pure assignment/return (CRUD, getters) | NO | Single statement, no branching |
| Boilerplate from template | NO | Developer can generate from pattern name |
The boundary test: "Does Developer need to see exact placement and context to implement correctly?"
- YES -> diff format
- NO (can implement from description alone) -> prose sufficient
</diff_format_decision>
## Validation Checklist
Before finalizing code changes in a plan:
- [ ] File path is exact (not "auth files" but `src/auth/handler.py`)
- [ ] Context lines exist in target file (validate patterns match actual code)
- [ ] Comments explain WHY, not WHAT
- [ ] No location directives in comments
- [ ] No hidden baselines (test: "[adjective] compared to what?")
- [ ] 2-3 context lines for reliable anchoring
```

View File

@@ -0,0 +1,250 @@
# Plan Format
Write your plan using this structure:
```markdown
# [Plan Title]
## Overview
[Problem statement, chosen approach, and key decisions in 1-2 paragraphs]
## Planning Context
This section is consumed VERBATIM by downstream agents (Technical Writer,
Quality Reviewer). Quality matters: vague entries here produce poor annotations
and missed risks.
### Decision Log
| Decision | Reasoning Chain |
| ------------------ | ------------------------------------------------------------ |
| [What you decided] | [Multi-step reasoning: premise -> implication -> conclusion] |
Each rationale must contain at least 2 reasoning steps. Single-step rationales
are insufficient.
INSUFFICIENT: "Polling over webhooks | Webhooks are unreliable" SUFFICIENT:
"Polling over webhooks | Third-party API has 30% webhook delivery failure in
testing -> unreliable delivery would require fallback polling anyway -> simpler
to use polling as primary mechanism"
INSUFFICIENT: "500ms timeout | Matches upstream latency" SUFFICIENT: "500ms
timeout | Upstream 95th percentile is 450ms -> 500ms covers 95% of requests
without timeout -> remaining 5% should fail fast rather than queue"
Include BOTH architectural decisions AND implementation-level micro-decisions:
- Architectural: "Event sourcing over CRUD | Need audit trail + replay
capability -> CRUD would require separate audit log -> event sourcing provides
both natively"
- Implementation: "Mutex over channel | Single-writer case -> channel
coordination adds complexity without benefit -> mutex is simpler with
equivalent safety"
Technical Writer sources ALL code comments from this table. If a micro-decision
isn't here, TW cannot document it.
### Rejected Alternatives
| Alternative | Why Rejected |
| -------------------- | ------------------------------------------------------------------- |
| [Approach not taken] | [Concrete reason: performance, complexity, doesn't fit constraints] |
Technical Writer uses this to add "why not X" context to code comments.
### Constraints & Assumptions
- [Technical: API limits, language version, existing patterns to follow]
- [Organizational: timeline, team expertise, approval requirements]
- [Dependencies: external services, libraries, data formats]
- [Default conventions applied: cite any `<default-conventions domain="...">`
used]
### Known Risks
| Risk | Mitigation | Anchor |
| --------------- | --------------------------------------------- | ------------------------------------------ |
| [Specific risk] | [Concrete mitigation or "Accepted: [reason]"] | [file:L###-L### if claiming code behavior] |
**Anchor requirement**: If mitigation claims existing code behavior ("no change
needed", "already handles X"), cite the file:line + brief excerpt that proves
the claim. Skip anchors for hypothetical risks or external unknowns.
Quality Reviewer excludes these from findings but will challenge unverified
behavioral claims.
## Invisible Knowledge
This section captures knowledge NOT deducible from reading the code alone.
Technical Writer uses this for README.md documentation during
post-implementation.
**The test**: Would a new team member understand this from reading the source
files? If no, it belongs here.
**Categories** (not exhaustive -- apply the principle):
1. **Architectural decisions**: Component relationships, data flow, module
boundaries
2. **Business rules**: Domain constraints that shape implementation choices
3. **System invariants**: Properties that must hold but are not enforced by
types/compiler
4. **Historical context**: Why alternatives were rejected (links to Decision
Log)
5. **Performance characteristics**: Non-obvious efficiency properties or
requirements
6. **Tradeoffs**: Costs and benefits of chosen approaches
### Architecture
```
[ASCII diagram showing component relationships]
Example: User Request | v +----------+ +-------+ | Auth |---->| Cache |
+----------+ +-------+ | v +----------+ +------+ | Handler |---->| DB |
+----------+ +------+
```
### Data Flow
```
[How data moves through the system - inputs, transformations, outputs]
Example: HTTP Request --> Validate --> Transform --> Store --> Response | v Log
(async)
````
### Why This Structure
[Reasoning behind module organization that isn't obvious from file names]
- Why these boundaries exist
- What would break if reorganized differently
### Invariants
[Rules that must be maintained but aren't enforced by code]
- Ordering requirements
- State consistency rules
- Implicit contracts between components
### Tradeoffs
[Key decisions with their costs and benefits]
- What was sacrificed for what gain
- Performance vs. readability choices
- Consistency vs. flexibility choices
## Milestones
### Milestone 1: [Name]
**Files**: [exact paths - e.g., src/auth/handler.py, not "auth files"]
**Flags** (if applicable): [needs TW rationale, needs error handling review, needs conformance check]
**Requirements**:
- [Specific: "Add retry with exponential backoff", not "improve error handling"]
**Acceptance Criteria**:
- [Testable: "Returns 429 after 3 failed attempts" - QR can verify pass/fail]
- [Avoid vague: "Works correctly" or "Handles errors properly"]
**Tests** (milestone not complete until tests pass):
- **Test files**: [exact paths, e.g., tests/test_retry.py]
- **Test type**: [integration | property-based | unit] - see default-conventions
- **Backing**: [user-specified | doc-derived | default-derived]
- **Scenarios**:
- Normal: [e.g., "successful retry after transient failure"]
- Edge: [e.g., "max retries exhausted", "zero delay"]
- Error: [e.g., "non-retryable error returns immediately"]
Skip tests when: user explicitly stated no tests, OR milestone is documentation-only,
OR project docs prohibit tests for this component. State skip reason explicitly.
**Code Changes** (for non-trivial logic, use unified diff format):
See `resources/diff-format.md` for specification.
```diff
--- a/path/to/file.py
+++ b/path/to/file.py
@@ -123,6 +123,15 @@ def existing_function(ctx):
# Context lines (unchanged) serve as location anchors
existing_code()
+ # WHY comment explaining rationale - transcribed verbatim by Developer
+ new_code()
# More context to anchor the insertion point
more_existing_code()
````
### Milestone N: ...
### Milestone [Last]: Documentation
**Files**:
- `path/to/CLAUDE.md` (index updates)
- `path/to/README.md` (if Invisible Knowledge section has content)
**Requirements**:
- Update CLAUDE.md index entries for all new/modified files
- Each entry has WHAT (contents) and WHEN (task triggers)
- If plan's Invisible Knowledge section is non-empty:
- Create/update README.md with architecture diagrams from plan
- Include tradeoffs, invariants, "why this structure" content
- Verify diagrams match actual implementation
**Acceptance Criteria**:
- CLAUDE.md enables LLM to locate relevant code for debugging/modification tasks
- README.md captures knowledge not discoverable from reading source files
- Architecture diagrams in README.md match plan's Invisible Knowledge section
**Source Material**: `## Invisible Knowledge` section of this plan
### Cross-Milestone Integration Tests
When integration tests require components from multiple milestones:
1. Place integration tests in the LAST milestone that provides a required
component
2. List dependencies explicitly in that milestone's **Tests** section
3. Integration test milestone is not complete until all dependencies are
implemented
Example:
- M1: Auth handler (property tests for auth logic)
- M2: Database layer (property tests for queries)
- M3: API endpoint (integration tests covering M1 + M2 + M3 with testcontainers)
The integration tests in M3 verify the full flow that end users would exercise,
using real dependencies. This creates fast feedback as soon as all components
exist.
## Milestone Dependencies (if applicable)
```
M1 ---> M2
\
--> M3 --> M4
```
Independent milestones can execute in parallel during /plan-execution.
```
```

View File

@@ -0,0 +1,135 @@
# Temporal Contamination in Code Comments
This document defines terminology for identifying comments that leak information
about code history, change processes, or planning artifacts. Both
@agent-technical-writer and @agent-quality-reviewer reference this
specification.
## The Core Principle
> **Timeless Present Rule**: Comments must be written from the perspective of a
> reader encountering the code for the first time, with no knowledge of what
> came before or how it got here. The code simply _is_.
**Why this matters**: Change-narrative comments are an LLM artifact -- a
category error, not merely a style issue. The change process is ephemeral and
irrelevant to the code's ongoing existence. Humans writing comments naturally
describe what code IS, not what they DID to create it. Referencing the change
that created a comment is fundamentally confused about what belongs in
documentation.
Think of it this way: a novel's narrator never describes the author's typing
process. Similarly, code comments should never describe the developer's editing
process. The code simply exists; the path to its existence is invisible.
In a plan, this means comments are written _as if the plan was already
executed_.
## Detection Heuristic
Evaluate each comment against these five questions. Signal words are examples --
extrapolate to semantically similar constructs.
### 1. Does it describe an action taken rather than what exists?
**Category**: Change-relative
| Contaminated | Timeless Present |
| -------------------------------------- | ----------------------------------------------------------- |
| `// Added mutex to fix race condition` | `// Mutex serializes cache access from concurrent requests` |
| `// New validation for the edge case` | `// Rejects negative values (downstream assumes unsigned)` |
| `// Changed to use batch API` | `// Batch API reduces round-trips from N to 1` |
Signal words (non-exhaustive): "Added", "Replaced", "Now uses", "Changed to",
"New", "Updated", "Refactored"
### 2. Does it compare to something not in the code?
**Category**: Baseline reference
| Contaminated | Timeless Present |
| ------------------------------------------------- | ------------------------------------------------------------------- |
| `// Replaces per-tag logging with summary` | `// Single summary line; per-tag logging would produce 1500+ lines` |
| `// Unlike the old approach, this is thread-safe` | `// Thread-safe: each goroutine gets independent state` |
| `// Previously handled in caller` | `// Encapsulated here; caller should not manage lifecycle` |
Signal words (non-exhaustive): "Instead of", "Rather than", "Previously",
"Replaces", "Unlike the old", "No longer"
### 3. Does it describe where to put code rather than what code does?
**Category**: Location directive
| Contaminated | Timeless Present |
| ----------------------------- | --------------------------------------------- |
| `// After the SendAsync call` | _(delete -- diff structure encodes location)_ |
| `// Insert before validation` | _(delete -- diff structure encodes location)_ |
| `// Add this at line 425` | _(delete -- diff structure encodes location)_ |
Signal words (non-exhaustive): "After", "Before", "Insert", "At line", "Here:",
"Below", "Above"
**Action**: Always delete. Location is encoded in diff structure, not comments.
### 4. Does it describe intent rather than behavior?
**Category**: Planning artifact
| Contaminated | Timeless Present |
| -------------------------------------- | -------------------------------------------------------- |
| `// TODO: add retry logic later` | _(delete, or implement retry now)_ |
| `// Will be extended for batch mode` | _(delete -- do not document hypothetical futures)_ |
| `// Temporary workaround until API v2` | `// API v1 lacks filtering; client-side filter required` |
Signal words (non-exhaustive): "Will", "TODO", "Planned", "Eventually", "For
future", "Temporary", "Workaround until"
**Action**: Delete, implement the feature, or reframe as current constraint.
### 5. Does it describe the author's choice rather than code behavior?
**Category**: Intent leakage
| Contaminated | Timeless Present |
| ------------------------------------------ | ---------------------------------------------------- |
| `// Intentionally placed after validation` | `// Runs after validation completes` |
| `// Deliberately using mutex over channel` | `// Mutex serializes access (single-writer pattern)` |
| `// Chose polling for reliability` | `// Polling: 30% webhook delivery failures observed` |
| `// We decided to cache at this layer` | `// Cache here: reduces DB round-trips for hot path` |
Signal words (non-exhaustive): "intentionally", "deliberately", "chose",
"decided", "on purpose", "by design", "we opted"
**Action**: Extract the technical justification; discard the decision narrative.
The reader doesn't need to know someone "decided" -- they need to know WHY this
approach works.
**The test**: Can you delete the intent word and the comment still makes sense?
If yes, delete the intent word. If no, reframe around the technical reason.
---
**Catch-all**: If a comment only makes sense to someone who knows the code's
history, it is temporally contaminated -- even if it does not match any category
above.
## Subtle Cases
Same word, different verdict -- demonstrates that detection requires semantic
judgment, not keyword matching.
| Comment | Verdict | Reasoning |
| -------------------------------------- | ------------ | ------------------------------------------------ |
| `// Now handles edge cases properly` | Contaminated | "properly" implies it was improper before |
| `// Now blocks until connection ready` | Clean | "now" describes runtime moment, not code history |
| `// Fixed the null pointer issue` | Contaminated | Describes a fix, not behavior |
| `// Returns null when key not found` | Clean | Describes behavior |
## The Transformation Pattern
> **Extract the technical justification, discard the change narrative.**
1. What useful info is buried? (problem, behavior)
2. Reframe as timeless present
Example: "Added mutex to fix race" -> "Mutex serializes concurrent access"