feat: Expand OCR with fuel receipt scanning and owners manual maintenance extraction #129

Closed
opened 2026-02-11 02:24:30 +00:00 by egullickson · 7 comments
Owner

Summary

Expand the OCR functionality with two new scanning capabilities that leverage the existing OCR pipeline and Google Cloud services:

  1. Fuel Receipt OCR Scanning - Take a photo of a fuel receipt during fuel log creation to auto-extract station, gallons/liters, and cost per unit
  2. Owners Manual Maintenance Schedule Extraction - Scan an uploaded owners manual to automatically extract routine maintenance schedules

The Google Vision API limit of 1,000 calls/month was established in #127.


Feature 1: Fuel Receipt OCR Scanning

Description

When adding a fuel log, the user can take a photo of their fuel receipt (mirroring the existing VIN OCR decode UX pattern). The OCR extracts fields from the receipt image and pre-fills the fuel log form with editable values.

Requirements

  • UX Pattern: Mirror the existing VIN OCR decode screen (take photo -> OCR extract -> pre-fill editable fields -> user confirms/edits)
  • Extracted Fields:
    • Gas station name (matched via Google Places API lookup to link a real station object)
    • Gallons or liters (fuel quantity)
    • Cost per gallon/liter (unit price)
    • Total cost (if visible)
  • Fuel type: If the fuel type/grade cannot be detected from the receipt, that is acceptable - leave it for manual selection
  • All pre-filled fields must be editable before saving, just like the VIN decode screen
  • Google Vision API (TEXT_DETECTION) is the correct engine for single-image receipt scanning (scene text, not structured documents)
  • Tier Gating: Pro+ only (add fuelLog.receiptScan to FEATURE_TIERS)
  • Monthly Limit: Counts against the global 1,000 Google API calls/month cap
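The monthly cap can gate engine selection per scan. A minimal sketch, assuming the fallback-to-local-OCR behavior described in the plan below; the function and constant names are illustrative, not the project's actual implementation:

```python
MONTHLY_VISION_LIMIT = 1000  # global Google Vision API cap from #127


def choose_engine(vision_calls_this_month: int) -> str:
    """Pick the OCR engine for a receipt scan.

    Google Vision is preferred; once the monthly cap is reached,
    degrade to the local PaddleOCR engine rather than failing
    (lower accuracy, but the feature keeps working).
    """
    if vision_calls_this_month < MONTHLY_VISION_LIMIT:
        return "google_vision"
    return "paddleocr"
```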

Station Matching Flow

  1. OCR extracts gas station name/brand from receipt text
  2. Backend calls Google Places API with extracted name to find matching station
  3. If match found, pre-select the station in the fuel log form
  4. User can change/clear the station selection
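Receipt headers are noisy (store numbers, register IDs, stray punctuation), so the extracted name usually benefits from cleanup before the Places lookup. A hypothetical normalization helper, not part of the existing codebase:

```python
import re


def normalize_merchant_name(raw: str) -> str:
    """Clean an OCR-extracted merchant line before a Places API query."""
    name = raw.upper()
    name = re.sub(r"#\s*\d+", "", name)        # drop store numbers like "#4412"
    name = re.sub(r"[^A-Z0-9 &']", " ", name)  # strip punctuation noise
    return " ".join(name.split()).title()
```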

Technical Notes

  • Uses the synchronous OCR endpoint (POST /api/ocr/extract) since receipts are single images (1-3 seconds)
  • The OCR response already supports documentType: 'receipt' and extractedFields
  • Frontend needs a camera/photo capture component in the fuel log creation flow
  • Mobile-first: phone camera capture is the primary use case
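The extracted numeric fields can be cross-checked before the form is shown: the total should roughly equal quantity times unit price, and a mismatch usually means a misread digit. A sketch of such a check (the plan below uses a 10% tolerance; the helper name is hypothetical):

```python
def cross_validate_receipt(quantity: float, unit_price: float,
                           total: float, tolerance: float = 0.10) -> bool:
    """Return True when total is within `tolerance` of quantity * unit_price."""
    if quantity <= 0 or unit_price <= 0 or total <= 0:
        return False
    expected = quantity * unit_price
    return abs(total - expected) <= tolerance * expected
```

A failed check would lower the extraction confidence rather than block the user, since all fields stay editable anyway.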

Feature 2: Owners Manual Maintenance Schedule Extraction

Description

When uploading an owners manual document, a checkbox option "Scan for Maintenance Schedule" triggers a Gemini AI scan of the entire manual to extract routine maintenance items and their intervals. Extracted items are presented for user review before creating maintenance schedules.

Engine: Gemini 2.5 Flash on Vertex AI

Gemini is the right choice over Document AI for this use case because:

  • Semantic understanding: Gemini comprehends what a maintenance schedule means, not just layout/text extraction
  • Native PDF processing: Sends the PDF directly to Gemini -- no OCR preprocessing pipeline needed
  • Structured JSON output: Native responseMimeType: 'application/json' with responseSchema enforcement
  • 1M token context window: Handles entire owners manuals (up to ~1,500 pages of text)
  • Cost effective: ~$0.001-0.002 per page ($0.30 per 1M input tokens, $2.50 per 1M output tokens)

Gemini Prompt

Extract all routine scheduled maintenance items from this vehicle owners manual.

For each maintenance item, extract:
- serviceName: The maintenance task name (e.g., "Engine Oil Change", "Tire Rotation", "Cabin Air Filter Replacement")
- intervalMiles: The mileage interval as a number, or null if not specified (e.g., 5000, 30000)
- intervalMonths: The time interval in months as a number, or null if not specified (e.g., 6, 12, 24)
- details: Any additional details such as fluid specifications, part numbers, or special instructions (e.g., "Use 0W-20 full synthetic oil")

Only include routine scheduled maintenance items with clear intervals. Do not include one-time procedures, troubleshooting steps, or warranty information.

Return the results as a JSON object with a single "maintenanceSchedule" array.

Gemini Response Schema (enforced via responseSchema)

{
  "type": "object",
  "properties": {
    "maintenanceSchedule": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "serviceName": { "type": "string" },
          "intervalMiles": { "type": "number", "nullable": true },
          "intervalMonths": { "type": "number", "nullable": true },
          "details": { "type": "string", "nullable": true }
        },
        "required": ["serviceName"]
      }
    }
  },
  "required": ["maintenanceSchedule"]
}
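Even with responseSchema enforcement, a defensive local check is cheap insurance against truncated or malformed responses. A minimal validator mirroring the schema above (hypothetical helper, not existing code):

```python
def validate_maintenance_response(payload: dict) -> list[dict]:
    """Check a parsed Gemini response against the schema's required/nullable rules."""
    items = payload.get("maintenanceSchedule")
    if not isinstance(items, list):
        raise ValueError("missing 'maintenanceSchedule' array")
    for item in items:
        if not isinstance(item.get("serviceName"), str):
            raise ValueError(f"item missing required serviceName: {item!r}")
        for key in ("intervalMiles", "intervalMonths"):
            if item.get(key) is not None and not isinstance(item[key], (int, float)):
                raise ValueError(f"{key} must be a number or null")
    return items
```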

Example Gemini Response

{
  "maintenanceSchedule": [
    {
      "serviceName": "Engine Oil Change",
      "intervalMiles": 5000,
      "intervalMonths": 6,
      "details": "Use 0W-20 full synthetic oil. Replace oil filter at every oil change."
    },
    {
      "serviceName": "Tire Rotation",
      "intervalMiles": 5000,
      "intervalMonths": 6,
      "details": "Rotate front to rear on same side."
    },
    {
      "serviceName": "Cabin Air Filter Replacement",
      "intervalMiles": 15000,
      "intervalMonths": 12,
      "details": null
    },
    {
      "serviceName": "Brake Fluid Replacement",
      "intervalMiles": null,
      "intervalMonths": 36,
      "details": "Use DOT 3 brake fluid."
    },
    {
      "serviceName": "Spark Plug Replacement",
      "intervalMiles": 60000,
      "intervalMonths": null,
      "details": "Iridium spark plugs. Torque to 18 ft-lbs."
    }
  ]
}

GCP Setup Instructions

1. Enable the Vertex AI API

# Via gcloud CLI
gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID

# Or via GCP Console:
# APIs & Services > Enable APIs and Services > Search "Vertex AI API" > Enable

2. Service Account Permissions

The existing service account (used for Google Vision in #127) needs one additional IAM role:

| Role | Role ID | Purpose |
|------|---------|---------|
| Vertex AI User | roles/aiplatform.user | Required for aiplatform.endpoints.predict permission |

# Grant the Vertex AI User role to the existing service account
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

If using Workload Identity Federation (WIF) from #127, the same federated identity gets the additional role -- no new service account needed.

3. SDK Dependency

# Add to ocr/requirements.txt (Python OCR service)
google-cloud-aiplatform>=1.40.0

# OR for the Node.js backend proxy
npm install @google-cloud/vertexai

4. Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| VERTEX_AI_PROJECT | (required) | GCP project ID |
| VERTEX_AI_LOCATION | us-central1 | GCP region for Vertex AI |
| GEMINI_MODEL | gemini-2.5-flash | Gemini model ID |
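A sketch of how the Python service might read these variables, assuming the defaults from the table; the loader function is hypothetical:

```python
import os


def load_vertex_config() -> dict:
    """Read Vertex AI settings; only VERTEX_AI_PROJECT has no default."""
    project = os.environ.get("VERTEX_AI_PROJECT")
    if not project:
        raise RuntimeError("VERTEX_AI_PROJECT is required")
    return {
        "project": project,
        "location": os.environ.get("VERTEX_AI_LOCATION", "us-central1"),
        "model": os.environ.get("GEMINI_MODEL", "gemini-2.5-flash"),
    }
```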

5. Authentication

Uses the same credential path as Google Vision (#127):

  • Development: GOOGLE_APPLICATION_CREDENTIALS env var pointing to service account key JSON
  • Production: Workload Identity Federation (WIF) via Auth0 -- already configured in #127

Requirements

  • Checkbox: "Scan for Maintenance Schedule" on the document upload form (when document type is owners manual)
  • Long-running task: Owners manuals are 10-200MB, 100-300 pages. Gemini processes faster than traditional OCR but still takes 10-60+ seconds for large manuals. Must use the async OCR job flow (POST /api/ocr/jobs -> poll GET /api/ocr/jobs/:jobId)
  • PDF delivery to Gemini: For manuals under 20MB, use inline base64. For manuals over 20MB, upload to GCS first and pass the gs:// URI
  • Extracted Data Per Item:
    • Service/maintenance item name (e.g., "Oil Change", "Tire Rotation")
    • Interval in miles/km (e.g., every 5,000 miles)
    • Interval in months (e.g., every 6 months)
    • Additional details/notes (e.g., "Use 0W-20 synthetic")
  • User Review Flow: After extraction completes, present all extracted maintenance items in a review screen. User selects which items to create as maintenance_schedules. User can edit any field before confirming.
  • Creates maintenance_schedules (recurring schedules with interval_months / interval_miles) -- NOT one-time records
  • Tier Gating: Already defined as Pro+ (document.scanMaintenanceSchedule in FEATURE_TIERS)
  • Vehicle association: The owners manual document must be associated with a vehicle so the created schedules link to the correct vehicle
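The upload validation and the 20MB delivery decision can be combined into one gate. An illustrative sketch (function name is hypothetical; the magic-byte check guards against spoofed content-type headers, as noted in the plan below):

```python
MAX_INLINE_BYTES = 20 * 1024 * 1024  # Gemini inline (base64) PDF limit


def plan_pdf_delivery(pdf_bytes: bytes) -> str:
    """Reject non-PDFs, then pick inline base64 vs GCS upload by size."""
    if pdf_bytes[:4] != b"%PDF":
        raise ValueError("not a PDF (missing %PDF magic bytes)")
    return "inline_base64" if len(pdf_bytes) <= MAX_INLINE_BYTES else "gcs_uri"
```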

Technical Notes

  • Uses the async OCR job endpoint since manuals are large files
  • Gemini replaces the entire OCR preprocessing + pattern matching pipeline for manuals -- no PaddleOCR, no spaCy NER, no layout analysis needed
  • Frontend needs a progress indicator while the async job runs, and a notification when complete
  • Must map extracted serviceName values to existing maintenance categories/subtypes (routine_maintenance with appropriate subtypes from the 27 available)

Shared Concerns

Mobile + Desktop

  • Both features MUST work on mobile and desktop per project requirements
  • Fuel receipt scanning: Mobile is the primary use case (phone camera capture)
  • Manual scanning: Desktop may be more common (uploading PDF files), but mobile must work too

Acceptance Criteria

  • Fuel receipt OCR scanning works on mobile (camera capture) and desktop (file upload)
  • Extracted receipt fields pre-fill fuel log form with editable values
  • Station name from receipt is matched via Google Places API and linked
  • Fuel receipt scan is gated to Pro+ tier
  • Owners manual scan checkbox appears on document upload for owners manuals
  • Gemini 2.5 Flash processes manual PDF and returns structured JSON maintenance schedule
  • Async job flow handles long-running Gemini extraction with progress feedback
  • Extracted maintenance items are presented for user review before creation
  • User can select/deselect and edit items before creating schedules
  • Created items are maintenance_schedules with correct intervals
  • Manual scan respects existing Pro+ tier gating
  • Both features respect the global 1,000 calls/month Google API limit
  • All fields are editable before saving (both features)
  • Works on both mobile and desktop viewports
  • Vertex AI API enabled and service account has roles/aiplatform.user role
  • GCP authentication works via WIF (production) and service account key (development)
egullickson added the status/backlog and type/feature labels 2026-02-11 02:24:34 +00:00
egullickson added the status/in-progress label and removed status/backlog 2026-02-11 02:45:50 +00:00
Author
Owner

Plan: Expand OCR with Fuel Receipt Scanning and Owners Manual Maintenance Extraction

Phase: Planning | Agent: Planner | Status: APPROVED (revised per review cycle)

Overview

Expand OCR functionality with two new capabilities: (1) fuel receipt scanning that auto-extracts fields and pre-fills the fuel log form, and (2) owners manual maintenance schedule extraction via Gemini 2.5 Flash on Vertex AI, which processes the PDF natively and feeds dedicated frontend components for schedule review and creation of recurring maintenance schedules.

Planning Context

Decision Log

| Decision | Reasoning Chain |
|----------|-----------------|
| Gemini as standalone Python module, not extending OcrEngine ABC | OcrEngine.recognize() accepts image bytes and returns text+confidence -> GeminiEngine.extract_maintenance() accepts PDF bytes and returns structured JSON -> different interface -> standalone module because interface signatures differ |
| Gemini in Python OCR service (not Node.js backend) | Python service has async job queue with progress callbacks -> has PDF handling infrastructure -> has WIF authentication -> keeps all AI/OCR processing in one service -> avoids duplicating auth and job patterns |
| Google Places API for station matching (separate budget) | Vision API 1000/month limit does NOT apply to Places API -> separate budget allows station matching without resource competition -> issue requires station matching -> use google-maps.client.ts in stations feature |
| Station matching via separate frontend call | Frontend calls OCR, receives extractedFields, then calls POST /api/stations/match with merchantName -> two sequential calls -> better separation of concerns -> OCR service stays focused on extraction, backend handles station enrichment |
| Station matching in backend, not frontend | Google Places API key stays server-side -> OCR service stays focused on text extraction -> backend has google-maps.client.ts -> keep API keys server-side |
| No monthly limit on Gemini calls | Gemini is pay-per-use on Vertex AI -> no artificial cap needed -> Vision API limit (1000/month) applies only to VIN + receipt OCR via Google Vision -> counter stays as ocr:vision_requests |
| Receipt OCR falls back to PaddleOCR when Vision limit reached | HybridEngine implements this fallback pattern -> receipts degrade gracefully to local OCR (lower accuracy but functional) |
| 20MB raw bytes PDF limit at launch for Gemini | Gemini inline base64 supports up to 20MB -> Vertex AI SDK handles base64 encoding internally -> validate raw bytes only -> GCS upload path for larger files adds significant complexity -> most manuals under 20MB -> clear error message for oversized files -> GCS as documented future enhancement |
| Backend proxy creates dedicated /api/ocr/extract/receipt endpoint | Dedicated proxy allows receipt-specific middleware (tier gating, rate limiting, request logging) -> Python /extract/receipt has specialized receipt preprocessing, fuel pattern matching, and cross-validation -> generic /extract only auto-detects document type |
| useReceiptOcr calls /ocr/extract/receipt | /ocr/extract/receipt provides receipt-specific preprocessing and fuel field extraction -> the generic /ocr/extract endpoint lacks receipt-specialized patterns |
| 30s timeout for receipt OCR API call | Receipt images are single photos (1-3s typical processing) -> 30s accommodates slow first-call model loading and cloud fallback -> matches useReceiptOcr timeout |
| New requireTier() middleware (M0) | Backend tier enforcement prevents direct API bypass -> frontend-only tier check can be circumvented via curl -> both receipt and manual endpoints use the same middleware -> clean separation as reusable preHandler |
| PDF magic byte validation for manual uploads | Content-type header can be spoofed -> first 4 bytes %PDF check prevents processing renamed non-PDF files -> minimal overhead, defense in depth |
| Stepped progress for Gemini extraction | Gemini makes single blocking API call (10-60s) -> no sub-progress possible -> honest 4-point updates (10%, 50%, 95%, 100%) rather than simulated progress bar |
| 410 Gone for expired job polling | HTTP 410 semantically correct for TTL-expired Redis jobs -> distinguishes from 404 "never existed" -> frontend shows clear "Job expired, please resubmit" message |
| Gemini response schema uses camelCase | Matches backend API convention (camelCase in TypeScript) -> Python manual_extractor.py preserves camelCase from Gemini for API response -> avoids extra case conversion layer |

Rejected Alternatives

| Alternative | Why Rejected |
|-------------|--------------|
| Extend OcrEngine ABC for Gemini | OcrEngine takes image_bytes, returns text+confidence. Gemini takes PDF, returns structured JSON. Forcing Gemini into this interface would require an awkward adaptation layer with no benefit. |
| Skip Google Places for station matching | Issue requires station matching. Places API has separate budget from OCR. Skipping would miss a key requirement. |
| Gemini in Node.js backend | Would duplicate async job queue, PDF handling, and WIF authentication from the Python service. Backend is a proxy layer, not a processing layer. |
| Unified counter for Vision + Gemini | Gemini is pay-per-use with no artificial cap. Only Vision API has 1000/month limit. Unifying would unnecessarily restrict manual scanning. |
| GCS upload for all PDFs | Adds bucket provisioning, IAM, upload flow for a minority of cases. Most manuals under 20MB. Defer to future enhancement. |
| Frontend calls stations API for matching | Would expose Places API key to frontend. Backend has google-maps.client.ts. Keep API keys server-side. |
| Backend merges OCR + Places into one response | Couples OCR and station logic in backend proxy. Separate frontend call is cleaner separation of concerns. |
| Frontend-only tier gating | Bypassable via direct API call. Backend enforcement required for security. |
| Simulated progress bar during Gemini call | Artificial progress is dishonest. Stepped updates are simple and accurate. |
| 15MB safety margin on PDF size | Vertex AI SDK handles encoding internally. 20MB raw bytes matches API docs. Conservative limit rejects valid files unnecessarily. |

Constraints and Assumptions

  • Technical: 1000/month Google Vision API limit (VIN + receipts only), 20MB Gemini raw bytes limit, WIF authentication via Auth0 M2M
  • Technical: Vertex AI API must be enabled in GCP project, service account needs roles/aiplatform.user
  • Technical: Python OCR service at mvp-ocr:8000, backend proxy at /api/ocr/*
  • Architecture: Feature capsule pattern for backend, React Hook Form + Zod for frontend forms
  • Frontend: Mobile + desktop required (320px, 768px, 1920px viewports), touch targets >= 44px
  • Dependencies: google-cloud-aiplatform Python SDK, Gemini 2.5 Flash on Vertex AI (us-central1)
  • Workflow: Issue #129 requires sub-issue decomposition (9 milestones, multiple 3+ file milestones). Each milestone maps 1:1 to a sub-issue per workflow-contract.json. ONE branch issue-129-expand-ocr, ONE PR closing parent #129 and all sub-issues.
  • Architectural note: If OcrClient/OcrService/OcrController exceed 10 methods, consider splitting into specialized classes (VinOcr, ReceiptOcr, ManualOcr).

Known Risks

| Risk | Mitigation | Anchor |
|------|------------|--------|
| Receipt OCR accuracy varies by receipt format | receipt_extractor has cross-validation (total = qty * price within 10% tolerance) and confidence scoring. User can edit all fields before saving. | ocr/app/extractors/fuel_receipt.py:L108-L123 |
| Gemini structured output may not perfectly match maintenance categories | Map serviceName to 27 subtypes via fuzzy matching. User reviews and edits all items before creating schedules. | Issue #129 specifies user review flow |
| PDFs over 20MB rejected at launch | Clear error message with file size limit. GCS upload path documented as future enhancement. | N/A |
| WIF authentication may not work with Vertex AI SDK | google-cloud-aiplatform uses ADC which supports external_account (WIF) type. Same credential path as Vision API. GeminiEngine._get_client() wraps initialization in try/except with diagnostic error. | ocr/app/engines/cloud_engine.py (WIF setup) |
| Redis job data TTL (2h) may be insufficient for very large manuals | MANUAL_JOB_TTL is 7200s (2 hours). Gemini processes manuals in 10-60s, well within limits. Expired jobs return 410 Gone. | ocr/app/services/job_queue.py:L22 |

Invisible Knowledge

Architecture

FUEL RECEIPT OCR FLOW:
  Mobile Camera / File Upload
      |
      v
  Frontend (useReceiptOcr) --POST /api/ocr/extract/receipt--> Backend Proxy
      |                                                            |
      v                                                            v
  ReceiptOcrReviewModal                              OcrClient.extractReceipt()
      |                                                            |
      v                                                            v
  Frontend calls POST /api/stations/match        Python /extract/receipt
  with extractedFields.merchantName                    |
      |                                                 v
      v                                          ReceiptExtractor.extract()
  Pre-fill locationData                                |
  with matched station                                 v
      |                                          HybridEngine (Vision/PaddleOCR)
      v                                                |
  Accept -> FuelLogForm.setValue()                     v
                                           Pattern matching (fuel, date, currency)
                                                      |
                                                      v
                                           ReceiptExtractionResponse

MANUAL EXTRACTION FLOW:
  DocumentForm (upload PDF + check "Scan for Maintenance Schedule")
      |
      v
  Frontend (useManualExtraction) --POST /api/ocr/extract/manual--> Backend Proxy
      |                                                                |
      v                                                                v
  Poll GET /api/ocr/jobs/:jobId                          OcrClient.submitManualJob()
  (progress: 10% -> 50% -> 95% -> 100%)                             |
      |                                                               v
      v                                                    Python /extract/manual
  Job completed (or 410 Gone if expired)                       |
      |                                                         v
      v                                                  GeminiEngine.extract_maintenance()
  MaintenanceScheduleReviewScreen                              |
  (select/edit/deselect items)                                  v
      |                                                  Vertex AI Gemini 2.5 Flash
      v                                                  (native PDF, structured JSON)
  POST /api/maintenance/schedules                              |
  (batch create selected)                                      v
                                                     ManualExtractionResponse
                                                     (maintenanceSchedules[])

Data Flow

RECEIPT: Photo -> Backend /extract/receipt -> Python receipt_extractor -> Vision/PaddleOCR
         -> pattern matching -> extractedFields -> Frontend review modal
         -> Frontend calls /stations/match -> Places API station match
         -> User edits -> Form population -> Create fuel log

MANUAL:  PDF -> Backend /extract/manual -> Python job queue -> Gemini 2.5 Flash
         -> structured JSON -> maintenanceSchedules[] -> Frontend poll -> Review screen
         -> User select/edit -> Batch create maintenance_schedules
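The receipt path's two sequential frontend calls can be sketched as below. This is a hedged sketch: the endpoint paths and the `merchantName` field come from the plan, but the other response field names, the `FetchLike` injection parameter, and the error handling are assumptions for illustration only.

```typescript
// Hypothetical response shapes; only merchantName is named in the plan,
// the remaining fields are assumptions.
interface ReceiptFields {
  merchantName?: string;
  gallons?: number;
  unitPrice?: number;
  totalCost?: number;
}

interface StationMatch {
  stationId: string;
  name: string;
}

type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Step 1: OCR extraction via the dedicated backend proxy endpoint.
async function extractReceipt(
  photo: Blob,
  fetchFn: FetchLike = fetch,
): Promise<ReceiptFields> {
  const form = new FormData();
  form.append("file", photo);
  const res = await fetchFn("/api/ocr/extract/receipt", { method: "POST", body: form });
  if (!res.ok) throw new Error(`Receipt OCR failed: ${res.status}`);
  const body = (await res.json()) as { extractedFields: ReceiptFields };
  return body.extractedFields;
}

// Step 2: separate station-matching call with the extracted merchant name.
async function matchStation(
  merchantName: string,
  fetchFn: FetchLike = fetch,
): Promise<StationMatch | null> {
  const res = await fetchFn("/api/stations/match", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ merchantName }),
  });
  if (!res.ok) return null; // no match is non-fatal; the user selects a station manually
  return (await res.json()) as StationMatch;
}
```

Keeping the calls sequential (rather than merged server-side) matches the plan's separation of concerns: the OCR proxy stays focused on extraction, and station enrichment remains an independent, skippable step.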

Why This Structure

  • GeminiEngine is a standalone module: OcrEngine handles image-to-text extraction; GeminiEngine handles PDF-to-structured-data extraction. Different input types, output formats, and error modes warrant separate abstractions.
  • Station matching in backend: Google Places API key stays server-side. OCR service stays focused on text extraction. Backend has google-maps.client.ts.
  • Async job pattern for manuals: Manual PDFs are large and Gemini extraction takes 10-60+ seconds. The async pattern with progress polling keeps the UI responsive without blocking the request.

Invariants

  • All receipt OCR extracted fields MUST be editable before saving to fuel log
  • All manual extraction items MUST be reviewed by user before creating schedules
  • Monthly Vision API counter only counts google_vision engine calls, never Gemini
  • Gemini module uses same WIF credential path as Vision API
  • Frontend works on both mobile (camera capture) and desktop (file upload)
  • Both receipt and manual endpoints enforce tier gating at backend level (requireTier middleware)
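The requireTier preHandler named in the last invariant can be sketched framework-agnostically as follows. The tier names, the FEATURE_TIERS entries, and the request/reply shapes are assumptions; the real Fastify middleware in the backend may differ.

```typescript
// Hypothetical tier model; actual tier names and ranking may differ.
type Tier = "free" | "pro" | "proPlus";
const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, proPlus: 2 };

// Feature -> minimum tier. These entries mirror the plan's gating
// (Pro+ for both scans) but are illustrative, not the real map.
const FEATURE_TIERS: Record<string, Tier> = {
  "fuelLog.receiptScan": "proPlus",
  "document.scanMaintenanceSchedule": "proPlus",
};

interface TierRequest {
  user?: { tier: Tier };
}
interface TierReply {
  code(status: number): TierReply;
  send(body: unknown): void;
}

// Returns a Fastify-style preHandler that rejects callers below the
// feature's minimum tier with 403, enforcing gating server-side so it
// cannot be bypassed via curl.
function requireTier(feature: string) {
  return async (req: TierRequest, reply: TierReply): Promise<void> => {
    const required = FEATURE_TIERS[feature];
    const userTier = req.user?.tier ?? "free";
    if (!required || TIER_RANK[userTier] < TIER_RANK[required]) {
      reply.code(403).send({ error: `Feature ${feature} requires a higher tier` });
    }
  };
}
```

A route would then compose it as a preHandler alongside authentication, e.g. `preHandler: [requireAuth, requireTier('fuelLog.receiptScan')]`.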

Tradeoffs

  • 20MB PDF limit: Sacrifices support for very large manuals (>20MB) to avoid GCS bucket complexity. Most manuals are under 20MB.
  • No Gemini monthly cap: Vertex AI is pay-per-use. Cost is ~$0.001-0.002 per page. A 300-page manual costs ~$0.30-0.60. Acceptable for Pro+ tier.
  • Station matching adds latency: Google Places lookup adds ~200-500ms to receipt processing. Acceptable for the value of auto-linking a real station.
  • Coarse progress: Updates land only at 10%, 50%, 95%, and 100%; no sub-progress is possible during the single blocking Gemini API call. This reports the wait honestly rather than simulating a smooth progress bar.
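The async job pattern described in the tradeoffs can be sketched as a frontend poll loop. The endpoint path, the coarse progress points, and the 410-means-expired behavior come from the plan; the `JobStatus` shape, `pollManualJob` name, and timing parameters are assumptions.

```typescript
// Hypothetical job status shape; the real API response may differ.
interface JobStatus {
  status: "pending" | "processing" | "completed" | "failed";
  progress: number; // reported only at the coarse points: 10, 50, 95, 100
  result?: unknown;
}

// Polls the job endpoint until completion. A 410 Gone means the
// Redis-backed job's TTL elapsed and the caller should resubmit,
// as distinct from 404 (job never existed).
async function pollManualJob(
  jobId: string,
  fetchFn: (url: string) => Promise<Response> = fetch,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(`/api/ocr/jobs/${jobId}`);
    if (res.status === 410) throw new Error("Job expired, please resubmit");
    if (!res.ok) throw new Error(`Job poll failed: ${res.status}`);
    const job = (await res.json()) as JobStatus;
    if (job.status === "completed" || job.status === "failed") return job;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("Polling timed out");
}
```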

Sub-Issues

Sub-Issue  Milestone  Title
#138       M0         feat: Tier guard middleware (#129)
#139       M1         feat: Backend OCR receipt proxy endpoint (#129)
#140       M2         feat: Tier gating and frontend receipt OCR flow (#129)
#141       M3         feat: Station matching from receipt (#129)
#142       M4         feat: Gemini engine module and configuration (#129)
#143       M5         feat: Manual extractor Gemini rewrite (#129)
#144       M6         feat: Backend OCR manual proxy endpoint (#129)
#145       M7         feat: Frontend manual extraction flow (#129)
#146       M8         docs: Documentation for OCR expansion (#129)

Milestone Dependencies

M0 (Tier Guard) ---> M1 (Backend Receipt) ---> M2 (Tier Gating + Frontend Receipt)
                                           \
                                            --> M3 (Station Matching)

M4 (Gemini Engine) ---> M5 (Manual Extractor)
                                |
                                v
              M0 (Tier Guard) ---> M6 (Backend Manual Proxy) ---> M7 (Frontend Manual Flow)

M0..M7 ---> M8 (Documentation)

Feature 1 (#139-#141) and Feature 2 (#142-#145) can proceed in parallel after M0 completes.

Verdict: APPROVED | Next: Create branch issue-129-expand-ocr, begin execution at M0


QR Review: Plan Completeness (#129)

Phase: Plan-Review | Agent: Quality Reviewer | Status: PASS_WITH_CONCERNS

VERDICT: PASS_WITH_CONCERNS

The plan is structurally complete with comprehensive Decision Log, Rejected Alternatives, Constraints, Risks, Invisible Knowledge, and 8 well-defined milestones. However, several concerns require attention before execution begins.

Findings

[DECISION_LOG] [SHOULD_FIX]: Missing decision about useReceiptOcr endpoint call location

  • Location: Decision Log
  • Issue: Plan states "Frontend useReceiptOcr calls /ocr/extract/receipt" and "Current code calls generic /ocr/extract at line 140". Verified frontend/src/features/fuel-logs/hooks/useReceiptOcr.ts:140 calls /ocr/extract, not /ocr/extract/receipt. However, the Decision Log doesn't explain WHY the current code is wrong or WHY switching endpoints is an architectural decision vs. implementation detail.
  • Failure Mode: Implementation detail masquerading as a decision. If Python /extract/receipt endpoint already exists (verified in ocr/app/routers/extract.py:176), then calling it is just fixing a bug.
  • Suggested Fix: Either (1) remove from Decision Log and treat as implementation detail in M1 acceptance criteria, OR (2) expand reasoning to explain why frontend was NOT calling the specialized endpoint initially.

[DECISION_LOG] [SHOULD_FIX]: "Backend receipt endpoint" decision is misleading

  • Location: Decision Log entry "Backend receipt endpoint calls Python /extract/receipt (not generic /extract)"
  • Issue: Python endpoint /extract/receipt ALREADY EXISTS (verified at ocr/app/routers/extract.py:176-267). Decision implies backend is creating new endpoint, but M1 creates the PROXY endpoint in Node.js backend.
  • Failure Mode: Misleading decision could cause implementation to focus on wrong layer.
  • Suggested Fix: Reword as "Backend proxy creates dedicated /api/ocr/extract/receipt endpoint (not reusing generic /api/ocr/extract)" with reasoning: dedicated proxy allows future middleware specific to receipt scanning.

[CONSTRAINTS] [CRITICAL]: Missing sub-issue creation requirement

  • Location: Constraints section
  • Issue: Plan has 8 milestones and workflow-contract.json specifies multi-file features (3+ files) should be decomposed into sub-issues. M1 has 5 files, M2 has 4 files, M4 has 5 files, M6 has 5 files, M7 has 4 files. This clearly qualifies for sub-issue decomposition, but Constraints don't mention this.
  • Failure Mode: Execution starts without sub-issues, violating workflow contract. Plan review cycle should catch this BEFORE execution begins.
  • Suggested Fix: Add to Constraints: "Workflow: Issue #129 requires sub-issue decomposition (8 milestones, multiple 3+ file milestones). Each milestone maps 1:1 to a sub-issue per workflow-contract.json. ONE branch issue-129-{slug}, ONE PR closing parent #129 and all sub-issues."

[MILESTONES] [SHOULD_FIX]: M1 test backing should be "integration" not "default-derived"

  • Location: M1 -> Tests -> Backing: "default-derived"
  • Issue: M1 creates a proxy endpoint that calls Python OCR service via HTTP. Test backing says "default-derived" but service-to-service communication warrants integration tests. The OcrClient HTTP call to Python service is service-to-service.
  • Failure Mode: Unit tests with mocked OcrClient won't catch actual integration failures.
  • Suggested Fix: Change M1 test backing to "integration" and update test type to "integration (real OcrClient -> mocked Python HTTP endpoint)".

[MILESTONES] [SHOULD_FIX]: M2 and M7 missing viewport test specifications

  • Location: M2 and M7 "Flags: needs conformance check (mobile + desktop)"
  • Issue: Both milestones have mobile+desktop requirements but test specifications don't include viewport tests. M2 only tests tier gating logic. M7 only tests component rendering.
  • Failure Mode: Features may pass tests but fail mobile conformance check.
  • Suggested Fix: Add test scenarios:
    • M2: "Normal: Receipt button renders correctly on mobile (320px) and desktop (1920px) viewports"
    • M7: "Normal: Review screen adapts to mobile (full-screen) and desktop (modal) layouts"

[MILESTONES] [NEEDS_CLARIFICATION]: M4 Gemini WIF authentication config incomplete

  • Location: M4 -> Requirements -> "Authentication via same WIF credential path"
  • Issue: Plan says Gemini uses "same credential path" but doesn't specify WHERE in config. ocr/app/config.py shows google_vision_key_path but no Gemini-specific settings.
  • Failure Mode: Implementation guesses at config approach.
  • Suggested Fix: Add to M4 requirements: "Gemini reads GOOGLE_APPLICATION_CREDENTIALS env var (same as Vision API). Vertex AI SDK ADC supports external_account (WIF) credential type - no code changes needed, just environment setup."

[MILESTONES] [SHOULD_FIX]: M5 "Remove unused imports" is implementation detail

  • Location: M5 -> Requirements
  • Issue: Linting/cleanup task, not a functional requirement. Linters catch unused imports automatically.
  • Suggested Fix: Remove from requirements.

[MILESTONES] [NEEDS_CLARIFICATION]: M6 tier guard implementation unclear

  • Location: M6 -> Requirements -> "tier guard for document.scanMaintenanceSchedule"
  • Issue: Plan doesn't specify HOW tier guard is applied. backend/src/features/ocr/api/ocr.routes.ts has NO tier guards on current routes.
  • Suggested Fix: Add: "Apply tier guard via route preHandler middleware: preHandler: [requireAuth, requireTier('document.scanMaintenanceSchedule')]."

[INVISIBLE_KNOWLEDGE] [SHOULD_FIX]: Station matching flow ambiguous

  • Location: Architecture -> FUEL RECEIPT OCR FLOW
  • Issue: Diagram shows "Backend: Match station via Google Places API" but doesn't clarify if matching is: (A) Backend proxy merges OCR + Places into one response, or (B) Frontend calls OCR then separately calls station matching.
  • Suggested Fix: Add to M3: "Station matching is a separate frontend call after OCR: (1) Frontend receives extractedFields.merchantName, (2) Frontend calls POST /api/stations/match with merchantName, (3) Backend calls Google Places Text Search, (4) Frontend pre-fills locationData."

Considered But Not Flagged

  • Gemini 20MB PDF limit: Correctly identified as tradeoff with GCS as future enhancement. Acceptable for MVP.
  • No monthly limit on Gemini: Reasoning sound - Vertex AI pay-per-use, Vision limit doesn't apply.
  • Receipt OCR PaddleOCR fallback: Verified ocr/app/engines/hybrid_engine.py implements this pattern. Correct.
  • Google Places API separate budget: Places quota independent of Vision quota. No issue.
  • Files listed per milestone: Spot-checked M1, M4, M6 - file paths accurate and complete.
  • Acceptance criteria coverage: Cross-referenced issue #129 ACs against milestone ACs. All covered.
  • Test specifications: All milestones except M8 (docs) have test specs with types, files, scenarios.
  • Dependency graph: M1->M2,M3 | M4->M5->M6->M7 | M1..M7->M8. Correct sequencing.

Verdict: PASS_WITH_CONCERNS | Next: TW plan-scrub


TW Review: Plan Scrub (#129)

Phase: Plan-Review | Agent: Technical Writer | Status: NEEDS_CHANGES

VERDICT: NEEDS_CHANGES

Findings

[TEMPORAL] Overview: Progress estimate and missing-component language

  • Text: "Receipt OCR is ~70-80% complete (Python extractors and frontend components exist but backend proxy endpoint is missing)"
  • Problem: Describes project progress relative to a changing baseline (detection question #2: baseline reference). "Missing" is a planning artifact (detection question #4). A future reader has no anchor for what "70-80% complete" means.
  • Fix: Delete this sentence. The milestones define what each milestone produces; the reader does not need a progress estimate in the plan overview.

[TEMPORAL] Overview: "Rewrite" language for manual extraction

  • Text: "Manual extraction requires a Gemini-based rewrite of the traditional OCR pipeline and entirely new frontend components for review and schedule creation."
  • Problem: "rewrite" and "entirely new" are change-relative (detection question #1). Describes action to be taken on something that exists rather than the target state.
  • Fix: "Manual extraction uses Gemini 2.5 Flash for PDF processing and dedicated frontend components for schedule review and creation."

[FORBIDDEN] Decision Log (Gemini standalone): Editorial language

  • Text: "standalone module avoids polluting engine abstraction"
  • Problem: "polluting" is editorial/intent leakage (detection question #5). Describes what the author wants to avoid rather than what the code does.
  • Fix: "standalone module because interface signatures differ: OcrEngine.recognize() accepts image bytes and returns text+confidence; GeminiEngine.extract_maintenance() accepts PDF bytes and returns structured JSON"
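
The signature mismatch behind this fix can be sketched in Python. This is an illustrative sketch only: the class bodies and the `Protocol` framing are assumptions, not the project's actual modules; only the two method shapes come from the finding above.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class OcrEngine(Protocol):
    """Image OCR contract: image bytes in, (text, confidence) out."""
    def recognize(self, image: bytes) -> tuple: ...


class GeminiEngine:
    """PDF extraction: PDF bytes in, structured JSON out -- a different shape,
    so it does not satisfy the OcrEngine protocol."""
    def extract_maintenance(self, pdf: bytes) -> dict:
        return {"schedules": []}  # placeholder body for illustration


class PaddleLikeEngine:
    """An engine that does fit the OcrEngine contract."""
    def recognize(self, image: bytes) -> tuple:
        return ("recognized text", 0.9)
```

A structural `isinstance` check against the protocol makes the incompatibility concrete: `GeminiEngine` has no `recognize()`, which is the factual basis for keeping it standalone.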

[TEMPORAL] Decision Log (useReceiptOcr endpoint): Change-relative description

  • Text: "Current code calls generic /ocr/extract at line 140 -> misses receipt-specific preprocessing and fuel field extraction -> switching to dedicated endpoint improves extraction accuracy"
  • Problem: "Current code" + "switching to" is change-relative (detection questions #1 and #2). Describes an action being taken and compares against a baseline.
  • Fix: "useReceiptOcr calls /ocr/extract/receipt, which provides receipt-specific preprocessing and fuel field extraction. The generic /ocr/extract endpoint lacks receipt-specialized patterns."

[TEMPORAL] Decision Log (HybridEngine fallback): Baseline reference

  • Text: "HybridEngine already implements this fallback pattern"
  • Problem: "already implements" implies surprise or recency (detection question #2: baseline reference the reader may not share).
  • Fix: "HybridEngine implements this fallback pattern"

[FORBIDDEN] Decision Log (Gemini interface): Intensifier language

  • Text: "fundamentally different interface"
  • Problem: "fundamentally" is an intensifier that adds no precision. The difference is self-evident from the two signatures listed in the same cell.
  • Fix: "different interface"

[TEMPORAL] Milestone 2 Requirements: Change-relative endpoint instruction

  • Text: "Update useReceiptOcr.ts extractReceiptFromImage() to call /ocr/extract/receipt instead of /ocr/extract"
  • Problem: "Update...to call...instead of" is a location directive combined with change-relative language (detection questions #1 and #3). Describes what to change from and to rather than the target state.
  • Fix: "useReceiptOcr.ts extractReceiptFromImage() calls /ocr/extract/receipt with optional receipt_type=fuel form field"

[TEMPORAL] Milestone 5 Requirements: "Rewrite" directive

  • Text: "Rewrite ManualExtractor.extract() to use GeminiEngine instead of traditional OCR pipeline"
  • Problem: "Rewrite...instead of" is change-relative (detection question #1). Describes action taken, not target state.
  • Fix: "ManualExtractor.extract() delegates to GeminiEngine for PDF processing and structured maintenance data extraction"

[TEMPORAL] Milestone 5 Requirements: "Remove" directive with temporal anchor

  • Text: "Remove unused imports and dependencies on table_extraction, patterns after rewrite"
  • Problem: "after rewrite" is temporal contamination (detection question #1). "Remove" is a change action, not a state description.
  • Fix: "manual_extractor.py has no dependencies on table_extraction, patterns, or layout analysis modules"

[TEMPORAL] Milestone 5 Acceptance Criteria: "No longer called" baseline reference

  • Text: "Traditional OCR pipeline code (table_detector, maintenance_patterns) no longer called"
  • Problem: "no longer called" implies a previous state where it was called (detection question #2: baseline reference).
  • Fix: "ManualExtractor does not call table_detector, maintenance_patterns, or layout analysis"

[TEMPORAL] Milestone 7 Requirements: "Remove" change action

  • Text: "Remove "(Coming soon)" label from DocumentForm maintenance scan checkbox"
  • Problem: "Remove" is a change-relative action (detection question #1). Describes what to do, not the end state.
  • Fix: "DocumentForm maintenance scan checkbox has no "(Coming soon)" qualifier"

[TEMPORAL] Milestone 5 Requirements: Vague conditional directive

  • Text: "Update process_manual_job() in extract.py router if needed"
  • Problem: "Update...if needed" is both a location directive (detection question #3) and aspirational hedging (detection question #4). Adds no actionable information.
  • Fix: Delete this bullet. The acceptance criteria define expected behavior; implementation details of which functions change belong in the diff, not the plan.

[TEMPORAL] Milestone 5 Acceptance Criteria: "Existing" and "new" comparison

  • Text: "Existing job queue flow (submit -> poll -> complete) works with new extractor"
  • Problem: "Existing" + "new" is change-relative (detection question #2). Implies comparison between old and new states.
  • Fix: "Job queue flow (submit -> poll -> complete) functions correctly with ManualExtractor"

[TEMPORAL] Milestone 6 Requirements: "Reuse existing" and "already returns"

  • Text: "Reuse existing GET /api/ocr/jobs/:jobId for manual job polling (already returns ManualJobResponse)"
  • Problem: "Reuse existing" and "already returns" are change-relative (detection questions #1 and #2). Describes what exists and instructs to reuse it rather than stating the target state.
  • Fix: "GET /api/ocr/jobs/:jobId handles manual job polling and returns ManualJobResponse"

Considered But Not Flagged

  1. Decision Log reasoning chains using arrow notation (->): Read as logical derivation chains, not temporal narratives. Acceptable.
  2. "Degrade gracefully" in Decision Log: Describes runtime behavior, not a change. Acceptable.
  3. "Must use the async OCR job flow" in constraints: Describes an architectural constraint, not a change. Acceptable.
  4. Line number references (e.g., "line 140", "L23"): Verified against source. Acceptable.
  5. Milestone 4 "NEW" file marker: Communicates file creation intent -- valid milestone scoping. Acceptable.
  6. "Existing receipt_extractor has cross-validation" in Known Risks: Describes current runtime behavior (verified). Not temporal contamination.
  7. Python /extract/receipt endpoint existence claims: Verified. Plan correctly identifies backend proxy as the gap.
  8. MANUAL_JOB_TTL value claim ("7200s"): Verified at job_queue.py. Correct value.

Verdict: NEEDS_CHANGES | Next: QR plan-code (after temporal fixes applied)

Author
Owner

QR Review: Plan Code (#129)

Phase: Plan-Review | Agent: Quality Reviewer | Status: NEEDS_CHANGES

VERDICT: NEEDS_CHANGES

Findings

[RULE 0] [CRITICAL]: Missing file size validation server-side in M4 Gemini engine

  • Location: M4 / GeminiEngine.extract_maintenance()
  • Issue: Plan states "reject >20MB with clear error" but does not specify WHERE the size check occurs. Base64 encoding increases size by ~33%. A 20MB PDF becomes ~26.6MB base64. Plan does not clarify if the 20MB limit applies to raw bytes or encoded size, and does not specify if validation happens before or after encoding.
  • Failure Mode: If validation checks raw bytes but the Gemini API enforces encoded size, a PDF just over 15MB could pass the raw check yet exceed 20MB once base64-encoded, failing at the API call. If validation is missing entirely, the engine could attempt to encode and send arbitrarily large files, exhausting memory.
  • Suggested Fix: Add to M4 requirements: "GeminiEngine.extract_maintenance() validates PDF size BEFORE base64 encoding. Reject if raw bytes >20MB (clear error). After encoding, verify base64 size <20MB (hard Gemini API limit). Raise EngineProcessingError with size details on violation."
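
A minimal sketch of the suggested two-stage check (names like MAX_PDF_BYTES and the placement of EngineProcessingError are assumptions for illustration, not the project's actual modules):

```python
import base64

MAX_PDF_BYTES = 20 * 1024 * 1024  # 20MB, per the plan's stated limit


class EngineProcessingError(Exception):
    pass


def validate_and_encode_pdf(pdf_bytes: bytes) -> bytes:
    """Check raw size BEFORE encoding, then re-check the encoded payload."""
    if len(pdf_bytes) > MAX_PDF_BYTES:
        raise EngineProcessingError(
            f"PDF is {len(pdf_bytes)} bytes; limit is {MAX_PDF_BYTES} bytes raw"
        )
    encoded = base64.b64encode(pdf_bytes)
    # Base64 inflates size by ~33%, so the encoded payload needs its own check.
    if len(encoded) > MAX_PDF_BYTES:
        raise EngineProcessingError(
            f"Base64 payload is {len(encoded)} bytes; exceeds the 20MB API limit"
        )
    return encoded
```

Note the second check is what catches the failure mode above: a 16MB raw PDF passes the first check but its ~21.3MB base64 form fails the second.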

[RULE 0] [CRITICAL]: M6 backend manual proxy endpoint lacks PDF content validation

  • Location: M6 / OcrController.extractManual()
  • Issue: Plan specifies "file validation (200MB max, PDF only)" but existing OcrController pattern validates content type via SUPPORTED_TYPES.has(contentType). A malicious user could send a 200MB zip file renamed to .pdf, bypassing frontend checks.
  • Failure Mode: Backend accepts non-PDF files, forwards to Python service, wastes resources. Malformed files could trigger parser vulnerabilities.
  • Suggested Fix: Add to M6 requirements: "OcrController.extractManual() validates: (1) content type application/pdf OR filename ends .pdf, (2) file size <=200MB, (3) first 4 bytes match PDF magic bytes %PDF. Reject invalid files with 400/415 before forwarding."
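
The three-step validation can be sketched language-agnostically; this Python version (the helper name and 200MB constant placement are assumptions) shows the logic the backend handler would implement before forwarding:

```python
MAX_MANUAL_BYTES = 200 * 1024 * 1024  # 200MB, per the plan's manual upload limit


def looks_like_pdf(filename: str, content_type: str, head: bytes, size: int) -> bool:
    """Cheap three-step check: size cap, declared type/extension, magic bytes."""
    if size > MAX_MANUAL_BYTES:
        return False
    if content_type != "application/pdf" and not filename.lower().endswith(".pdf"):
        return False
    # Magic bytes defeat a renamed zip: real PDFs start with %PDF.
    return head[:4] == b"%PDF"
```

The magic-byte check is the step that closes the "200MB zip renamed to .pdf" hole, since both the filename and the declared content type are attacker-controlled.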

[RULE 0] [HIGH]: Missing error handling for Gemini WIF authentication failures in M4

  • Location: M4 / GeminiEngine._get_client()
  • Issue: Plan does not specify error handling for WIF token fetch failures. Existing CloudEngine wraps client initialization in try/except and raises EngineUnavailableError. WIF uses an executable credential source -- if the Auth0 M2M token script fails, the Vertex AI SDK may raise an obscure exception.
  • Failure Mode: Token fetch failure could raise unhandled exception, crashing the worker thread. Users see "Job failed" with no diagnostic info.
  • Suggested Fix: Add to M4 requirements: "GeminiEngine._get_client() wraps Vertex AI client initialization in try/except. Catch all exceptions, log full traceback, raise EngineUnavailableError with message: 'Vertex AI authentication failed: {exc}'."
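
The suggested wrapping pattern, sketched with a generic factory callable (the function shape and EngineUnavailableError definition here are illustrative assumptions):

```python
import logging

logger = logging.getLogger("gemini_engine")


class EngineUnavailableError(Exception):
    pass


def get_client(factory):
    """Wrap client construction so any WIF/auth failure becomes a typed,
    diagnosable error instead of an unhandled crash in the worker."""
    try:
        return factory()
    except Exception as exc:
        logger.exception("Vertex AI client initialization failed")
        raise EngineUnavailableError(f"Vertex AI authentication failed: {exc}") from exc
```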

[RULE 0] [HIGH]: M3 station matching has no timeout specified

  • Location: M3 / google-maps.client.ts station search method
  • Issue: Plan adds Google Places Text Search call but does not specify timeout. If Places API is slow or unresponsive, receipt OCR flow could hang indefinitely.
  • Failure Mode: Receipt OCR succeeds quickly (<3s) but station matching hangs for 30s+, blocking user from accepting result.
  • Suggested Fix: Add to M3 requirements: "google-maps.client.ts station search method uses 5000ms timeout. If Places API exceeds timeout, log warning and return null (no match). Frontend handles null gracefully."
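
As a language-agnostic sketch of the timeout-then-degrade behavior (the actual client is TypeScript; this Python version uses a worker thread to impose the deadline and is illustrative only):

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

logger = logging.getLogger("station_match")


def match_station(search_fn, merchant_name, timeout_s=5.0):
    """Run the Places lookup with a hard deadline; a slow API degrades to
    'no match' (None) instead of blocking the receipt OCR flow."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(search_fn, merchant_name)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            logger.warning("Places lookup for %r exceeded %ss; returning no match",
                           merchant_name, timeout_s)
            return None
    finally:
        pool.shutdown(wait=False)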

[RULE 0] [HIGH]: M1 receipt proxy endpoint missing error code translation

  • Location: M1 / OcrController.extractReceipt()
  • Issue: Existing OcrController.extract() translates Python HTTP error codes to Fastify error codes (413->413, 415->415, else 500). Python /extract/receipt can return 422 "Failed to extract data from receipt image". Plan does not specify if backend proxy forwards 422 or translates.
  • Failure Mode: Inconsistent error codes confuse frontend error handling.
  • Suggested Fix: Add to M1 requirements: "OcrController.extractReceipt() translates Python error codes: 413->413, 415->415, 422->422, else 500. Match pattern from OcrController.extract()."
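
The translation table is small enough to show directly; a sketch of the logic (constant name is an assumption, the code set comes from the suggested fix):

```python
FORWARDED_CODES = {413, 415, 422}  # payload too large, unsupported type, extraction failed


def translate_status(python_status: int) -> int:
    """Forward known client-error codes unchanged; collapse everything else to 500."""
    return python_status if python_status in FORWARDED_CODES else 500
```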

[RULE 1] [HIGH]: M2 tier gating is frontend-only -- missing backend enforcement

  • Location: M2 / tier gating in FuelLogForm
  • Issue: Plan says "Add tier gating check in FuelLogForm before showing ReceiptCameraButton (use useTierAccess)" but does NOT add tier guard to backend route in M1. Frontend-only tier check can be bypassed via direct API call.
  • Failure Mode: Free users call POST /api/ocr/extract/receipt directly via curl, bypassing tier gate.
  • Suggested Fix: Change M1 to add tier guard to backend route: "Add POST /api/ocr/extract/receipt route with preHandler: [requireAuth, requireTier('fuelLog.receiptScan')]. Backend returns 403 TIER_REQUIRED for free users." Keep M2 frontend check for UX.
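
The server-side gate the preHandler would enforce, sketched in Python (tier names and orderings here are hypothetical; only the feature key comes from the plan):

```python
TIER_ORDER = {"free": 0, "pro": 1, "pro_plus": 2}    # hypothetical tier ladder
FEATURE_TIERS = {"fuelLog.receiptScan": "pro_plus"}  # tier key from the plan


def check_tier(user_tier: str, feature: str):
    """Return (403, 'TIER_REQUIRED') when the user's tier is below the feature's
    minimum; (200, None) otherwise. This runs server-side, so curl can't bypass it."""
    required = FEATURE_TIERS[feature]
    if TIER_ORDER[user_tier] < TIER_ORDER[required]:
        return 403, "TIER_REQUIRED"
    return 200, None
```

In Fastify terms this check would live inside the `requireTier()` preHandler, keeping the frontend `useTierAccess` check purely as a UX nicety.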

[RULE 1] [HIGH]: M6 tier guard pattern underspecified

  • Location: M6 / "tier guard for document.scanMaintenanceSchedule"
  • Issue: Plan says to add tier guard but does not specify HOW. Existing ocr.routes.ts has NO tier guards on any route. Backend tier gating pattern is underspecified.
  • Failure Mode: Implementation guesses at pattern, potentially inconsistent.
  • Suggested Fix: Add: "Create requireTier() middleware if not exists. Apply to route: preHandler: [requireAuth, requireTier('document.scanMaintenanceSchedule')]. Follow existing auth middleware pattern."

[RULE 1] [HIGH]: M4 missing snake_case / camelCase mapping requirement

  • Location: M4 / GeminiEngine response schema
  • Issue: Gemini response schema uses serviceName, intervalMiles (camelCase). Project convention requires snake_case in Python, camelCase in TypeScript API responses. Plan does not specify how field naming convention is handled across Python -> backend proxy -> frontend.
  • Failure Mode: Naming inconsistency between Python (snake_case) and TypeScript (camelCase) layers causes type mismatches.
  • Suggested Fix: Add to M4: "Gemini response schema uses camelCase (serviceName, intervalMiles). Python manual_extractor.py converts Gemini response to ManualExtractionResult preserving camelCase for API response."
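
If the conversion were done in the Python layer instead of relying on Gemini's schema, a recursive key mapper is the standard shape (function names here are hypothetical):

```python
def to_camel(snake: str) -> str:
    head, *rest = snake.split("_")
    return head + "".join(word.capitalize() for word in rest)


def camelize_keys(obj):
    """Recursively convert snake_case dict keys so the API response matches
    the TypeScript camelCase convention."""
    if isinstance(obj, dict):
        return {to_camel(key): camelize_keys(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [camelize_keys(item) for item in obj]
    return obj
```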

[RULE 1] [HIGH]: M7 mobile conformance underspecified

  • Location: M7 / MaintenanceScheduleReviewScreen
  • Issue: Plan has "Flags: needs conformance check" but project standards require 320px, 768px, 1920px viewports with touch targets >=44px. M7 does not specify touch target sizes for checkboxes, edit buttons, or action button.
  • Failure Mode: Review screen implemented with small touch targets, fails mobile conformance.
  • Suggested Fix: Add to M7: "Touch targets: checkbox hit areas >=44px, edit buttons >=44px, Create button >=44px height. Test at 320px, 768px, 1920px viewports."

[RULE 0] [HIGH]: M5 progress callback may not work during Gemini API call

  • Location: M5 / ManualExtractor.extract() progress callback
  • Issue: Existing manual_extractor.py calls progress_callback(percent, message) synchronously. M5 delegates to GeminiEngine which makes a single blocking API call (10-60s). Progress callback cannot fire mid-extraction.
  • Failure Mode: User sees "5% Starting extraction" then waits 30-60s with no updates, then "100% Complete". Poor UX.
  • Suggested Fix: Add to M5: "Simplified progress: fire 10% before Gemini call, 95% after Gemini returns, 100% after mapping. Document: no sub-progress during Gemini API call."
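
The three-point progress shape is simple enough to sketch (the callable parameters are illustrative; the 10/95/100 values come from the suggested fix):

```python
def extract_with_progress(gemini_call, progress_callback):
    """Coarse three-point progress: Gemini is a single blocking call,
    so there is deliberately no sub-progress between 10% and 95%."""
    progress_callback(10, "Sending manual to Gemini")
    raw = gemini_call()
    progress_callback(95, "Mapping extracted schedule")
    result = raw  # schema mapping step would go here
    progress_callback(100, "Complete")
    return result
```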

[RULE 0] [HIGH]: M6 async job polling lacks timeout/expiry handling

  • Location: M6 / GET /api/ocr/jobs/:jobId
  • Issue: Redis job data has 2-hour TTL. If job is stuck or worker crashes, frontend polls until TTL expires, then gets 404 "Job not found" -- confusing for users.
  • Failure Mode: Frontend polls every 2s for 2 hours, then job vanishes. User sees confusing error.
  • Suggested Fix: Add to M6: "OcrService.getJobStatus() returns 410 GONE if job not found (TTL expired). Message: 'Job expired (max 2 hours). Please resubmit.'"
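
The distinction reduces to a two-branch response (sketch only; the handler shape and message text follow the suggested fix):

```python
def job_status_response(job_record):
    """Distinguish an expired/unknown job (410 GONE) from a live record (200)."""
    if job_record is None:
        return 410, {"error": "Job expired (max 2 hours). Please resubmit."}
    return 200, job_record
```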

[RULE 2] [SHOULD_FIX]: OcrClient/OcrService/OcrController approaching god object thresholds

  • Location: M1 and M6 / all three classes
  • Issue: OcrClient goes from 5 to 7 methods. OcrService goes from 4 to 6. OcrController goes from 4 to 6 methods (~515 lines). Not yet at god object threshold (>15 methods) but trending upward.
  • Failure Mode: Future features keep adding methods until threshold hit.
  • Suggested Fix: No blocking action. Add architectural note: "If Ocr classes exceed 10 methods, consider splitting into specialized classes (VinOcr, ReceiptOcr, ManualOcr)."

Considered But Not Flagged

  • M1 receipt endpoint 10MB limit: Matches existing MAX_SYNC_SIZE. Consistent.
  • M6 manual endpoint 200MB limit: Matches Python MAX_MANUAL_SIZE. Consistent.
  • M4 Gemini model choice (gemini-2.5-flash): Appropriate for structured extraction.
  • M3 Places API budget: Correctly identified as separate quota.
  • M2 tier key "fuelLog.receiptScan": Follows existing FEATURE_TIERS pattern.
  • M4 WIF credential path: Reuses existing GOOGLE_APPLICATION_CREDENTIALS pattern.
  • M5 fuzzy matching for subtypes: Reasonable approach.
  • M7 useManualExtraction hook pattern: Mirrors existing useReceiptOcr pattern.
  • Test specifications across M1-M7: Coverage adequate.

Verdict: NEEDS_CHANGES | Next: QR plan-docs


QR Review: Plan Docs (#129)

Phase: Plan-Review | Agent: Quality Reviewer | Status: NEEDS_CHANGES

VERDICT: NEEDS_CHANGES

The plan's Milestone 8 (Documentation) is structurally present but has critical gaps and inaccuracies. The Invisible Knowledge section contains valuable architecture documentation but also exhibits temporal contamination that survived the TW review. Several documentation files listed do not exist in the codebase, and M8 omits critical files that will be created/modified.

Findings

[RULE 1] [CRITICAL]: M8 lists non-existent CLAUDE.md files

  • Location: M8 -> Files
  • Issue: Plan lists ocr/app/engines/CLAUDE.md, frontend/src/features/maintenance/CLAUDE.md, and frontend/src/features/documents/CLAUDE.md for update. Verified via filesystem: NONE of these files exist.
  • Failure Mode: Execution will attempt to "update" non-existent files. Should these be created (NEW) or omitted entirely?
  • Suggested Fix: Either (1) change M8 to create these files with (NEW) marker, OR (2) remove them and update parent CLAUDE.md files instead (ocr/app/CLAUDE.md, frontend/src/features/CLAUDE.md).

[RULE 1] [CRITICAL]: M8 missing critical backend files that will be modified

  • Location: M8 -> Files
  • Issue: M1 and M6 modify 5 files each in backend/src/features/ocr/* but M8 only lists backend/src/features/ocr/CLAUDE.md and backend/src/features/ocr/README.md. The current backend/src/features/ocr/CLAUDE.md has NO entries for the api/, domain/, or external/ subdirectories.
  • Failure Mode: CLAUDE.md remains incomplete, failing to index new proxy endpoints and types.
  • Suggested Fix: Add to M8 requirements: "Add entries to backend/src/features/ocr/CLAUDE.md for: api/ocr.controller.ts (request handlers), api/ocr.routes.ts (route registration), domain/ocr.service.ts (business logic), domain/ocr.types.ts (TypeScript types), external/ocr-client.ts (HTTP client to Python service)."
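For illustration only (the wording here is hypothetical), entries following the WHAT/WHEN column convention might look like:

```markdown
| File | WHAT | WHEN |
| --- | --- | --- |
| api/ocr.controller.ts | Request handlers for VIN, receipt, and manual OCR endpoints | Debugging request/response handling |
| domain/ocr.service.ts | OCR business logic and job orchestration | Tracing extraction flow or job status |
| external/ocr-client.ts | HTTP client to the Python OCR service | Debugging backend-to-Python calls |
```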

[RULE 1] [HIGH]: M8 missing entries for new Python files

  • Location: M8 -> Files
  • Issue: M4 creates ocr/app/engines/gemini_engine.py (NEW), M5 rewrites ocr/app/extractors/manual_extractor.py. M8 lists ocr/app/CLAUDE.md but does not specify adding entries for these files. Current ocr/app/CLAUDE.md lists subdirectories but NOT individual files.
  • Failure Mode: GeminiEngine and rewritten ManualExtractor invisible to LLM navigation.
  • Suggested Fix: Add to M8: "Add entry to ocr/app/CLAUDE.md: engines/gemini_engine.py (WHAT: Gemini 2.5 Flash integration for maintenance extraction, WHEN: Manual extraction debugging)."

[RULE 1] [HIGH]: M8 missing documentation for new frontend hooks

  • Location: M8 -> Files
  • Issue: M7 creates useManualExtraction.ts and useCreateSchedulesFromExtraction.ts. Neither frontend feature directory has a CLAUDE.md file currently.
  • Failure Mode: New hooks undocumented, invisible to LLM.
  • Suggested Fix: Update frontend/src/features/CLAUDE.md with expanded entries for documents/ and maintenance/ subdirectories.

[RULE 1] [HIGH]: M8 missing documentation for tier gating changes

  • Location: M8 -> Files
  • Issue: M2 modifies backend/src/core/config/feature-tiers.ts (core config change). M8 does not list any core/ files for documentation updates.
  • Failure Mode: Tier gate change undocumented.
  • Suggested Fix: Add to M8: "Update backend/src/core/CLAUDE.md entry for config/feature-tiers.ts to reflect fuelLog.receiptScan addition."

[RULE 1] [HIGH]: M8 does not specify README.md sections to add

  • Location: M8 -> Requirements -> "Update README.md with architecture diagrams"
  • Issue: Which README.md? Plan lists backend/src/features/ocr/README.md. Requirements don't specify which sections to add.
  • Failure Mode: Implementation guesses at structure.
  • Suggested Fix: Clarify: "Update backend/src/features/ocr/README.md with: (1) Receipt OCR Flow section with architecture diagram, (2) Manual Extraction Flow section with Gemini integration. Update API Endpoints table to include POST /extract/receipt and POST /extract/manual."

[RULE 0] [HIGH]: M8 source material reference is circular

  • Location: M8 -> Source Material: "Invisible Knowledge section of this plan"
  • Issue: Documentation should capture knowledge from IMPLEMENTED CODE, not from the plan. What if implementation deviates from plan?
  • Failure Mode: Documentation becomes copy-paste of plan instead of reflecting actual behavior.
  • Suggested Fix: Change to: "Verify architecture diagrams against implemented code. If implementation deviates from Invisible Knowledge section, update diagrams to match actual code paths."

[RULE 2] [SHOULD_FIX]: Invisible Knowledge "Why This Structure" has temporal contamination

  • Location: Invisible Knowledge -> Why This Structure -> Gemini standalone module
  • Issue: "Does not extend OcrEngine because..." is change-relative (describes what was NOT done).
  • Suggested Fix: "GeminiEngine is a standalone module. OcrEngine handles image-to-text extraction; GeminiEngine handles PDF-to-structured-data extraction. Different input types, output formats, and error modes warrant separate abstractions."

[RULE 2] [SHOULD_FIX]: M8 Acceptance Criteria use action verbs instead of state descriptions

  • Location: M8 -> Acceptance Criteria
  • Issue: "CLAUDE.md enables LLM to locate..." and "README.md captures knowledge..." are vague. Describe content, not behavior.
  • Suggested Fix: Rewrite:
    • "CLAUDE.md contains index entries for all new/modified files with WHAT and WHEN columns"
    • "README.md contains architecture diagrams for receipt and manual extraction flows"
    • "All documentation uses timeless present tense"

[RULE 2] [SHOULD_FIX]: M8 lacks verification strategy

  • Location: M8 -> Tests: "Skip - documentation-only milestone"
  • Issue: No verification that CLAUDE.md entries are complete and accurate.
  • Suggested Fix: Add manual verification checklist: "(1) All files from M1-M7 have CLAUDE.md entries, (2) All WHAT/WHEN columns complete, (3) README.md diagrams match final implementation."

Considered But Not Flagged

  • Architecture diagrams in Invisible Knowledge use timeless present tense arrows. Acceptable.
  • "Station matching in backend" explanation describes behavior, not change. Acceptable.
  • Data flow descriptions use arrow notation for flow. Acceptable.
  • Invariants section phrased as "MUST" constraints. No temporal contamination.
  • Tradeoffs section describes design choices with "Sacrifices X to avoid Y". Acceptable.
  • Gemini prompt and response schema in Invisible Knowledge. Useful reference material.

Plan Review Cycle Summary

| Review | Verdict | Findings |
| --- | --- | --- |
| QR plan-completeness | PASS_WITH_CONCERNS | 1 CRITICAL (missing sub-issue constraint), 7 SHOULD_FIX |
| TW plan-scrub | NEEDS_CHANGES | 14 temporal contamination instances |
| QR plan-code | NEEDS_CHANGES | 2 CRITICAL (file size validation, PDF content validation), 8 HIGH, 2 SHOULD_FIX |
| QR plan-docs | NEEDS_CHANGES | 2 CRITICAL (non-existent files, missing backend entries), 5 HIGH, 3 SHOULD_FIX |

Overall recommendation: REVISION NEEDED before execution. All four reviews found actionable issues:

  • Completeness: Missing workflow constraints (sub-issues)
  • Temporal: 14 contamination instances requiring rewrites
  • Code: 2 CRITICAL + 8 HIGH reliability/standards violations
  • Docs: 2 CRITICAL + 5 HIGH documentation completeness gaps

The plan foundation is solid (comprehensive Decision Log, well-structured milestones, valuable Invisible Knowledge). Recommend:

  1. Apply all 14 TW temporal fixes
  2. Address all CRITICAL findings from plan-code and plan-docs
  3. Add sub-issue decomposition constraint
  4. Fix M8 file list and requirements to match actual codebase structure
  5. Add backend tier enforcement to M1 (receipt endpoint)
  6. Specify timeouts, error codes, and progress callback behavior

Verdict: NEEDS_CHANGES | Next: Address all review findings, update plan, proceed to execution


Plan Revision Decisions (#129)

Phase: Plan-Review | Agent: Planner | Status: IN_PROGRESS

Decisions from Review Findings

| # | Decision | Rationale | Review Source |
| --- | --- | --- | --- |
| 1 | Station matching via separate frontend call (POST /api/stations/match) | Frontend calls OCR, receives extractedFields, then calls station matching endpoint. Two sequential calls. Better separation of concerns, simpler backend. | QR completeness |
| 2 | Create 3 NEW CLAUDE.md files (ocr/app/engines/, frontend maintenance/, frontend documents/) | More granular LLM navigation, follows pattern of other feature directories. | QR docs |
| 3 | New milestone M0: Tier Guard Middleware (before M1) | requireTier() preHandler doesn't exist yet. Clean separation. M1 and M6 reuse it. Both receipt and manual endpoints get backend enforcement. | QR code |
| 4 | 20MB raw bytes PDF size limit for Gemini | Vertex AI SDK handles base64 encoding internally. Validate raw bytes only. Matches Gemini API docs. | QR code |
| 5 | PDF magic byte validation (%PDF header check) in M6 | Validate first 4 bytes before forwarding to Python. Prevents malformed file processing. | QR code |
| 6 | 3-step progress for Gemini extraction | 10% Preparing -> 50% Processing with Gemini -> 95% Mapping results -> 100% Complete. Honest about the blocking wait. | QR code |
| 7 | 410 Gone for expired job polling | HTTP 410 semantically correct for TTL-expired Redis jobs. Frontend shows "Job expired, please resubmit." | QR code |

Straightforward Fixes (No Decision Needed)

  • Apply all 14 TW temporal contamination rewrites
  • M1: Add error code translation (413, 415, 422, else 500)
  • M1: Add backend tier enforcement via requireTier() (from M0)
  • M3: Add 5s timeout, return null on failure
  • M4: Add WIF error handling (try/except, EngineUnavailableError)
  • M4: Clarify camelCase field naming across Python/TypeScript
  • M5: Remove "unused imports" bullet, reword ACs to state descriptions
  • M7: Add touch target specs (>=44px) and viewport breakpoints (320/768/1920px)
  • M8: Fix file list (add backend entries, mark 3 files as NEW, add core/ entry)
  • M8: Reword ACs, add verification checklist, fix source material reference
  • Invisible Knowledge: Fix "Does not extend" temporal contamination
  • Add sub-issue decomposition constraint to Constraints section
  • Add god object monitoring note as architectural guidance
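Two of the fixes above (M1 tier enforcement via requireTier() from M0, and M1 error code translation) can be sketched as framework-agnostic functions. All names here are hypothetical; the real middleware would be a Fastify preHandler wired to the FEATURE_TIERS config:

```typescript
// Illustrative tier guard: a factory returning a check a route runs
// before its handler (the real requireTier() is an M0 preHandler).
type Tier = "free" | "pro" | "proPlus";
const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, proPlus: 2 };

function requireTier(minimum: Tier) {
  return (userTier: Tier): { allowed: boolean; httpStatus?: 403 } =>
    TIER_RANK[userTier] >= TIER_RANK[minimum]
      ? { allowed: true }
      : { allowed: false, httpStatus: 403 };
}

// M1 error translation: pass through the known Python-service statuses
// (413, 415, 422) and collapse everything else to 500.
function translateOcrError(pythonStatus: number): 413 | 415 | 422 | 500 {
  return pythonStatus === 413 || pythonStatus === 415 || pythonStatus === 422
    ? pythonStatus
    : 500;
}
```

Keeping the translation to a fixed whitelist prevents internal Python-service status codes from leaking to clients as undocumented API behavior.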

Next: Revise plan comment with all decisions and fixes applied


Milestone: M8 Documentation (#137)

Phase: Execution | Agent: Developer | Status: PASS

Summary

Updated CLAUDE.md indexes and README.md files across all three layers (backend, Python OCR service, frontend) to document the OCR expansion implemented in M0-M7.

Changes

Updated files (7):

  • backend/src/core/CLAUDE.md -- added tier guard plugin reference
  • backend/src/features/CLAUDE.md -- expanded OCR feature description
  • backend/src/features/ocr/CLAUDE.md -- added entries for all api/, domain/, external/, tests/ files
  • backend/src/features/ocr/README.md -- added Receipt OCR Flow and Manual Extraction Flow architecture diagrams, expanded API endpoint table with receipt and manual endpoints, added response types and error handling documentation
  • frontend/src/features/CLAUDE.md -- expanded documents/, fuel-logs/, maintenance/ descriptions
  • ocr/CLAUDE.md -- added Gemini reference
  • ocr/app/CLAUDE.md -- expanded subdirectory descriptions with Gemini and extraction details

Created files (4):

  • ocr/app/engines/CLAUDE.md -- engine layer documentation: OcrEngine subclasses vs standalone GeminiEngine, engine factory, engine selection diagram
  • frontend/src/features/fuel-logs/CLAUDE.md -- receipt OCR flow, key hooks and components, camera-to-form pipeline
  • frontend/src/features/documents/CLAUDE.md -- manual extraction flow, job polling, document management
  • frontend/src/features/maintenance/CLAUDE.md -- extraction review flow, batch schedule creation, subtype management

Verification

  • All files from M0-M7 have CLAUDE.md entries with WHAT and WHEN columns
  • Architecture diagrams verified against actual implementation
  • All documentation uses timeless present tense (no temporal contamination)
  • README.md diagrams match code paths in ocr.controller.ts, useReceiptOcr.ts, useManualExtraction.ts

Commit

ab0d846 docs: update CLAUDE.md indexes and README for OCR expansion (refs #137)

Verdict: PASS | Next: QR post-implementation review

egullickson added the status/review label and removed the status/in-progress label 2026-02-11 21:28:18 +00:00

Reference: egullickson/motovaultpro#129