feat: Expand OCR with fuel receipt scanning and owners manual maintenance extraction #129
Summary
Expand the OCR functionality with two new scanning capabilities that leverage the existing OCR pipeline and Google Cloud services:
Google Vision API 1,000 calls/month limit established in #127.
Feature 1: Fuel Receipt OCR Scanning
Description
When adding a fuel log, the user can take a photo of their fuel receipt (mirroring the existing VIN OCR decode UX pattern). The OCR extracts fields from the receipt image and pre-fills the fuel log form with editable values.
Requirements
TEXT_DETECTION) is the correct engine for single-image receipt scanning (scene text, not structured documents)
fuelLog.receiptScan to FEATURE_TIERS)
Station Matching Flow
Technical Notes
POST /api/ocr/extract) since receipts are single images (1-3 seconds)
documentType: 'receipt' and extractedFields
Feature 2: Owners Manual Maintenance Schedule Extraction
Description
When uploading an owners manual document, a checkbox option "Scan for Maintenance Schedule" triggers a Gemini AI scan of the entire manual to extract routine maintenance items and their intervals. Extracted items are presented for user review before creating maintenance schedules.
Engine: Gemini 2.5 Flash on Vertex AI
Gemini is the right choice over Document AI for this use case because:
responseMimeType: 'application/json' with responseSchema enforcement
Gemini Prompt
Gemini Response Schema (enforced via responseSchema)
Example Gemini Response
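The original example response is not preserved in this text. As a stand-in, the following is a minimal sketch of how a JSON response from Gemini might be parsed and mapped to snake_case rows for maintenance_schedules. Only serviceName, intervalMiles, interval_miles, and interval_months come from this issue; the top-level key maintenanceSchedule, the intervalMonths and service_name fields, the helper names, and the sample values are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List, Optional

# Made-up sample payload; the key "maintenanceSchedule" and the field
# "intervalMonths" are assumptions, not taken from the issue.
EXAMPLE_RESPONSE = """
{"maintenanceSchedule": [
  {"serviceName": "Engine oil and filter", "intervalMiles": 5000, "intervalMonths": 6},
  {"serviceName": "Cabin air filter", "intervalMiles": 15000, "intervalMonths": null},
  {"serviceName": "", "intervalMiles": 1000, "intervalMonths": null}
]}
"""


def _positive_int_or_none(value: Any) -> Optional[int]:
    """Accept only positive integers; bool is excluded because it subclasses int."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        return None
    return value


def to_schedule_rows(response_text: str) -> List[Dict[str, Any]]:
    """Map camelCase response items to snake_case rows, dropping unusable items."""
    payload = json.loads(response_text)
    rows: List[Dict[str, Any]] = []
    for item in payload.get("maintenanceSchedule", []):
        name = item.get("serviceName")
        if not isinstance(name, str) or not name.strip():
            continue  # no service name to show the user for review
        miles = _positive_int_or_none(item.get("intervalMiles"))
        months = _positive_int_or_none(item.get("intervalMonths"))
        if miles is None and months is None:
            continue  # a recurring schedule needs at least one interval
        rows.append(
            {
                "service_name": name.strip(),
                "interval_miles": miles,
                "interval_months": months,
            }
        )
    return rows


if __name__ == "__main__":
    for row in to_schedule_rows(EXAMPLE_RESPONSE):
        print(row)
```

Rows produced this way would still go through the user review step before any schedule is created; the sketch only covers parsing and field-name mapping.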
GCP Setup Instructions
1. Enable the Vertex AI API
2. Service Account Permissions
The existing service account (used for Google Vision in #127) needs one additional IAM role:
roles/aiplatform.user
aiplatform.endpoints.predict permission
If using Workload Identity Federation (WIF) from #127, the same federated identity gets the additional role -- no new service account needed.
3. SDK Dependency
4. Environment Variables
VERTEX_AI_PROJECT
VERTEX_AI_LOCATION: us-central1
GEMINI_MODEL: gemini-2.5-flash
5. Authentication
Uses the same credential path as Google Vision (#127):
GOOGLE_APPLICATION_CREDENTIALS env var pointing to service account key JSON
Requirements
POST /api/ocr/jobs -> poll GET /api/ocr/jobs/:jobId)
gs:// URI
maintenance_schedules. User can edit any field before confirming.
maintenance_schedules (recurring schedules with interval_months / interval_miles) -- NOT one-time records
document.scanMaintenanceSchedule in FEATURE_TIERS)
Technical Notes
serviceName values to existing maintenance categories/subtypes (routine_maintenance with appropriate subtypes from the 27 available)
Shared Concerns
Mobile + Desktop
Acceptance Criteria
maintenance_schedules with correct intervals
roles/aiplatform.user role
Plan: Expand OCR with Fuel Receipt Scanning and Owners Manual Maintenance Extraction
Phase: Planning | Agent: Planner | Status: APPROVED (revised per review cycle)
Overview
Expand OCR functionality with two new capabilities: (1) Fuel receipt scanning that auto-extracts fields and pre-fills the fuel log form, and (2) Owners manual maintenance schedule extraction via Gemini 2.5 Flash that creates recurring maintenance schedules. Manual extraction uses Gemini 2.5 Flash for PDF processing and dedicated frontend components for schedule review and creation.
Planning Context
Decision Log
Rejected Alternatives
Constraints and Assumptions
roles/aiplatform.user
Known Risks
Invisible Knowledge
Architecture
Data Flow
Why This Structure
Invariants
Tradeoffs
Sub-Issues
Milestone Dependencies
Feature 1 (#139-#141) and Feature 2 (#142-#145) can proceed in parallel after M0 completes.
Verdict: APPROVED | Next: Create branch issue-129-expand-ocr, begin execution at M0
QR Review: Plan Completeness (#129)
Phase: Plan-Review | Agent: Quality Reviewer | Status: PASS_WITH_CONCERNS
VERDICT: PASS_WITH_CONCERNS
The plan is structurally complete with comprehensive Decision Log, Rejected Alternatives, Constraints, Risks, Invisible Knowledge, and 8 well-defined milestones. However, several concerns require attention before execution begins.
Findings
[DECISION_LOG] [SHOULD_FIX]: Missing decision about useReceiptOcr endpoint call location
frontend/src/features/fuel-logs/hooks/useReceiptOcr.ts:140 calls /ocr/extract, not /ocr/extract/receipt. However, the Decision Log doesn't explain WHY the current code is wrong or WHY switching endpoints is an architectural decision vs. implementation detail.
/extract/receipt endpoint already exists (verified in ocr/app/routers/extract.py:176), then calling it is just fixing a bug.
[DECISION_LOG] [SHOULD_FIX]: "Backend receipt endpoint" decision is misleading
/extract/receipt ALREADY EXISTS (verified at ocr/app/routers/extract.py:176-267). Decision implies backend is creating new endpoint, but M1 creates the PROXY endpoint in Node.js backend.
[CONSTRAINTS] [CRITICAL]: Missing sub-issue creation requirement
[MILESTONES] [SHOULD_FIX]: M1 test backing should be "integration" not "default-derived"
[MILESTONES] [SHOULD_FIX]: M2 and M7 missing viewport test specifications
[MILESTONES] [NEEDS_CLARIFICATION]: M4 Gemini WIF authentication config incomplete
ocr/app/config.py shows google_vision_key_path but no Gemini-specific settings.
GOOGLE_APPLICATION_CREDENTIALS env var (same as Vision API). Vertex AI SDK ADC supports external_account (WIF) credential type - no code changes needed, just environment setup."
[MILESTONES] [SHOULD_FIX]: M5 "Remove unused imports" is implementation detail
[MILESTONES] [NEEDS_CLARIFICATION]: M6 tier guard implementation unclear
backend/src/features/ocr/api/ocr.routes.ts has NO tier guards on current routes.
preHandler: [requireAuth, requireTier('document.scanMaintenanceSchedule')]."
[INVISIBLE_KNOWLEDGE] [SHOULD_FIX]: Station matching flow ambiguous
Considered But Not Flagged
ocr/app/engines/hybrid_engine.py implements this pattern. Correct.
Verdict: PASS_WITH_CONCERNS | Next: TW plan-scrub
TW Review: Plan Scrub (#129)
Phase: Plan-Review | Agent: Technical Writer | Status: NEEDS_CHANGES
VERDICT: NEEDS_CHANGES
Findings
[TEMPORAL] Overview: Progress estimate and missing-component language
[TEMPORAL] Overview: "Rewrite" language for manual extraction
[FORBIDDEN] Decision Log (Gemini standalone): Editorial language
[TEMPORAL] Decision Log (useReceiptOcr endpoint): Change-relative description
[TEMPORAL] Decision Log (HybridEngine fallback): Baseline reference
[FORBIDDEN] Decision Log (Gemini interface): Intensifier language
[TEMPORAL] Milestone 2 Requirements: Change-relative endpoint instruction
useReceiptOcr.ts extractReceiptFromImage() to call /ocr/extract/receipt instead of /ocr/extract"
useReceiptOcr.ts extractReceiptFromImage() calls /ocr/extract/receipt with optional receipt_type=fuel form field"
[TEMPORAL] Milestone 5 Requirements: "Rewrite" directive
ManualExtractor.extract() to use GeminiEngine instead of traditional OCR pipeline"
ManualExtractor.extract() delegates to GeminiEngine for PDF processing and structured maintenance data extraction"
[TEMPORAL] Milestone 5 Requirements: "Remove" directive with temporal anchor
manual_extractor.py has no dependencies on table_extraction, patterns, or layout analysis modules"
[TEMPORAL] Milestone 5 Acceptance Criteria: "No longer called" baseline reference
[TEMPORAL] Milestone 7 Requirements: "Remove" change action
[TEMPORAL] Milestone 5 Requirements: Vague conditional directive
process_manual_job() in extract.py router if needed"
[TEMPORAL] Milestone 5 Acceptance Criteria: "Existing" and "new" comparison
[TEMPORAL] Milestone 6 Requirements: "Reuse existing" and "already returns"
GET /api/ocr/jobs/:jobId for manual job polling (already returns ManualJobResponse)"
GET /api/ocr/jobs/:jobId handles manual job polling and returns ManualJobResponse"
Considered But Not Flagged
/extract/receipt endpoint existence claims: Verified. Plan correctly identifies backend proxy as the gap.
MANUAL_JOB_TTL value claim ("7200s"): Verified at job_queue.py. Correct value.
Verdict: NEEDS_CHANGES | Next: QR plan-code (after temporal fixes applied)
QR Review: Plan Code (#129)
Phase: Plan-Review | Agent: Quality Reviewer | Status: NEEDS_CHANGES
VERDICT: NEEDS_CHANGES
Findings
[RULE 0] [CRITICAL]: Missing file size validation server-side in M4 Gemini engine
[RULE 0] [CRITICAL]: M6 backend manual proxy endpoint lacks PDF content validation
SUPPORTED_TYPES.has(contentType). A malicious user could send a 200MB zip file renamed to .pdf, bypassing frontend checks.
%PDF. Reject invalid files with 400/415 before forwarding."
[RULE 0] [HIGH]: Missing error handling for Gemini WIF authentication failures in M4
[RULE 0] [HIGH]: M3 station matching has no timeout specified
[RULE 0] [HIGH]: M1 receipt proxy endpoint missing error code translation
/extract/receipt can return 422 "Failed to extract data from receipt image". Plan does not specify if backend proxy forwards 422 or translates.
[RULE 1] [HIGH]: M2 tier gating is frontend-only -- missing backend enforcement
useTierAccess)" but does NOT add tier guard to backend route in M1. Frontend-only tier check can be bypassed via direct API call.
POST /api/ocr/extract/receipt route with preHandler: [requireAuth, requireTier('fuelLog.receiptScan')]. Backend returns 403 TIER_REQUIRED for free users." Keep M2 frontend check for UX.
[RULE 1] [HIGH]: M6 tier guard pattern underspecified
requireTier() middleware if not exists. Apply to route: preHandler: [requireAuth, requireTier('document.scanMaintenanceSchedule')]. Follow existing auth middleware pattern."
[RULE 1] [HIGH]: M4 missing snake_case / camelCase mapping requirement
serviceName, intervalMiles (camelCase). Project convention requires snake_case in Python, camelCase in TypeScript API responses. Plan does not specify how field naming convention is handled across Python -> backend proxy -> frontend.
[RULE 1] [HIGH]: M7 mobile conformance underspecified
[RULE 0] [HIGH]: M5 progress callback may not work during Gemini API call
progress_callback(percent, message) synchronously. M5 delegates to GeminiEngine which makes a single blocking API call (10-60s). Progress callback cannot fire mid-extraction.
[RULE 0] [HIGH]: M6 async job polling lacks timeout/expiry handling
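For the M6 polling concern, one way to bound the poll loop is sketched below. It is written in Python to show the control flow only, not the project's frontend hook. poll_job, JobPollTimeout, JobExpired, the status strings, and the default timeout and interval are all hypothetical. The only value drawn from this thread is that manual jobs have a TTL, so a missing job is treated here as expired.

```python
import time
from typing import Any, Callable, Dict, Optional


class JobPollTimeout(Exception):
    """The job did not reach a terminal state before the deadline."""


class JobExpired(Exception):
    """The job record is gone, e.g. because its TTL elapsed."""


# Hypothetical terminal status names; the real job API may differ.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    fetch_status: Callable[[], Optional[Dict[str, Any]]],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job is terminal, expired, or the deadline passes.

    fetch_status stands in for GET /api/ocr/jobs/:jobId and is assumed
    to return None when the job no longer exists.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        if job is None:
            raise JobExpired("job not found; it may have expired")
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise JobPollTimeout(
                f"job still {job.get('status')!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

Injecting clock and sleep keeps the loop testable without real waiting; a caller would surface JobPollTimeout and JobExpired as distinct user-facing errors.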
[RULE 2] [SHOULD_FIX]: OcrClient/OcrService/OcrController approaching god object thresholds
Considered But Not Flagged
Verdict: NEEDS_CHANGES | Next: QR plan-docs
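As an illustration of the M6 PDF content validation finding in this review, the check on the leading %PDF bytes can be sketched as follows. The plan places this check in the Node.js proxy; the Python below only shows the logic. The function name, the max_bytes parameter, and the 413 status for oversized uploads are assumptions; the review itself names only 400/415.

```python
from typing import Optional

# Every PDF file begins with these bytes.
PDF_MAGIC = b"%PDF"


def pdf_rejection_status(data: bytes, max_bytes: int) -> Optional[int]:
    """Return an HTTP status to reject the upload with, or None to accept it.

    max_bytes is a caller-supplied limit; no specific value is implied.
    """
    if len(data) == 0:
        return 400  # empty body
    if len(data) > max_bytes:
        return 413  # assumption: oversized uploads get Payload Too Large
    if not data.startswith(PDF_MAGIC):
        return 415  # content is not a PDF regardless of filename or header
    return None
```

A renamed zip archive starts with the bytes PK rather than %PDF, so it is rejected here even if the declared content type says application/pdf.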
QR Review: Plan Docs (#129)
Phase: Plan-Review | Agent: Quality Reviewer | Status: NEEDS_CHANGES
VERDICT: NEEDS_CHANGES
The plan's Milestone 8 (Documentation) is structurally present but has critical gaps and inaccuracies. The Invisible Knowledge section contains valuable architecture documentation but also exhibits temporal contamination that survived the TW review. Several documentation files listed do not exist in the codebase, and M8 omits critical files that will be created/modified.
Findings
[RULE 1] [CRITICAL]: M8 lists non-existent CLAUDE.md files
ocr/app/engines/CLAUDE.md, frontend/src/features/maintenance/CLAUDE.md, and frontend/src/features/documents/CLAUDE.md for update. Verified via filesystem: NONE of these files exist.
(NEW) marker, OR (2) remove them and update parent CLAUDE.md files instead (ocr/app/CLAUDE.md, frontend/src/features/CLAUDE.md).
[RULE 1] [CRITICAL]: M8 missing critical backend files that will be modified
backend/src/features/ocr/* but M8 only lists backend/src/features/ocr/CLAUDE.md and backend/src/features/ocr/README.md. The current backend/src/features/ocr/CLAUDE.md has NO entries for the api/, domain/, or external/ subdirectories.
backend/src/features/ocr/CLAUDE.md for: api/ocr.controller.ts (request handlers), api/ocr.routes.ts (route registration), domain/ocr.service.ts (business logic), domain/ocr.types.ts (TypeScript types), external/ocr-client.ts (HTTP client to Python service)."
[RULE 1] [HIGH]: M8 missing entries for new Python files
ocr/app/engines/gemini_engine.py (NEW), M5 rewrites ocr/app/extractors/manual_extractor.py. M8 lists ocr/app/CLAUDE.md but does not specify adding entries for these files. Current ocr/app/CLAUDE.md lists subdirectories but NOT individual files.
ocr/app/CLAUDE.md: engines/gemini_engine.py (WHAT: Gemini 2.5 Flash integration for maintenance extraction, WHEN: Manual extraction debugging)."
[RULE 1] [HIGH]: M8 missing documentation for new frontend hooks
useManualExtraction.ts and useCreateSchedulesFromExtraction.ts. Neither frontend feature directory has a CLAUDE.md file currently.
frontend/src/features/CLAUDE.md with expanded entries for documents/ and maintenance/ subdirectories.
[RULE 1] [HIGH]: M8 missing documentation for tier gating changes
backend/src/core/config/feature-tiers.ts (core config change). M8 does not list any core/ files for documentation updates.
backend/src/core/CLAUDE.md entry for config/feature-tiers.ts to reflect fuelLog.receiptScan addition."
[RULE 1] [HIGH]: M8 does not specify README.md sections to add
backend/src/features/ocr/README.md. Requirements don't specify which sections to add.
backend/src/features/ocr/README.md with: (1) Receipt OCR Flow section with architecture diagram, (2) Manual Extraction Flow section with Gemini integration. Update API Endpoints table to include POST /extract/receipt and POST /extract/manual."
[RULE 0] [HIGH]: M8 source material reference is circular
[RULE 2] [SHOULD_FIX]: Invisible Knowledge "Why This Structure" has temporal contamination
[RULE 2] [SHOULD_FIX]: M8 Acceptance Criteria use action verbs instead of state descriptions
[RULE 2] [SHOULD_FIX]: M8 lacks verification strategy
Considered But Not Flagged
Plan Review Cycle Summary
Overall recommendation: REVISION NEEDED before execution. All four reviews found actionable issues:
The plan foundation is solid (comprehensive Decision Log, well-structured milestones, valuable Invisible Knowledge). Recommend:
Verdict: NEEDS_CHANGES | Next: Address all review findings, update plan, proceed to execution
Plan Revision Decisions (#129)
Phase: Plan-Review | Agent: Planner | Status: IN_PROGRESS
Decisions from Review Findings
Straightforward Fixes (No Decision Needed)
Next: Revise plan comment with all decisions and fixes applied
Milestone: M8 Documentation (#137)
Phase: Execution | Agent: Developer | Status: PASS
Summary
Updated CLAUDE.md indexes and README.md files across all three layers (backend, Python OCR service, frontend) to document the OCR expansion implemented in M0-M7.
Changes
Updated files (7):
backend/src/core/CLAUDE.md -- added tier guard plugin reference
backend/src/features/CLAUDE.md -- expanded OCR feature description
backend/src/features/ocr/CLAUDE.md -- added entries for all api/, domain/, external/, tests/ files
backend/src/features/ocr/README.md -- added Receipt OCR Flow and Manual Extraction Flow architecture diagrams, expanded API endpoint table with receipt and manual endpoints, added response types and error handling documentation
frontend/src/features/CLAUDE.md -- expanded documents/, fuel-logs/, maintenance/ descriptions
ocr/CLAUDE.md -- added Gemini reference
ocr/app/CLAUDE.md -- expanded subdirectory descriptions with Gemini and extraction details
Created files (4):
ocr/app/engines/CLAUDE.md -- engine layer documentation: OcrEngine subclasses vs standalone GeminiEngine, engine factory, engine selection diagram
frontend/src/features/fuel-logs/CLAUDE.md -- receipt OCR flow, key hooks and components, camera-to-form pipeline
frontend/src/features/documents/CLAUDE.md -- manual extraction flow, job polling, document management
frontend/src/features/maintenance/CLAUDE.md -- extraction review flow, batch schedule creation, subtype management
Verification
Commit
ab0d846 docs: update CLAUDE.md indexes and README for OCR expansion (refs #137)
Verdict: PASS | Next: QR post-implementation review