feat: OCR-powered smart capture for VIN, receipts, and owner's manuals #12
Overview
Enhance existing features with OCR-powered smart capture capabilities. This is not a standalone feature; it adds camera/image input as an alternative data entry method to existing record creation flows.
User Experience Model: Expensify-style smart scanning with auto-field detection, quick capture → queue → review workflow.
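The capture → queue → review workflow can be read as a small state machine. The following is only a sketch to make that flow concrete: the status names, the `CaptureItem` shape, and the `advance` helper are hypothetical and do not come from the existing codebase.

```typescript
// Hypothetical statuses for one captured image moving through the workflow.
type CaptureStatus =
  | "captured"      // photo taken on the device
  | "queued"        // waiting for OCR
  | "processing"    // OCR in progress
  | "needs_review"  // fields extracted, user must review/correct
  | "confirmed"     // user accepted the fields, record created
  | "failed";       // OCR failed, may be re-queued

// Allowed transitions (assumption: failed items can be retried).
const NEXT: Record<CaptureStatus, CaptureStatus[]> = {
  captured: ["queued"],
  queued: ["processing"],
  processing: ["needs_review", "failed"],
  needs_review: ["confirmed"],
  failed: ["queued"],
  confirmed: [],
};

interface CaptureItem {
  id: string;
  kind: "vin" | "receipt" | "manual";
  status: CaptureStatus;
}

// Returns a copy of the item in the new status, or throws on an illegal jump.
function advance(item: CaptureItem, to: CaptureStatus): CaptureItem {
  if (NEXT[item.status].indexOf(to) === -1) {
    throw new Error(`Illegal transition ${item.status} -> ${to}`);
  }
  return { ...item, status: to };
}
```

Keeping the transitions in one table means the review step cannot be skipped by accident: a capture can only reach `confirmed` by passing through `needs_review`.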
MVP Scope (Phase 1)
Priority 1: VIN from Photos
Camera → OCR VIN → NHTSA API → Decode → Update vehicle record
Priority 2: Maintenance Schedules from Owner's Manuals
Priority 3: Fuel Receipts
Camera → OCR → Review/correct fields → Create fuel log entry
Phase 2
User Experience Requirements
Expensify-Style Flow
Mobile-First Design
Confidence Handling
Technical Architecture
Processing Model
Supported Input Formats
Data Flow Integration
vehicles, fuel-logs, maintenance, documents, maintenance
Technical Reference
See docs/ocr-pipeline-tech-stack.md for the detailed architecture.
Acceptance Criteria
Phase 1 MVP
Non-Functional
Plan: OCR-Powered Smart Capture Feature
Phase: Planning | Agent: Planner | Status: AWAITING_REVIEW
Executive Summary
This plan breaks Issue #12 into smaller, context-efficient issues and addresses two architectural decisions:
Decision Analysis Results
Decision 1: VIN Camera Bounding Box
Verdict: REVISE - Recommend Option C (post-capture crop) with optional guidance hints
Key Insight: VIN presentations vary significantly (windshield dashboard, door jamb sticker, engine stamp) with different sizes and angles. A rigid bounding box optimized for one type frustrates users capturing others.
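Because these varied presentations make OCR misreads likely, a cheap sanity check on the recognized text before calling the NHTSA decoder may be useful. The sketch below applies the standard VIN check-digit calculation (position 9, weighted sum modulo 11). The function names are hypothetical, and mapping `O`/`Q` to `0` and `I` to `1` is an assumed OCR cleanup heuristic based on the fact that VINs never contain those letters. Note that the check digit is only mandatory for North American VINs, so a failure should probably trigger review rather than rejection.

```typescript
// Transliteration values for VIN letters (I, O, Q never appear in a VIN).
const VIN_VALUES: Record<string, number> = {
  A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8,
  J: 1, K: 2, L: 3, M: 4, N: 5, P: 7, R: 9,
  S: 2, T: 3, U: 4, V: 5, W: 6, X: 7, Y: 8, Z: 9,
};

// Position weights; position 9 (index 8) is the check digit itself.
const VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];

// Assumed cleanup for raw OCR text: uppercase, drop separators,
// and replace letters that cannot occur in a VIN with look-alike digits.
function normalizeOcrVin(raw: string): string {
  return raw
    .toUpperCase()
    .replace(/[^A-Z0-9]/g, "")
    .replace(/[OQ]/g, "0")
    .replace(/I/g, "1");
}

// True when the VIN has 17 legal characters and a matching check digit.
function hasValidCheckDigit(vin: string): boolean {
  if (!/^[A-HJ-NPR-Z0-9]{17}$/.test(vin)) return false;
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    const ch = vin.charAt(i);
    const value = ch >= "0" && ch <= "9" ? Number(ch) : VIN_VALUES[ch];
    sum += value * VIN_WEIGHTS[i];
  }
  const remainder = sum % 11;
  const expected = remainder === 10 ? "X" : String(remainder);
  return vin.charAt(8) === expected;
}
```

A candidate that fails this check could be surfaced in the review step with the VIN field flagged for correction instead of being sent to the decoder.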
Recommendation:
Decision 2: HEIC to PNG Conversion Location
Verdict: REVISE - Strongly recommend Option B (server-side)
Critical Finding: The bandwidth argument for client-side conversion is backwards. HEIC is more efficient to upload than converted JPEG/PNG. Uploading raw HEIC to server is optimal.
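If the raw HEIC is uploaded as-is, the server needs a way to tell which uploads require conversion. One possible approach, sketched here, is to sniff the file signature rather than trust the file extension or the client-supplied MIME type. The function and type names are hypothetical. The brand list covers common HEIF major brands and is not exhaustive, and the generic `mif1`/`msf1` brands can also appear on HEIF files that are not HEVC-coded.

```typescript
type SniffedFormat = "heic" | "jpeg" | "png" | "unknown";

// Major brands found in the ISO BMFF "ftyp" box of HEIF/HEIC files (not exhaustive).
const HEIF_BRANDS = ["heic", "heix", "hevc", "hevx", "mif1", "msf1"];

// Fixed 8-byte PNG signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Reads `length` bytes starting at `start` as an ASCII string.
function asciiAt(bytes: Uint8Array, start: number, length: number): string {
  let out = "";
  for (let i = start; i < start + length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Inspects the first bytes of an upload and guesses the container format.
function sniffImageFormat(bytes: Uint8Array): SniffedFormat {
  // HEIF: bytes 4..7 are "ftyp", bytes 8..11 are the major brand.
  if (
    bytes.length >= 12 &&
    asciiAt(bytes, 4, 4) === "ftyp" &&
    HEIF_BRANDS.indexOf(asciiAt(bytes, 8, 4)) !== -1
  ) {
    return "heic";
  }
  // JPEG: starts with FF D8 FF.
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return "jpeg";
  }
  // PNG: starts with the 8-byte signature.
  if (bytes.length >= 8 && PNG_SIGNATURE.every((b, i) => bytes[i] === b)) {
    return "png";
  }
  return "unknown";
}
```

Only the first 12 bytes are needed, so the check can run on the start of the upload stream before the whole file is buffered.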
Recommendation:
Proposed Issue Breakdown
Break this epic into 8 smaller issues for AI context efficiency:
Foundation Issues (Phase 0)
VIN OCR (Priority 1)
Fuel Receipt OCR (Priority 2)
Owner's Manual (Priority 3 - Async Processing)
Milestone Structure
Technical Architecture
OCR Service (New Container)
Camera Component Architecture
Data Flow
Affected Codebase Areas
docker-compose.yml, new ocr/ directory
backend/src/core/, new routes
VehicleForm.tsx, vehicles.controller.ts
FuelLogForm.tsx, fuel-logs.controller.ts
maintenance.service.ts
frontend/src/shared/components/CameraCapture/
Acceptance Criteria Mapping
Open Questions for Clarification
VIN Guidance Hints: Should the camera show a translucent aspect-ratio guide (not a strict box) to help users frame VINs, or should capture be completely unguided and full-frame?
Post-Capture Crop: Should users be able to manually crop/adjust the captured image before OCR, or should we rely entirely on server-side smart cropping?
Issue Granularity: Are 8 sub-issues the right level of breakdown, or would you prefer fewer/more?
Priority Order: The plan assumes VIN (P1) → Receipts (P2) → Manuals (P3). Confirm this priority or adjust?
Container Architecture: Is adding a 6th container (mvp-ocr) acceptable, or should OCR be embedded in the existing backend container?
Verdict: AWAITING_REVIEW | Next: Address clarifying questions, then QR plan-completeness review
Plan Update: Decisions Confirmed
Phase: Planning | Agent: Planner | Status: DECISIONS_CONFIRMED
Confirmed Decisions
Finalized Architecture
Camera Component Behavior:
Container Architecture (6 containers):
Verdict: DECISIONS_CONFIRMED | Next: Create 8 sub-issues, then QR plan-completeness review
Plan: Sub-Issues Created
Phase: Planning | Agent: Planner | Status: PLAN_COMPLETE
Sub-Issue Summary
This epic has been broken into 8 manageable sub-issues:
Dependency Graph
Recommended Execution Order
Sprint 1: Foundation
Sprint 2: VIN OCR (P1)
4. #67 - VIN Photo OCR Pipeline
5. #68 - VIN Capture Integration
Sprint 3: Receipt OCR (P2)
6. #69 - Receipt OCR Pipeline
7. #70 - Receipt Capture Integration
Sprint 4: Manual OCR (P3)
8. #71 - Owner's Manual OCR Pipeline
Key Architectural Decisions (Finalized)
Verdict: PLAN_COMPLETE | Next: QR plan-completeness review
This issue (#12) now serves as the epic tracker. Individual sub-issues should be worked in the order above, with each sub-issue following the standard workflow (branch, implement, PR, merge).