feat: OCR-powered smart capture for VIN, receipts, and owner's manuals #12

Closed
opened 2026-01-04 03:08:33 +00:00 by egullickson · 3 comments
Owner

Overview

Enhance existing features with OCR-powered smart capture capabilities. This is not a standalone feature - it adds camera/image input as an alternative data entry method to existing record creation flows.

User Experience Model: Expensify-style smart scanning with auto-field detection, quick capture → queue → review workflow.

MVP Scope (Phase 1)

Priority 1: VIN from Photos

  • Direct in-app camera capture of VIN plate/sticker
  • OCR extracts VIN → sends to existing NHTSA decoder → existing decode logic runs
  • Integration point: Vehicle create/edit dialog gets camera button
  • Flow: Camera → OCR VIN → NHTSA API → Decode → Update vehicle record

Priority 2: Maintenance Schedules from Owner's Manuals

  • Upload PDF owner's manuals (large files, 10-200MB)
  • Extract maintenance schedule tables (mileage intervals, service types, fluid specs)
  • Async processing with job polling (large file handling)
  • Integration point: Creates maintenance schedule entries for vehicle

Priority 3: Fuel Receipts

  • Camera capture of fuel receipts
  • Extract: date, total amount, gallons/liters, price per unit, station name
  • Integration point: Fuel log create dialog gets camera button
  • Flow: Camera → OCR → Review/correct fields → Create fuel log entry

Phase 2

  • Maintenance receipts → Create maintenance records
  • Insurance cards → Document storage (existing feature, new input method)
  • Registration documents → Document storage

User Experience Requirements

Expensify-Style Flow

  1. Smart scan: Auto-detect document type and relevant fields
  2. Quick capture: Tap camera → capture → queue for processing
  3. Review later: User reviews extracted data, corrects errors before saving
  4. Field suggestions: Pre-populate form fields from OCR results

Mobile-First Design

  • Direct in-app camera integration (not file picker)
  • Optimized for one-handed mobile use
  • Support common mobile formats: HEIC, JPG, PNG
  • Camera buttons integrated into existing create record dialogs

Confidence Handling

  • User always reviews and corrects extracted fields before saving
  • Low confidence fields highlighted for attention
  • Original image available for reference during review

Technical Architecture

Processing Model

File Type Processing Response
Small (receipts, VIN photos) Synchronous Immediate JSON response
Large (owner's manuals) Async + polling Job ID, poll for status/results

Supported Input Formats

  • Images: HEIC, JPG, PNG, TIFF, WEBP
  • Documents: PDF (native text + scanned)

Data Flow Integration

OCR Source Target Feature Action
VIN photo vehicles Update vehicle record with decoded VIN data
Fuel receipt fuel-logs Create new fuel log entry
Maintenance receipt maintenance Create new maintenance record
Insurance/Registration documents Store via existing document feature
Owner's manual maintenance Create maintenance schedule entries

Technical Reference

See docs/ocr-pipeline-tech-stack.md for detailed architecture including:

  • System flow diagram
  • Complete tech stack (Tesseract, PaddleOCR, PyMuPDF, etc.)
  • Scaling considerations
  • Error handling flow

Acceptance Criteria

Phase 1 MVP

  • Camera button in vehicle create/edit dialog for VIN capture
  • VIN OCR → NHTSA decode → vehicle record update works end-to-end
  • Camera button in fuel log create dialog
  • Fuel receipt OCR → review screen → fuel log creation works end-to-end
  • Owner's manual PDF upload with async processing
  • Maintenance schedule extraction populates maintenance schedules
  • Mobile-responsive camera capture flow
  • Desktop file upload fallback
  • Review/correct screen before any data is saved
  • Low confidence field highlighting

Non-Functional

  • Small file processing < 3 seconds
  • Large file processing provides progress feedback
  • Graceful degradation when OCR confidence is low
## Overview Enhance existing features with OCR-powered smart capture capabilities. This is **not a standalone feature** - it adds camera/image input as an alternative data entry method to existing record creation flows. **User Experience Model**: Expensify-style smart scanning with auto-field detection, quick capture → queue → review workflow. ## MVP Scope (Phase 1) ### Priority 1: VIN from Photos - Direct in-app camera capture of VIN plate/sticker - OCR extracts VIN → sends to existing NHTSA decoder → existing decode logic runs - Integration point: Vehicle create/edit dialog gets camera button - Flow: `Camera → OCR VIN → NHTSA API → Decode → Update vehicle record` ### Priority 2: Maintenance Schedules from Owner's Manuals - Upload PDF owner's manuals (large files, 10-200MB) - Extract maintenance schedule tables (mileage intervals, service types, fluid specs) - Async processing with job polling (large file handling) - Integration point: Creates maintenance schedule entries for vehicle ### Priority 3: Fuel Receipts - Camera capture of fuel receipts - Extract: date, total amount, gallons/liters, price per unit, station name - Integration point: Fuel log create dialog gets camera button - Flow: `Camera → OCR → Review/correct fields → Create fuel log entry` ## Phase 2 - Maintenance receipts → Create maintenance records - Insurance cards → Document storage (existing feature, new input method) - Registration documents → Document storage ## User Experience Requirements ### Expensify-Style Flow 1. **Smart scan**: Auto-detect document type and relevant fields 2. **Quick capture**: Tap camera → capture → queue for processing 3. **Review later**: User reviews extracted data, corrects errors before saving 4. **Field suggestions**: Pre-populate form fields from OCR results ### Mobile-First Design - Direct in-app camera integration (not file picker) - Optimized for one-handed mobile use - Support common mobile formats: HEIC, JPG, PNG - Camera buttons integrated into existing create record dialogs ### Confidence Handling - User **always reviews and corrects** extracted fields before saving - Low confidence fields highlighted for attention - Original image available for reference during review ## Technical Architecture ### Processing Model | File Type | Processing | Response | |-----------|------------|----------| | Small (receipts, VIN photos) | Synchronous | Immediate JSON response | | Large (owner's manuals) | Async + polling | Job ID, poll for status/results | ### Supported Input Formats - Images: HEIC, JPG, PNG, TIFF, WEBP - Documents: PDF (native text + scanned) ### Data Flow Integration | OCR Source | Target Feature | Action | |------------|----------------|--------| | VIN photo | `vehicles` | Update vehicle record with decoded VIN data | | Fuel receipt | `fuel-logs` | Create new fuel log entry | | Maintenance receipt | `maintenance` | Create new maintenance record | | Insurance/Registration | `documents` | Store via existing document feature | | Owner's manual | `maintenance` | Create maintenance schedule entries | ## Technical Reference See `docs/ocr-pipeline-tech-stack.md` for detailed architecture including: - System flow diagram - Complete tech stack (Tesseract, PaddleOCR, PyMuPDF, etc.) - Scaling considerations - Error handling flow ## Acceptance Criteria ### Phase 1 MVP - [ ] Camera button in vehicle create/edit dialog for VIN capture - [ ] VIN OCR → NHTSA decode → vehicle record update works end-to-end - [ ] Camera button in fuel log create dialog - [ ] Fuel receipt OCR → review screen → fuel log creation works end-to-end - [ ] Owner's manual PDF upload with async processing - [ ] Maintenance schedule extraction populates maintenance schedules - [ ] Mobile-responsive camera capture flow - [ ] Desktop file upload fallback - [ ] Review/correct screen before any data is saved - [ ] Low confidence field highlighting ### Non-Functional - [ ] Small file processing < 3 seconds - [ ] Large file processing provides progress feedback - [ ] Graceful degradation when OCR confidence is low
egullickson added the
status
backlog
type
feature
labels 2026-01-04 03:08:46 +00:00
egullickson added
status
in-progress
and removed
status
backlog
labels 2026-02-01 18:33:40 +00:00
Author
Owner

Plan: OCR-Powered Smart Capture Feature

Phase: Planning | Agent: Planner | Status: AWAITING_REVIEW


Executive Summary

This plan breaks Issue #12 into smaller, context-efficient issues and addresses two architectural decisions:

  1. VIN Bounding Box: Recommend smart guidance hints over rigid bounding box
  2. HEIC Conversion: Recommend server-side conversion (existing pillow-heif)

Decision Analysis Results

Decision 1: VIN Camera Bounding Box

Verdict: REVISE - Recommend Option C (post-capture crop) with optional guidance hints

Finding Status
Bounding boxes help alignment (industry practice) VERIFIED with caveats
OCR accuracy improves 10-30% with preprocessing VERIFIED
VIN physical dimensions vary significantly VERIFIED - dashboard vs door jamb vs sticker
Camera overlay APIs work cross-platform VERIFIED

Key Insight: VIN presentations vary significantly (windshield dashboard, door jamb sticker, engine stamp) with different sizes and angles. A rigid bounding box optimized for one type frustrates users capturing others.

Recommendation:

  • Implement full-frame capture with optional guidance hints (not rigid box)
  • Add post-capture cropping tool for user refinement
  • Leverage server-side smart region detection (aligns with Expensify model)
  • This achieves OCR preprocessing benefits without constraining capture UX

Decision 2: HEIC to PNG Conversion Location

Verdict: REVISE - Strongly recommend Option B (server-side)

Finding Status
iPhones capture HEIC by default VERIFIED
Server pillow-heif converts reliably VERIFIED
Client-side reduces bandwidth FAILED - HEIC is ~50% smaller than JPEG
Browser heic2any library reliable UNCERTAIN - memory issues on large images
Client devices handle conversion UNCERTAIN - OOM risk on older devices

Critical Finding: The bandwidth argument for client-side conversion is backwards. HEIC is more efficient to upload than converted JPEG/PNG. Uploading raw HEIC to server is optimal.

Recommendation:

  • Keep server-side conversion using existing pillow-heif (docs/ocr-pipeline-tech-stack.md)
  • Accept HEIC uploads directly - they're smaller and faster to upload
  • No client-side heic2any library needed (avoids 2MB bundle bloat, memory risks)
  • Backend already has file-type library for magic byte detection

Proposed Issue Breakdown

Break this epic into 8 smaller issues for AI context efficiency:

Foundation Issues (Phase 0)

Issue Title Scope Dependencies
#12a OCR Service Container Setup Add Python OCR container (Tesseract, PaddleOCR, pillow-heif) to docker-compose. Create FastAPI service scaffold with health endpoint. None
#12b Core OCR API Integration Backend Fastify routes to proxy OCR requests. Job queue for async processing. Storage integration for uploaded images. #12a

VIN OCR (Priority 1)

Issue Title Scope Dependencies
#12c Camera Capture Component React camera component using getUserMedia. Full-frame capture with optional guidance hints. File input fallback for desktop. Support HEIC/JPEG/PNG. None
#12d VIN Photo OCR Pipeline OCR service endpoint for VIN extraction. Image preprocessing (deskew, denoise). 17-character VIN pattern matching. Confidence scoring. #12a, #12b
#12e VIN Capture Integration Add camera button to VehicleForm. Wire VIN OCR → NHTSA decode → form population. Review/correct UI before save. Confidence indicators. #12c, #12d

Fuel Receipt OCR (Priority 2)

Issue Title Scope Dependencies
#12f Receipt OCR Pipeline OCR service endpoint for receipt extraction. Extract: date, total, gallons, price/unit, station. Field confidence scoring. #12a, #12b
#12g Receipt Capture Integration Add camera button to FuelLogForm. Wire receipt OCR → form population. Review/correct screen before save. #12c, #12f

Owner's Manual (Priority 3 - Async Processing)

Issue Title Scope Dependencies
#12h Owner's Manual OCR Pipeline Async PDF processing with job polling. Table extraction for maintenance schedules. Pattern matching for intervals, service types, fluids. Create maintenance_schedules entries. #12a, #12b

Milestone Structure

Milestone 1: OCR Infrastructure
├── #12a: OCR Service Container Setup
└── #12b: Core OCR API Integration

Milestone 2: Camera Capture
└── #12c: Camera Capture Component

Milestone 3: VIN OCR (Priority 1)
├── #12d: VIN Photo OCR Pipeline
└── #12e: VIN Capture Integration

Milestone 4: Fuel Receipt OCR (Priority 2)
├── #12f: Receipt OCR Pipeline
└── #12g: Receipt Capture Integration

Milestone 5: Owner's Manual OCR (Priority 3)
└── #12h: Owner's Manual OCR Pipeline

Technical Architecture

OCR Service (New Container)

┌─────────────────────────────────────────────────────────┐
│                    mvp-ocr (Python 3.11)                │
├─────────────────────────────────────────────────────────┤
│  FastAPI REST API                                       │
│  ├── POST /extract/vin      (sync, <3s)                │
│  ├── POST /extract/receipt  (sync, <3s)                │
│  ├── POST /extract/manual   (async, returns job_id)    │
│  └── GET  /jobs/{job_id}    (poll for status)          │
├─────────────────────────────────────────────────────────┤
│  Processing Pipeline                                    │
│  ├── python-magic (format detection)                   │
│  ├── pillow-heif (HEIC conversion) ← Server-side       │
│  ├── OpenCV (preprocessing)                            │
│  ├── Tesseract 5.x (primary OCR)                       │
│  ├── PaddleOCR (fallback for low confidence)           │
│  └── spaCy + regex (pattern extraction)                │
├─────────────────────────────────────────────────────────┤
│  Celery + Redis (async job queue for large files)      │
└─────────────────────────────────────────────────────────┘

Camera Component Architecture

CameraCapture Component
├── Uses getUserMedia API (iOS Safari 11+, Chrome, Firefox)
├── Full-frame capture (no rigid bounding box)
├── Optional guidance hints overlay (VIN aspect ratio reference)
├── Post-capture crop tool for user refinement
├── Accepts: HEIC, JPEG, PNG (all sent to server as-is)
├── File input fallback for desktop/unsupported browsers
└── Mobile-first responsive design

Data Flow

[Mobile Camera] → [HEIC/JPEG] → [Upload to Server] → [OCR Service]
                                      ↓
                              [pillow-heif converts HEIC]
                                      ↓
                              [Preprocessing + OCR]
                                      ↓
                              [Structured JSON Response]
                                      ↓
                              [Review/Correct UI]
                                      ↓
                              [Save to Feature (vehicles/fuel-logs/maintenance)]

Affected Codebase Areas

Area Impact Key Files
Docker Add mvp-ocr container docker-compose.yml, new ocr/ directory
Backend Core OCR proxy routes, job queue backend/src/core/, new routes
Vehicles Feature Camera button, VIN OCR integration VehicleForm.tsx, vehicles.controller.ts
Fuel-Logs Feature Camera button, receipt OCR integration FuelLogForm.tsx, fuel-logs.controller.ts
Maintenance Feature Schedule creation from manual OCR maintenance.service.ts
Frontend Shared Camera capture component New frontend/src/shared/components/CameraCapture/

Acceptance Criteria Mapping

Original Criteria Covered By
Camera button in vehicle create/edit dialog #12e
VIN OCR → NHTSA decode → vehicle record #12d, #12e
Camera button in fuel log create dialog #12g
Fuel receipt OCR → review → fuel log creation #12f, #12g
Owner's manual PDF upload with async processing #12h
Maintenance schedule extraction #12h
Mobile-responsive camera capture #12c
Desktop file upload fallback #12c
Review/correct screen before save #12e, #12g
Low confidence field highlighting #12e, #12g
Small file processing <3 seconds #12d, #12f
Large file processing with progress #12h

Open Questions for Clarification

  1. VIN Guidance Hints: Should the camera show a translucent aspect-ratio guide (not a strict box) to help users frame VINs, or prefer completely unguided full-frame capture?

  2. Post-Capture Crop: Should users be able to manually crop/adjust the captured image before OCR, or should we rely entirely on server-side smart cropping?

  3. Issue Granularity: Are 8 sub-issues the right level of breakdown, or would you prefer fewer/more?

  4. Priority Order: The plan assumes VIN (P1) → Receipts (P2) → Manuals (P3). Confirm this priority or adjust?

  5. Container Architecture: Is adding a 6th container (mvp-ocr) acceptable, or should OCR be embedded in the existing backend container?


Verdict: AWAITING_REVIEW | Next: Address clarifying questions, then QR plan-completeness review

## Plan: OCR-Powered Smart Capture Feature **Phase**: Planning | **Agent**: Planner | **Status**: AWAITING_REVIEW --- ### Executive Summary This plan breaks Issue #12 into smaller, context-efficient issues and addresses two architectural decisions: 1. **VIN Bounding Box**: Recommend smart guidance hints over rigid bounding box 2. **HEIC Conversion**: Recommend server-side conversion (existing pillow-heif) --- ### Decision Analysis Results #### Decision 1: VIN Camera Bounding Box **Verdict: REVISE - Recommend Option C (post-capture crop) with optional guidance hints** | Finding | Status | |---------|--------| | Bounding boxes help alignment (industry practice) | VERIFIED with caveats | | OCR accuracy improves 10-30% with preprocessing | VERIFIED | | VIN physical dimensions vary significantly | VERIFIED - dashboard vs door jamb vs sticker | | Camera overlay APIs work cross-platform | VERIFIED | **Key Insight**: VIN presentations vary significantly (windshield dashboard, door jamb sticker, engine stamp) with different sizes and angles. A rigid bounding box optimized for one type frustrates users capturing others. **Recommendation**: - Implement full-frame capture with optional guidance hints (not rigid box) - Add post-capture cropping tool for user refinement - Leverage server-side smart region detection (aligns with Expensify model) - This achieves OCR preprocessing benefits without constraining capture UX --- #### Decision 2: HEIC to PNG Conversion Location **Verdict: REVISE - Strongly recommend Option B (server-side)** | Finding | Status | |---------|--------| | iPhones capture HEIC by default | VERIFIED | | Server pillow-heif converts reliably | VERIFIED | | Client-side reduces bandwidth | **FAILED** - HEIC is ~50% smaller than JPEG | | Browser heic2any library reliable | UNCERTAIN - memory issues on large images | | Client devices handle conversion | UNCERTAIN - OOM risk on older devices | **Critical Finding**: The bandwidth argument for client-side conversion is **backwards**. HEIC is more efficient to upload than converted JPEG/PNG. Uploading raw HEIC to server is optimal. **Recommendation**: - Keep server-side conversion using existing pillow-heif (docs/ocr-pipeline-tech-stack.md) - Accept HEIC uploads directly - they're smaller and faster to upload - No client-side heic2any library needed (avoids 2MB bundle bloat, memory risks) - Backend already has file-type library for magic byte detection --- ### Proposed Issue Breakdown Break this epic into 8 smaller issues for AI context efficiency: #### Foundation Issues (Phase 0) | Issue | Title | Scope | Dependencies | |-------|-------|-------|--------------| | #12a | **OCR Service Container Setup** | Add Python OCR container (Tesseract, PaddleOCR, pillow-heif) to docker-compose. Create FastAPI service scaffold with health endpoint. | None | | #12b | **Core OCR API Integration** | Backend Fastify routes to proxy OCR requests. Job queue for async processing. Storage integration for uploaded images. | #12a | #### VIN OCR (Priority 1) | Issue | Title | Scope | Dependencies | |-------|-------|-------|--------------| | #12c | **Camera Capture Component** | React camera component using getUserMedia. Full-frame capture with optional guidance hints. File input fallback for desktop. Support HEIC/JPEG/PNG. | None | | #12d | **VIN Photo OCR Pipeline** | OCR service endpoint for VIN extraction. Image preprocessing (deskew, denoise). 17-character VIN pattern matching. Confidence scoring. | #12a, #12b | | #12e | **VIN Capture Integration** | Add camera button to VehicleForm. Wire VIN OCR → NHTSA decode → form population. Review/correct UI before save. Confidence indicators. | #12c, #12d | #### Fuel Receipt OCR (Priority 2) | Issue | Title | Scope | Dependencies | |-------|-------|-------|--------------| | #12f | **Receipt OCR Pipeline** | OCR service endpoint for receipt extraction. Extract: date, total, gallons, price/unit, station. Field confidence scoring. | #12a, #12b | | #12g | **Receipt Capture Integration** | Add camera button to FuelLogForm. Wire receipt OCR → form population. Review/correct screen before save. | #12c, #12f | #### Owner's Manual (Priority 3 - Async Processing) | Issue | Title | Scope | Dependencies | |-------|-------|-------|--------------| | #12h | **Owner's Manual OCR Pipeline** | Async PDF processing with job polling. Table extraction for maintenance schedules. Pattern matching for intervals, service types, fluids. Create maintenance_schedules entries. | #12a, #12b | --- ### Milestone Structure ``` Milestone 1: OCR Infrastructure ├── #12a: OCR Service Container Setup └── #12b: Core OCR API Integration Milestone 2: Camera Capture └── #12c: Camera Capture Component Milestone 3: VIN OCR (Priority 1) ├── #12d: VIN Photo OCR Pipeline └── #12e: VIN Capture Integration Milestone 4: Fuel Receipt OCR (Priority 2) ├── #12f: Receipt OCR Pipeline └── #12g: Receipt Capture Integration Milestone 5: Owner's Manual OCR (Priority 3) └── #12h: Owner's Manual OCR Pipeline ``` --- ### Technical Architecture #### OCR Service (New Container) ``` ┌─────────────────────────────────────────────────────────┐ │ mvp-ocr (Python 3.11) │ ├─────────────────────────────────────────────────────────┤ │ FastAPI REST API │ │ ├── POST /extract/vin (sync, <3s) │ │ ├── POST /extract/receipt (sync, <3s) │ │ ├── POST /extract/manual (async, returns job_id) │ │ └── GET /jobs/{job_id} (poll for status) │ ├─────────────────────────────────────────────────────────┤ │ Processing Pipeline │ │ ├── python-magic (format detection) │ │ ├── pillow-heif (HEIC conversion) ← Server-side │ │ ├── OpenCV (preprocessing) │ │ ├── Tesseract 5.x (primary OCR) │ │ ├── PaddleOCR (fallback for low confidence) │ │ └── spaCy + regex (pattern extraction) │ ├─────────────────────────────────────────────────────────┤ │ Celery + Redis (async job queue for large files) │ └─────────────────────────────────────────────────────────┘ ``` #### Camera Component Architecture ``` CameraCapture Component ├── Uses getUserMedia API (iOS Safari 11+, Chrome, Firefox) ├── Full-frame capture (no rigid bounding box) ├── Optional guidance hints overlay (VIN aspect ratio reference) ├── Post-capture crop tool for user refinement ├── Accepts: HEIC, JPEG, PNG (all sent to server as-is) ├── File input fallback for desktop/unsupported browsers └── Mobile-first responsive design ``` #### Data Flow ``` [Mobile Camera] → [HEIC/JPEG] → [Upload to Server] → [OCR Service] ↓ [pillow-heif converts HEIC] ↓ [Preprocessing + OCR] ↓ [Structured JSON Response] ↓ [Review/Correct UI] ↓ [Save to Feature (vehicles/fuel-logs/maintenance)] ``` --- ### Affected Codebase Areas | Area | Impact | Key Files | |------|--------|-----------| | Docker | Add mvp-ocr container | `docker-compose.yml`, new `ocr/` directory | | Backend Core | OCR proxy routes, job queue | `backend/src/core/`, new routes | | Vehicles Feature | Camera button, VIN OCR integration | `VehicleForm.tsx`, `vehicles.controller.ts` | | Fuel-Logs Feature | Camera button, receipt OCR integration | `FuelLogForm.tsx`, `fuel-logs.controller.ts` | | Maintenance Feature | Schedule creation from manual OCR | `maintenance.service.ts` | | Frontend Shared | Camera capture component | New `frontend/src/shared/components/CameraCapture/` | --- ### Acceptance Criteria Mapping | Original Criteria | Covered By | |-------------------|------------| | Camera button in vehicle create/edit dialog | #12e | | VIN OCR → NHTSA decode → vehicle record | #12d, #12e | | Camera button in fuel log create dialog | #12g | | Fuel receipt OCR → review → fuel log creation | #12f, #12g | | Owner's manual PDF upload with async processing | #12h | | Maintenance schedule extraction | #12h | | Mobile-responsive camera capture | #12c | | Desktop file upload fallback | #12c | | Review/correct screen before save | #12e, #12g | | Low confidence field highlighting | #12e, #12g | | Small file processing <3 seconds | #12d, #12f | | Large file processing with progress | #12h | --- ### Open Questions for Clarification 1. **VIN Guidance Hints**: Should the camera show a translucent aspect-ratio guide (not a strict box) to help users frame VINs, or prefer completely unguided full-frame capture? 2. **Post-Capture Crop**: Should users be able to manually crop/adjust the captured image before OCR, or should we rely entirely on server-side smart cropping? 3. **Issue Granularity**: Are 8 sub-issues the right level of breakdown, or would you prefer fewer/more? 4. **Priority Order**: The plan assumes VIN (P1) → Receipts (P2) → Manuals (P3). Confirm this priority or adjust? 5. **Container Architecture**: Is adding a 6th container (mvp-ocr) acceptable, or should OCR be embedded in the existing backend container? --- *Verdict*: AWAITING_REVIEW | *Next*: Address clarifying questions, then QR plan-completeness review
Author
Owner

Plan Update: Decisions Confirmed

Phase: Planning | Agent: Planner | Status: DECISIONS_CONFIRMED


Confirmed Decisions

Question Decision
VIN Guidance Hints Translucent aspect-ratio guide (not rigid box)
Post-Capture Crop Users can manually crop before OCR
Issue Granularity 8 sub-issues confirmed
Priority Order VIN (P1) → Receipts (P2) → Manuals (P3)
Container Architecture 6th container (mvp-ocr) acceptable

Finalized Architecture

Camera Component Behavior:

1. Open camera with translucent VIN aspect-ratio guide overlay
2. User captures full-frame photo
3. Post-capture crop tool appears (optional adjustment)
4. User confirms → upload HEIC/JPEG as-is to server
5. Server converts HEIC → preprocesses → OCR → returns structured data
6. Review/correct UI with confidence indicators
7. User confirms → save to feature

Container Architecture (6 containers):

┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ mvp-traefik │ │ mvp-frontend│ │ mvp-backend │
└─────────────┘ └─────────────┘ └──────┬──────┘
                                       │
                    ┌──────────────────┼──────────────────┐
                    │                  │                  │
              ┌─────▼─────┐     ┌──────▼──────┐    ┌──────▼──────┐
              │ mvp-postgres│    │  mvp-redis  │    │  mvp-ocr   │
              └───────────┘     └─────────────┘    └─────────────┘

Verdict: DECISIONS_CONFIRMED | Next: Create 8 sub-issues, then QR plan-completeness review

## Plan Update: Decisions Confirmed **Phase**: Planning | **Agent**: Planner | **Status**: DECISIONS_CONFIRMED --- ### Confirmed Decisions | Question | Decision | |----------|----------| | VIN Guidance Hints | Translucent aspect-ratio guide (not rigid box) | | Post-Capture Crop | Users can manually crop before OCR | | Issue Granularity | 8 sub-issues confirmed | | Priority Order | VIN (P1) → Receipts (P2) → Manuals (P3) | | Container Architecture | 6th container (mvp-ocr) acceptable | --- ### Finalized Architecture **Camera Component Behavior:** ``` 1. Open camera with translucent VIN aspect-ratio guide overlay 2. User captures full-frame photo 3. Post-capture crop tool appears (optional adjustment) 4. User confirms → upload HEIC/JPEG as-is to server 5. Server converts HEIC → preprocesses → OCR → returns structured data 6. Review/correct UI with confidence indicators 7. User confirms → save to feature ``` **Container Architecture (6 containers):** ``` ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ mvp-traefik │ │ mvp-frontend│ │ mvp-backend │ └─────────────┘ └─────────────┘ └──────┬──────┘ │ ┌──────────────────┼──────────────────┐ │ │ │ ┌─────▼─────┐ ┌──────▼──────┐ ┌──────▼──────┐ │ mvp-postgres│ │ mvp-redis │ │ mvp-ocr │ └───────────┘ └─────────────┘ └─────────────┘ ``` --- *Verdict*: DECISIONS_CONFIRMED | *Next*: Create 8 sub-issues, then QR plan-completeness review
Author
Owner

Plan: Sub-Issues Created

Phase: Planning | Agent: Planner | Status: PLAN_COMPLETE


Sub-Issue Summary

This epic has been broken into 8 manageable sub-issues:

Issue Title Priority Dependencies
#64 OCR Service Container Setup P0 None
#65 Core OCR API Integration P0 #64
#66 Camera Capture Component P1 None
#67 VIN Photo OCR Pipeline P1 #64, #65
#68 VIN Capture Integration P1 #66, #67
#69 Receipt OCR Pipeline P2 #64, #65
#70 Receipt Capture Integration P2 #66, #69
#71 Owner's Manual OCR Pipeline P3 #64, #65

Dependency Graph

                    ┌──────────────┐
                    │ #64 OCR      │
                    │ Container    │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │ #65 Core     │
          ┌─────────┤ OCR API      ├─────────┐
          │         └──────────────┘         │
          │                                  │
    ┌─────▼─────┐                     ┌──────▼──────┐
    │ #67 VIN   │                     │ #69 Receipt │
    │ Pipeline  │                     │ Pipeline    │
    └─────┬─────┘                     └──────┬──────┘
          │                                  │
          │    ┌──────────────┐              │
          │    │ #66 Camera   │              │
          │    │ Component    │              │
          │    └──────┬───────┘              │
          │           │                      │
    ┌─────▼───────────▼───┐          ┌───────▼──────┐
    │ #68 VIN Integration │          │ #70 Receipt  │
    └─────────────────────┘          │ Integration  │
                                     └──────────────┘

    ┌──────────────┐
    │ #71 Owner's  │ (depends on #64, #65)
    │ Manual OCR   │
    └──────────────┘

Sprint 1: Foundation

  1. #64 - OCR Service Container Setup
  2. #65 - Core OCR API Integration
  3. #66 - Camera Capture Component (parallel with #64/#65)

Sprint 2: VIN OCR (P1)
4. #67 - VIN Photo OCR Pipeline
5. #68 - VIN Capture Integration

Sprint 3: Receipt OCR (P2)
6. #69 - Receipt OCR Pipeline
7. #70 - Receipt Capture Integration

Sprint 4: Manual OCR (P3)
8. #71 - Owner's Manual OCR Pipeline


Key Architectural Decisions (Finalized)

Decision Choice Rationale
VIN Camera Guidance Translucent aspect-ratio guide Helps alignment without being rigid
Post-Capture Crop User can manually crop Flexibility for varied VIN placements
HEIC Conversion Server-side (pillow-heif) Smaller upload, proven reliability
Container Architecture 6th container (mvp-ocr) Isolation, Python ecosystem for OCR

Verdict: PLAN_COMPLETE | Next: QR plan-completeness review

This issue (#12) now serves as the epic tracker. Individual sub-issues should be worked in the order above, with each sub-issue following the standard workflow (branch, implement, PR, merge).

## Plan: Sub-Issues Created **Phase**: Planning | **Agent**: Planner | **Status**: PLAN_COMPLETE --- ### Sub-Issue Summary This epic has been broken into 8 manageable sub-issues: | Issue | Title | Priority | Dependencies | |-------|-------|----------|--------------| | #64 | OCR Service Container Setup | P0 | None | | #65 | Core OCR API Integration | P0 | #64 | | #66 | Camera Capture Component | P1 | None | | #67 | VIN Photo OCR Pipeline | P1 | #64, #65 | | #68 | VIN Capture Integration | P1 | #66, #67 | | #69 | Receipt OCR Pipeline | P2 | #64, #65 | | #70 | Receipt Capture Integration | P2 | #66, #69 | | #71 | Owner's Manual OCR Pipeline | P3 | #64, #65 | --- ### Dependency Graph ``` ┌──────────────┐ │ #64 OCR │ │ Container │ └──────┬───────┘ │ ┌──────▼───────┐ │ #65 Core │ ┌─────────┤ OCR API ├─────────┐ │ └──────────────┘ │ │ │ ┌─────▼─────┐ ┌──────▼──────┐ │ #67 VIN │ │ #69 Receipt │ │ Pipeline │ │ Pipeline │ └─────┬─────┘ └──────┬──────┘ │ │ │ ┌──────────────┐ │ │ │ #66 Camera │ │ │ │ Component │ │ │ └──────┬───────┘ │ │ │ │ ┌─────▼───────────▼───┐ ┌───────▼──────┐ │ #68 VIN Integration │ │ #70 Receipt │ └─────────────────────┘ │ Integration │ └──────────────┘ ┌──────────────┐ │ #71 Owner's │ (depends on #64, #65) │ Manual OCR │ └──────────────┘ ``` --- ### Recommended Execution Order **Sprint 1: Foundation** 1. #64 - OCR Service Container Setup 2. #65 - Core OCR API Integration 3. #66 - Camera Capture Component (parallel with #64/#65) **Sprint 2: VIN OCR (P1)** 4. #67 - VIN Photo OCR Pipeline 5. #68 - VIN Capture Integration **Sprint 3: Receipt OCR (P2)** 6. #69 - Receipt OCR Pipeline 7. #70 - Receipt Capture Integration **Sprint 4: Manual OCR (P3)** 8. #71 - Owner's Manual OCR Pipeline --- ### Key Architectural Decisions (Finalized) | Decision | Choice | Rationale | |----------|--------|-----------| | VIN Camera Guidance | Translucent aspect-ratio guide | Helps alignment without being rigid | | Post-Capture Crop | User can manually crop | Flexibility for varied VIN placements | | HEIC Conversion | Server-side (pillow-heif) | Smaller upload, proven reliability | | Container Architecture | 6th container (mvp-ocr) | Isolation, Python ecosystem for OCR | --- *Verdict*: PLAN_COMPLETE | *Next*: QR plan-completeness review This issue (#12) now serves as the **epic tracker**. Individual sub-issues should be worked in the order above, with each sub-issue following the standard workflow (branch, implement, PR, merge).
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: egullickson/motovaultpro#12