feat: OCR-powered smart capture for VIN, receipts, and owner's manuals #12


egullickson · 2026-01-04T03:08:33Z

egullickson commented

2026-01-04 03:08:33 +00:00

Overview

Enhance existing features with OCR-powered smart capture capabilities. This is not a standalone feature: it adds camera/image capture as an alternative data-entry method for existing record-creation flows.

User Experience Model: Expensify-style smart scanning with automatic field detection and a quick capture → queue → review workflow.

MVP Scope (Phase 1)

Priority 1: VIN from Photos

Direct in-app camera capture of VIN plate/sticker
OCR extracts VIN → sends to existing NHTSA decoder → existing decode logic runs
Integration point: Vehicle create/edit dialog gets camera button
Flow: Camera → OCR VIN → NHTSA API → Decode → Update vehicle record
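The "OCR extracts VIN" step needs a filter so that misreads never reach the NHTSA decoder. A minimal Python sketch, assuming the OCR service has raw text from the engine: it maps I/O/Q (letters that never appear in a VIN) to 1/0/0, then accepts only 17-character windows whose ninth character matches the North American check digit (49 CFR Part 565). Function names are hypothetical, and VINs issued outside North America may not carry a valid check digit, so a real pipeline would need a softer path for those.

```python
import re
from typing import Optional

# Transliteration values used by the check-digit calculation (49 CFR 565).
_TRANSLIT = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)
# I, O and Q are not allowed in a VIN, so OCR hits on them are almost certainly digits.
_OCR_FIXES = str.maketrans({"I": "1", "O": "0", "Q": "0"})
_VIN_RE = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


def check_digit(vin: str) -> str:
    """Weighted sum mod 11; a remainder of 10 is written as 'X'."""
    total = sum(_TRANSLIT[c] * w for c, w in zip(vin, _WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(vin: str) -> bool:
    return bool(_VIN_RE.fullmatch(vin)) and vin[8] == check_digit(vin)


def extract_vin(ocr_text: str) -> Optional[str]:
    """Return the first 17-character window in the OCR text that passes the check digit."""
    cleaned = ocr_text.upper().translate(_OCR_FIXES)
    for run in re.findall(r"[A-Z0-9]{17,}", cleaned):
        for start in range(len(run) - 16):
            candidate = run[start:start + 17]
            if is_valid_vin(candidate):
                return candidate
    return None
```

A `None` result is the cue to ask the user to retake or crop the photo instead of calling the decoder.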

Priority 2: Maintenance Schedules from Owner's Manuals

Upload PDF owner's manuals (large files, 10-200 MB)
Extract maintenance schedule tables (mileage intervals, service types, fluid specs)
Async processing with job polling (large file handling)
Integration point: Creates maintenance schedule entries for vehicle
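Async processing with job polling implies a job record that carries a status and a progress value the client can read. A minimal sketch, assuming an in-memory dict as a stand-in for the Celery + Redis backend the plan adopts later; the status names, `InMemoryJobStore`, and `process_manual` are hypothetical, and the per-page check is a placeholder for real schedule-table extraction.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Job:
    job_id: str
    status: JobStatus = JobStatus.QUEUED
    progress: float = 0.0                      # fraction of pages processed, 0.0-1.0
    result: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


class InMemoryJobStore:
    """Stand-in for a Redis-backed store: one Job record per job ID."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def create(self) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job(job_id)
        return job_id

    def get(self, job_id: str) -> Optional[Job]:
        return self._jobs.get(job_id)

    def report_progress(self, job_id: str, done: int, total: int) -> None:
        job = self._jobs[job_id]
        job.status = JobStatus.RUNNING
        job.progress = done / total if total else 0.0

    def complete(self, job_id: str, result: Dict[str, Any]) -> None:
        job = self._jobs[job_id]
        job.status, job.progress, job.result = JobStatus.SUCCEEDED, 1.0, result

    def fail(self, job_id: str, error: str) -> None:
        job = self._jobs[job_id]
        job.status, job.error = JobStatus.FAILED, error


def process_manual(store: InMemoryJobStore, job_id: str, pages: List[str]) -> None:
    """Hypothetical worker body: scan each page and report progress as it goes."""
    try:
        hits = []
        for number, text in enumerate(pages, start=1):
            if "miles" in text.lower():  # placeholder for real schedule-table detection
                hits.append(number)
            store.report_progress(job_id, number, len(pages))
        store.complete(job_id, {"schedule_pages": hits})
    except Exception as exc:  # surface the failure through the job record, not a crash
        store.fail(job_id, str(exc))
```

A status endpoint only has to serialize the `Job` record, which also covers the "progress feedback" requirement for large files.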

Priority 3: Fuel Receipts

Camera capture of fuel receipts
Extract: date, total amount, gallons/liters, price per unit, station name
Integration point: Fuel log create dialog gets camera button
Flow: Camera → OCR → Review/correct fields → Create fuel log entry
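A minimal sketch of the receipt field-extraction step, assuming the OCR engine has already produced plain text. The regexes, the US month/day date order, and the "first line is the station name" rule are illustrative assumptions, not tuned patterns; only gallons are handled (liters would need their own pattern). The gallons × price cross-check is a cheap extra signal for low-confidence highlighting.

```python
import re
from datetime import datetime
from typing import Any, Dict, Optional

# Illustrative patterns only; real receipts vary by station and printer.
_PATTERNS = {
    "date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4})\b"),
    "gallons": re.compile(r"\b(?:GALLONS|GAL)\b\s*[:=]?\s*(\d+\.\d+)", re.I),
    "price_per_unit": re.compile(r"\b(?:PRICE/GAL|PPG)\b\s*[:=]?\s*\$?(\d+\.\d{2,3})", re.I),
    "total": re.compile(r"\b(?:TOTAL|SALE)\b\s*[:=]?\s*\$?(\d+\.\d{2})\b", re.I),
}


def _parse_us_date(raw: str) -> Optional[str]:
    """Normalize MM/DD/YYYY or MM/DD/YY to ISO 8601 (assumes US date order)."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_fuel_receipt(text: str) -> Dict[str, Any]:
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Heuristic (assumption): the station name is the first printed line.
    fields: Dict[str, Any] = {"station_name": lines[0] if lines else None}
    for name, pattern in _PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    if fields["date"] is not None:
        fields["date"] = _parse_us_date(fields["date"])
    for name in ("gallons", "price_per_unit", "total"):
        if fields[name] is not None:
            fields[name] = float(fields[name])
    gallons, price, total = fields["gallons"], fields["price_per_unit"], fields["total"]
    # Cross-check: gallons x price should reproduce the total to within a few cents.
    fields["amounts_consistent"] = (
        None if None in (gallons, price, total) else abs(gallons * price - total) <= 0.05
    )
    return fields


SAMPLE_RECEIPT = """SHELL #1234
01/15/2026 08:42
GALLONS    10.250
PRICE/GAL  $3.299
TOTAL      $33.81
"""
```

Every value, including `None` for fields that were not found, goes to the review screen; nothing here writes a fuel log directly.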

Phase 2

Maintenance receipts → Create maintenance records
Insurance cards → Document storage (existing feature, new input method)
Registration documents → Document storage

User Experience Requirements

Expensify-Style Flow

Smart scan: Auto-detect document type and relevant fields
Quick capture: Tap camera → capture → queue for processing
Review later: User reviews extracted data, corrects errors before saving
Field suggestions: Pre-populate form fields from OCR results

Mobile-First Design

Direct in-app camera integration (not a file picker)
Optimized for one-handed mobile use
Support common mobile formats: HEIC, JPG, PNG
Camera buttons integrated into existing create record dialogs

Confidence Handling

User always reviews and corrects extracted fields before saving
Low-confidence fields highlighted for attention
Original image available for reference during review
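Because every field is reviewed anyway, confidence only decides what gets highlighted. A minimal sketch, assuming the extractor reports a 0.0-1.0 confidence per field; the `ExtractedField` type and the 0.80 threshold are hypothetical starting points to be tuned on real captures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LOW_CONFIDENCE_THRESHOLD = 0.80  # hypothetical starting point; tune on real captures


@dataclass(frozen=True)
class ExtractedField:
    value: Optional[str]  # None when the extractor found nothing
    confidence: float     # 0.0-1.0, as reported by the OCR/extraction step


def fields_needing_attention(
    fields: Dict[str, ExtractedField],
    threshold: float = LOW_CONFIDENCE_THRESHOLD,
) -> List[str]:
    """Names of fields the review UI should highlight."""
    flagged = [
        name for name, f in fields.items()
        if f.value is None or f.confidence < threshold
    ]
    # Missing values first, then lowest confidence, so the worst fields lead.
    return sorted(
        flagged,
        key=lambda name: (fields[name].value is not None, fields[name].confidence),
    )
```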

Technical Architecture

Processing Model

File Type	Processing	Response
Small (receipts, VIN photos)	Synchronous	Immediate JSON response
Large (owner's manuals)	Async + polling	Job ID, poll for status/results
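One way to route uploads between the two rows of this table, sketched under assumptions: documents of type `manual` always go async, anything over a hypothetical size cap does too, and the async path answers `202 Accepted` with a job ID plus the `/jobs/{job_id}` polling URL used elsewhere in this plan. `extract_sync` and `enqueue` are injected placeholders, not real APIs.

```python
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class Mode(str, Enum):
    SYNC = "sync"
    ASYNC = "async"


ASYNC_DOC_TYPES = {"manual"}          # owner's manuals always take the async path
SYNC_MAX_BYTES = 15 * 1024 * 1024     # hypothetical cap for the synchronous path


def choose_mode(doc_type: str, size_bytes: int) -> Mode:
    if doc_type in ASYNC_DOC_TYPES or size_bytes > SYNC_MAX_BYTES:
        return Mode.ASYNC
    return Mode.SYNC


def handle_upload(
    doc_type: str,
    payload: bytes,
    extract_sync: Callable[[str, bytes], Dict[str, Any]],
    enqueue: Callable[[str, bytes], str],
) -> Tuple[int, Dict[str, Any]]:
    """Return (HTTP status, JSON body) for an upload."""
    if choose_mode(doc_type, len(payload)) is Mode.ASYNC:
        job_id = enqueue(doc_type, payload)
        # 202 Accepted: work has started; the client polls the status URL.
        return 202, {"job_id": job_id, "status_url": f"/jobs/{job_id}"}
    return 200, extract_sync(doc_type, payload)
```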

Supported Input Formats

Images: HEIC, JPG, PNG, TIFF, WEBP
Documents: PDF (native text + scanned)
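The pipeline identifies formats by magic bytes (python-magic in the OCR service, a file-type library in the backend) rather than trusting file extensions. For illustration only, a stdlib sketch of the same idea for the formats listed above; it checks well-known signatures and is not a replacement for python-magic. Reporting the `mif1`/`msf1` brands as generic HEIF is an assumption, since those brands can also front other HEIF-based formats.

```python
from typing import Optional

# ISO-BMFF "ftyp" brands used by HEVC-coded HEIF files (typical iPhone HEIC).
_HEIC_BRANDS = {b"heic", b"heix", b"heim", b"heis", b"hevc", b"hevx"}
# Generic HEIF brands: the container is HEIF, but the codec is not implied.
_HEIF_BRANDS = {b"mif1", b"msf1"}


def sniff_format(head: bytes) -> Optional[str]:
    """Classify the first bytes of an upload (at least 12 bytes recommended)."""
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head[:4] in (b"II*\x00", b"MM\x00*"):  # little- and big-endian TIFF
        return "tiff"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head[4:8] == b"ftyp":  # ISO-BMFF box: 4-byte size, then "ftyp", then major brand
        if head[8:12] in _HEIC_BRANDS:
            return "heic"
        if head[8:12] in _HEIF_BRANDS:
            return "heif"
    return None
```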

Data Flow Integration

OCR Source	Target Feature	Action
VIN photo	`vehicles`	Update vehicle record with decoded VIN data
Fuel receipt	`fuel-logs`	Create new fuel log entry
Maintenance receipt	`maintenance`	Create new maintenance record
Insurance/Registration	`documents`	Store via existing document feature
Owner's manual	`maintenance`	Create maintenance schedule entries

Technical Reference

See docs/ocr-pipeline-tech-stack.md for detailed architecture including:

System flow diagram
Complete tech stack (Tesseract, PaddleOCR, PyMuPDF, etc.)
Scaling considerations
Error handling flow

Acceptance Criteria

Phase 1 MVP

Camera button in vehicle create/edit dialog for VIN capture
VIN OCR → NHTSA decode → vehicle record update works end-to-end
Camera button in fuel log create dialog
Fuel receipt OCR → review screen → fuel log creation works end-to-end
Owner's manual PDF upload with async processing
Maintenance schedule extraction populates maintenance schedules
Mobile-responsive camera capture flow
Desktop file upload fallback
Review/correct screen before any data is saved
Low-confidence field highlighting

Non-Functional

Small file processing < 3 seconds
Large file processing provides progress feedback
Graceful degradation when OCR confidence is low


egullickson added labels · 2026-01-04 03:08:46 +00:00

egullickson added and removed labels · 2026-02-01 18:33:40 +00:00

egullickson commented

2026-02-01 18:39:54 +00:00

Plan: OCR-Powered Smart Capture Feature

Phase: Planning | Agent: Planner | Status: AWAITING_REVIEW

Executive Summary

This plan breaks Issue #12 into smaller, context-efficient issues and addresses two architectural decisions:

VIN Bounding Box: Recommend smart guidance hints over a rigid bounding box
HEIC Conversion: Recommend server-side conversion (existing pillow-heif)

Decision Analysis Results

Decision 1: VIN Camera Bounding Box

Verdict: REVISE - Recommend Option C (post-capture crop) with optional guidance hints

Finding	Status
Bounding boxes help alignment (industry practice)	VERIFIED with caveats
OCR accuracy improves 10-30% with preprocessing	VERIFIED
VIN physical dimensions vary significantly	VERIFIED - dashboard vs door jamb vs sticker
Camera overlay APIs work cross-platform	VERIFIED

Key Insight: VIN presentations vary significantly (windshield dashboard, door jamb sticker, engine stamp) with different sizes and angles. A rigid bounding box optimized for one type frustrates users capturing others.

Recommendation:

Implement full-frame capture with optional guidance hints (not rigid box)
Add post-capture cropping tool for user refinement
Leverage server-side smart region detection (aligns with Expensify model)
This achieves OCR preprocessing benefits without constraining capture UX

Decision 2: HEIC to PNG Conversion Location

Verdict: REVISE - Strongly recommend Option B (server-side)

Finding	Status
iPhones capture HEIC by default	VERIFIED
Server pillow-heif converts reliably	VERIFIED
Client-side reduces bandwidth	FAILED - HEIC is ~50% smaller than JPEG
Browser heic2any library reliable	UNCERTAIN - memory issues on large images
Client devices handle conversion	UNCERTAIN - OOM risk on older devices

Critical Finding: The bandwidth argument for client-side conversion is backwards. HEIC is more efficient to upload than converted JPEG/PNG, so uploading the raw HEIC to the server is optimal.

Recommendation:

Keep server-side conversion using existing pillow-heif (docs/ocr-pipeline-tech-stack.md)
Accept HEIC uploads directly - they're smaller and faster to upload
No client-side heic2any library needed (avoids 2MB bundle bloat, memory risks)
The backend already has a file-type library for magic-byte detection

Proposed Issue Breakdown

Break this epic into 8 smaller issues for AI context efficiency:

Foundation Issues (Phase 0)

Issue	Title	Scope	Dependencies
#12a	OCR Service Container Setup	Add Python OCR container (Tesseract, PaddleOCR, pillow-heif) to docker-compose. Create FastAPI service scaffold with health endpoint.	None
#12b	Core OCR API Integration	Backend Fastify routes to proxy OCR requests. Job queue for async processing. Storage integration for uploaded images.	#12a
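#12b's job queue implies a client that polls each job until it finishes. A minimal sketch of that loop with exponential backoff and an overall timeout, written in Python to match the other sketches (the frontend or the Fastify proxy would follow the same shape). `fetch_status` stands in for an HTTP call to `GET /jobs/{job_id}`; the response keys and status names are assumptions. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"succeeded", "failed"}  # hypothetical status values


def poll_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    timeout: float = 600.0,
    on_progress: Callable[[float], None] = lambda progress: None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state, doubling the delay each round."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status(job_id)
        on_progress(status.get("progress", 0.0))  # e.g. drive a progress bar
        if status["status"] in TERMINAL_STATES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"job {job_id} still {status['status']!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```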

VIN OCR (Priority 1)

Issue	Title	Scope	Dependencies
#12c	Camera Capture Component	React camera component using getUserMedia. Full-frame capture with optional guidance hints. File input fallback for desktop. Support HEIC/JPEG/PNG.	None
#12d	VIN Photo OCR Pipeline	OCR service endpoint for VIN extraction. Image preprocessing (deskew, denoise). 17-character VIN pattern matching. Confidence scoring.	#12a, #12b
#12e	VIN Capture Integration	Add camera button to VehicleForm. Wire VIN OCR → NHTSA decode → form population. Review/correct UI before save. Confidence indicators.	#12c, #12d

Fuel Receipt OCR (Priority 2)

Issue	Title	Scope	Dependencies
#12f	Receipt OCR Pipeline	OCR service endpoint for receipt extraction. Extract: date, total, gallons, price/unit, station. Field confidence scoring.	#12a, #12b
#12g	Receipt Capture Integration	Add camera button to FuelLogForm. Wire receipt OCR → form population. Review/correct screen before save.	#12c, #12f

Owner's Manual (Priority 3 - Async Processing)

Issue	Title	Scope	Dependencies
#12h	Owner's Manual OCR Pipeline	Async PDF processing with job polling. Table extraction for maintenance schedules. Pattern matching for intervals, service types, fluids. Create maintenance_schedules entries.	#12a, #12b
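Pattern matching for intervals could start with prose such as "every 5,000 miles or 6 months". A minimal sketch, assuming the manual text has already been extracted (for example by PyMuPDF); the regex and the `ScheduleRow` shape are hypothetical, handle only miles with an optional months interval, and ignore tabular layouts, which need the table-extraction step.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

_INTERVAL_RE = re.compile(
    # Service phrase: letters/spaces/commas up to the word "every".
    r"(?P<service>[A-Za-z][A-Za-z ,/&-]*?)\s+every\s+"
    # Mileage: "5,000" or "7500", followed by "miles" or "mi".
    r"(?P<miles>\d{1,3}(?:,\d{3})+|\d+)\s*(?:miles|mi)\b"
    # Optional time interval later in the same sentence: "... or 6 months".
    r"(?:[^.\n]*?\bor\s+(?P<months>\d+)\s*months?\b)?",
    re.IGNORECASE,
)


@dataclass
class ScheduleRow:
    service: str
    interval_miles: int
    interval_months: Optional[int]


def extract_intervals(text: str) -> List[ScheduleRow]:
    rows = []
    for m in _INTERVAL_RE.finditer(text):
        rows.append(ScheduleRow(
            service=m.group("service").strip(" ,"),
            interval_miles=int(m.group("miles").replace(",", "")),
            interval_months=int(m.group("months")) if m.group("months") else None,
        ))
    return rows
```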

Milestone Structure

Milestone 1: OCR Infrastructure
├── #12a: OCR Service Container Setup
└── #12b: Core OCR API Integration

Milestone 2: Camera Capture
└── #12c: Camera Capture Component

Milestone 3: VIN OCR (Priority 1)
├── #12d: VIN Photo OCR Pipeline
└── #12e: VIN Capture Integration

Milestone 4: Fuel Receipt OCR (Priority 2)
├── #12f: Receipt OCR Pipeline
└── #12g: Receipt Capture Integration

Milestone 5: Owner's Manual OCR (Priority 3)
└── #12h: Owner's Manual OCR Pipeline

Technical Architecture

OCR Service (New Container)

┌─────────────────────────────────────────────────────────┐
│                    mvp-ocr (Python 3.11)                │
├─────────────────────────────────────────────────────────┤
│  FastAPI REST API                                       │
│  ├── POST /extract/vin      (sync, <3s)                │
│  ├── POST /extract/receipt  (sync, <3s)                │
│  ├── POST /extract/manual   (async, returns job_id)    │
│  └── GET  /jobs/{job_id}    (poll for status)          │
├─────────────────────────────────────────────────────────┤
│  Processing Pipeline                                    │
│  ├── python-magic (format detection)                   │
│  ├── pillow-heif (HEIC conversion) ← Server-side       │
│  ├── OpenCV (preprocessing)                            │
│  ├── Tesseract 5.x (primary OCR)                       │
│  ├── PaddleOCR (fallback for low confidence)           │
│  └── spaCy + regex (pattern extraction)                │
├─────────────────────────────────────────────────────────┤
│  Celery + Redis (async job queue for large files)      │
└─────────────────────────────────────────────────────────┘
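The primary/fallback split in this box can be kept engine-agnostic. A minimal sketch, assuming each engine is wrapped as a callable that returns text plus a 0.0-1.0 confidence; no real Tesseract or PaddleOCR binding calls are shown, and the 0.85 threshold is hypothetical. Keeping the engine order as data also makes a later swap cheap (issue #115 later proposed replacing Tesseract with PaddleOCR).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class OcrResult:
    text: str
    confidence: float  # 0.0-1.0, e.g. mean word confidence
    engine: str


OcrEngine = Callable[[bytes], OcrResult]


def run_with_fallback(
    image: bytes,
    engines: Sequence[OcrEngine],
    min_confidence: float = 0.85,  # hypothetical threshold
) -> Tuple[OcrResult, bool]:
    """Try engines in order and stop at the first confident result.

    Returns (result, low_confidence). If no engine clears the threshold, the
    best-scoring result comes back with low_confidence=True so the review UI
    can highlight it instead of the request failing outright.
    """
    best: Optional[OcrResult] = None
    for engine in engines:
        try:
            result = engine(image)
        except Exception:
            continue  # an engine crash degrades to the next engine
        if result.confidence >= min_confidence:
            return result, False
        if best is None or result.confidence > best.confidence:
            best = result
    if best is None:
        raise RuntimeError("all OCR engines failed")
    return best, True
```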

Camera Component Architecture

CameraCapture Component
├── Uses getUserMedia API (iOS Safari 11+, Chrome, Firefox)
├── Full-frame capture (no rigid bounding box)
├── Optional guidance hints overlay (VIN aspect ratio reference)
├── Post-capture crop tool for user refinement
├── Accepts: HEIC, JPEG, PNG (all sent to server as-is)
├── File input fallback for desktop/unsupported browsers
└── Mobile-first responsive design

Data Flow

[Mobile Camera] → [HEIC/JPEG] → [Upload to Server] → [OCR Service]
                                      ↓
                              [pillow-heif converts HEIC]
                                      ↓
                              [Preprocessing + OCR]
                                      ↓
                              [Structured JSON Response]
                                      ↓
                              [Review/Correct UI]
                                      ↓
                              [Save to Feature (vehicles/fuel-logs/maintenance)]

Affected Codebase Areas

Area	Impact	Key Files
Docker	Add mvp-ocr container	`docker-compose.yml`, new `ocr/` directory
Backend Core	OCR proxy routes, job queue	`backend/src/core/`, new routes
Vehicles Feature	Camera button, VIN OCR integration	`VehicleForm.tsx`, `vehicles.controller.ts`
Fuel-Logs Feature	Camera button, receipt OCR integration	`FuelLogForm.tsx`, `fuel-logs.controller.ts`
Maintenance Feature	Schedule creation from manual OCR	`maintenance.service.ts`
Frontend Shared	Camera capture component	New `frontend/src/shared/components/CameraCapture/`

Acceptance Criteria Mapping

Original Criteria	Covered By
Camera button in vehicle create/edit dialog	#12e
VIN OCR → NHTSA decode → vehicle record	#12d, #12e
Camera button in fuel log create dialog	#12g
Fuel receipt OCR → review → fuel log creation	#12f, #12g
Owner's manual PDF upload with async processing	#12h
Maintenance schedule extraction	#12h
Mobile-responsive camera capture	#12c
Desktop file upload fallback	#12c
Review/correct screen before save	#12e, #12g
Low confidence field highlighting	#12e, #12g
Small file processing <3 seconds	#12d, #12f
Large file processing with progress	#12h

Open Questions for Clarification

VIN Guidance Hints: Should the camera show a translucent aspect-ratio guide (not a strict box) to help users frame VINs, or prefer completely unguided full-frame capture?
Post-Capture Crop: Should users be able to manually crop/adjust the captured image before OCR, or should we rely entirely on server-side smart cropping?
Issue Granularity: Are 8 sub-issues the right level of breakdown, or would you prefer fewer/more?
Priority Order: The plan assumes VIN (P1) → Receipts (P2) → Manuals (P3). Confirm this priority or adjust?
Container Architecture: Is adding a 6th container (mvp-ocr) acceptable, or should OCR be embedded in the existing backend container?

Verdict: AWAITING_REVIEW | Next: Address clarifying questions, then QR plan-completeness review


egullickson commented

2026-02-01 18:45:43 +00:00

Plan Update: Decisions Confirmed

Phase: Planning | Agent: Planner | Status: DECISIONS_CONFIRMED

Confirmed Decisions

Question	Decision
VIN Guidance Hints	Translucent aspect-ratio guide (not rigid box)
Post-Capture Crop	Users can manually crop before OCR
Issue Granularity	8 sub-issues confirmed
Priority Order	VIN (P1) → Receipts (P2) → Manuals (P3)
Container Architecture	6th container (mvp-ocr) acceptable

Finalized Architecture

Camera Component Behavior:

1. Open camera with translucent VIN aspect-ratio guide overlay
2. User captures full-frame photo
3. Post-capture crop tool appears (optional adjustment)
4. User confirms → upload HEIC/JPEG as-is to server
5. Server converts HEIC → preprocesses → OCR → returns structured data
6. Review/correct UI with confidence indicators
7. User confirms → save to feature
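Step 3's crop can travel to the server as fractions of the image size, so it survives any client-side resizing. A minimal sketch, assuming that normalized `(x, y, w, h)` payload (a hypothetical field layout); it clamps to the image bounds and returns a `(left, upper, right, lower)` tuple, the box order Pillow's `Image.crop` accepts.

```python
from typing import NamedTuple


class CropBox(NamedTuple):
    """Pixel box in (left, upper, right, lower) order."""
    left: int
    top: int
    right: int
    bottom: int


def _clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def to_pixel_box(x: float, y: float, w: float, h: float, img_w: int, img_h: int) -> CropBox:
    """Convert a normalized crop rectangle (fractions of width/height) to pixels."""
    left = _clamp(round(x * img_w), 0, img_w)
    top = _clamp(round(y * img_h), 0, img_h)
    right = _clamp(round((x + w) * img_w), left, img_w)
    bottom = _clamp(round((y + h) * img_h), top, img_h)
    if right - left < 1 or bottom - top < 1:
        raise ValueError("crop rectangle is empty after clamping")
    return CropBox(left, top, right, bottom)
```

If the decoded image is not already upright, its EXIF orientation should be applied first (for example with Pillow's `ImageOps.exif_transpose`) so the crop matches what the user saw on screen.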

Container Architecture (6 containers):

┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ mvp-traefik │ │ mvp-frontend│ │ mvp-backend │
└─────────────┘ └─────────────┘ └──────┬──────┘
                                       │
                    ┌──────────────────┼──────────────────┐
                    │                  │                  │
            ┌───────▼───────┐   ┌──────▼──────┐    ┌──────▼──────┐
            │  mvp-postgres │   │  mvp-redis  │    │   mvp-ocr   │
            └───────────────┘   └─────────────┘    └─────────────┘

Verdict: DECISIONS_CONFIRMED | Next: Create 8 sub-issues, then QR plan-completeness review


egullickson referenced this issue

2026-02-01 18:46:02 +00:00

feat: OCR Service Container Setup #64

egullickson referenced this issue

2026-02-01 18:46:19 +00:00

feat: Core OCR API Integration #65

egullickson referenced this issue

2026-02-01 18:46:41 +00:00

feat: Camera Capture Component #66

egullickson referenced this issue

2026-02-01 18:47:01 +00:00

feat: VIN Photo OCR Pipeline #67

egullickson referenced this issue

2026-02-01 18:47:23 +00:00

feat: VIN Capture Integration #68

egullickson referenced this issue

2026-02-01 18:47:42 +00:00

feat: Receipt OCR Pipeline #69

egullickson referenced this issue

2026-02-01 18:47:59 +00:00

feat: Receipt Capture Integration #70

egullickson referenced this issue

2026-02-01 18:48:26 +00:00

feat: Owner's Manual OCR Pipeline #71

egullickson commented

2026-02-01 18:49:00 +00:00

Plan: Sub-Issues Created

Phase: Planning | Agent: Planner | Status: PLAN_COMPLETE

Sub-Issue Summary

This epic has been broken into 8 manageable sub-issues:

Issue	Title	Priority	Dependencies
#64	OCR Service Container Setup	P0	None
#65	Core OCR API Integration	P0	#64
#66	Camera Capture Component	P1	None
#67	VIN Photo OCR Pipeline	P1	#64, #65
#68	VIN Capture Integration	P1	#66, #67
#69	Receipt OCR Pipeline	P2	#64, #65
#70	Receipt Capture Integration	P2	#66, #69
#71	Owner's Manual OCR Pipeline	P3	#64, #65
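The dependency column can be checked mechanically. A short sketch using Python's standard `graphlib.TopologicalSorter` (3.9+) groups the sub-issues into waves, where each wave depends only on earlier ones; the mapping below is copied from the table above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Issue -> prerequisite issues, from the sub-issue summary table.
DEPENDENCIES: Dict[int, Set[int]] = {
    64: set(),
    65: {64},
    66: set(),
    67: {64, 65},
    68: {66, 67},
    69: {64, 65},
    70: {66, 69},
    71: {64, 65},
}


def execution_waves(deps: Dict[int, Set[int]]) -> List[List[int]]:
    """Every issue in a wave can start once all earlier waves are merged."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With these dependencies the waves come out as `[[64, 66], [65], [67, 69, 71], [68, 70]]`: #66 can run alongside #64/#65, and #71 is unblocked as soon as #65 lands, so its Sprint 4 slot reflects priority rather than a dependency.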

Dependency Graph

                    ┌──────────────┐
                    │ #64 OCR      │
                    │ Container    │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │ #65 Core     │
          ┌─────────┤ OCR API      ├─────────┐
          │         └──────────────┘         │
          │                                  │
    ┌─────▼─────┐                     ┌──────▼──────┐
    │ #67 VIN   │                     │ #69 Receipt │
    │ Pipeline  │                     │ Pipeline    │
    └─────┬─────┘                     └──────┬──────┘
          │                                  │
          │    ┌──────────────┐              │
          │    │ #66 Camera   │              │
          │    │ Component    │              │
          │    └──────┬───────┘              │
          │           │                      │
    ┌─────▼───────────▼───┐          ┌───────▼──────┐
    │ #68 VIN Integration │          │ #70 Receipt  │
    └─────────────────────┘          │ Integration  │
                                     └──────────────┘

    ┌──────────────┐
    │ #71 Owner's  │ (depends on #64, #65)
    │ Manual OCR   │
    └──────────────┘

Recommended Execution Order

Sprint 1: Foundation

1. #64 - OCR Service Container Setup
2. #65 - Core OCR API Integration
3. #66 - Camera Capture Component (parallel with #64/#65)

Sprint 2: VIN OCR (P1)

4. #67 - VIN Photo OCR Pipeline
5. #68 - VIN Capture Integration

Sprint 3: Receipt OCR (P2)

6. #69 - Receipt OCR Pipeline
7. #70 - Receipt Capture Integration

Sprint 4: Manual OCR (P3)

8. #71 - Owner's Manual OCR Pipeline

Key Architectural Decisions (Finalized)

Decision	Choice	Rationale
VIN Camera Guidance	Translucent aspect-ratio guide	Helps alignment without being rigid
Post-Capture Crop	User can manually crop	Flexibility for varied VIN placements
HEIC Conversion	Server-side (pillow-heif)	Smaller upload, proven reliability
Container Architecture	6th container (mvp-ocr)	Isolation, Python ecosystem for OCR

Verdict: PLAN_COMPLETE | Next: QR plan-completeness review

This issue (#12) now serves as the epic tracker. Individual sub-issues should be worked in the order above, with each sub-issue following the standard workflow (branch, implement, PR, merge).


egullickson closed this issue

2026-02-02 03:52:59 +00:00

egullickson referenced this issue

2026-02-07 16:00:34 +00:00

feat: Improve OCR process - replace Tesseract with PaddleOCR and add cloud fallback for VIN scanning #115


Reference: egullickson/motovaultpro#12