feat: Core OCR API Integration #65


egullickson · 2026-02-01T18:46:19Z

Overview

Create the API layer that connects the Fastify backend to the OCR service, including a job queue for asynchronous processing.

Parent Issue: #12 (OCR-powered smart capture)
Priority: P0 - Foundation
Dependencies: OCR Service Container Setup

Scope

OCR Service Endpoints (Python/FastAPI)

POST /extract          # Generic OCR extraction
POST /extract/vin      # VIN-specific extraction (future)
POST /extract/receipt  # Receipt-specific extraction (future)
POST /jobs             # Submit async job (for large files)
GET  /jobs/{job_id}    # Poll job status

Backend Proxy Routes (Fastify)

POST /api/ocr/extract         # Proxy to OCR service
POST /api/ocr/jobs            # Submit async job
GET  /api/ocr/jobs/:jobId     # Poll job status
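Each backend proxy route maps one-to-one onto an OCR service endpoint. A minimal sketch of that path translation, assuming job IDs are URL-safe tokens (letters, digits, `_`, `-`). `PROXY_ROUTES` and `resolveUpstream` are hypothetical names, not existing code; in the real feature this mapping would more likely be expressed as Fastify route registrations in `ocr.routes.ts`.

```typescript
// Hypothetical path translation for the OCR proxy routes (illustrative only).
type HttpMethod = 'GET' | 'POST';

interface ProxyRoute {
  method: HttpMethod;
  pattern: RegExp; // matched against the incoming backend path
  upstream: (match: RegExpMatchArray) => string; // builds the OCR service path
}

// Assumes job IDs are URL-safe tokens; anything else falls through to null.
const PROXY_ROUTES: ProxyRoute[] = [
  { method: 'POST', pattern: /^\/api\/ocr\/extract$/, upstream: () => '/extract' },
  { method: 'POST', pattern: /^\/api\/ocr\/jobs$/, upstream: () => '/jobs' },
  {
    method: 'GET',
    pattern: /^\/api\/ocr\/jobs\/([A-Za-z0-9_-]+)$/,
    upstream: (match) => `/jobs/${match[1]}`,
  },
];

/** Returns the OCR service path for a backend route, or null if it is not proxied. */
export function resolveUpstream(method: HttpMethod, path: string): string | null {
  for (const route of PROXY_ROUTES) {
    if (route.method !== method) continue;
    const match = path.match(route.pattern);
    if (match) return route.upstream(match);
  }
  return null;
}
```

Restricting the job ID character set means a crafted value such as `../admin` is never forwarded to an unintended path on the internal service.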

File Upload Handling

Accept multipart/form-data with image file
Validate file type (HEIC, JPEG, PNG, PDF)
Stream file to OCR service
Return structured JSON response
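Because a client-supplied `Content-Type` can be wrong or spoofed, one way to validate the file type is to inspect the first bytes of the upload. A sketch under that assumption: `detectFileType` is a hypothetical helper, and the set of HEIF brands is a commonly seen list rather than an exhaustive one.

```typescript
// Sketch: identify uploads by their leading bytes rather than the client's Content-Type.
type SupportedType = 'image/jpeg' | 'image/png' | 'image/heic' | 'application/pdf';

// Common ISO-BMFF major brands for HEIC/HEIF files (assumed, not exhaustive).
const HEIF_BRANDS = new Set(['heic', 'heix', 'hevc', 'hevx', 'heim', 'heis', 'mif1', 'msf1']);

function hasPrefix(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false;
  return signature.every((b, i) => bytes[i] === b);
}

function ascii(bytes: Uint8Array, start: number, end: number): string {
  return Array.from(bytes.subarray(start, end), (b) => String.fromCharCode(b)).join('');
}

/** Returns the detected type from the first bytes of a file, or null if unsupported. */
export function detectFileType(head: Uint8Array): SupportedType | null {
  if (hasPrefix(head, [0xff, 0xd8, 0xff])) return 'image/jpeg';
  if (hasPrefix(head, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
  if (hasPrefix(head, [0x25, 0x50, 0x44, 0x46, 0x2d])) return 'application/pdf'; // "%PDF-"
  // HEIC/HEIF: bytes 4-7 spell "ftyp", bytes 8-11 hold the major brand.
  if (head.length >= 12 && ascii(head, 4, 8) === 'ftyp' && HEIF_BRANDS.has(ascii(head, 8, 12))) {
    return 'image/heic';
  }
  return null;
}
```

The caller would peek at the first 12 bytes of the multipart file stream, reject a `null` result with a 4xx error, and then forward the full stream unchanged to the OCR service.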

Async Job Queue (for large files like PDFs)

Use existing Redis (mvp-redis) as queue backend
Job statuses: pending, processing, completed, failed
Job metadata stored in Redis with a 1-hour TTL
Polling endpoint returns progress percentage
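The queue description implies a small status state machine plus a Redis record that expires after one hour. A sketch assuming a Redis client that exposes `set(key, value, 'EX', seconds)` and `get(key)` (ioredis has this call shape; `RedisLike` here is a stand-in) and a hypothetical `ocr:job:<id>` key scheme. In the planned layout this logic lives in `job_queue.py` on the Python side; it is shown in TypeScript only to keep the examples in one language.

```typescript
// Minimal Redis surface this sketch needs (assumed; ioredis exposes the same call shape).
interface RedisLike {
  set(key: string, value: string, mode: 'EX', seconds: number): Promise<unknown>;
  get(key: string): Promise<string | null>;
}

type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface JobRecord {
  jobId: string;
  status: JobStatus;
  progress: number; // percentage, clamped to 0..100
  error?: string;
}

const JOB_TTL_SECONDS = 60 * 60; // 1 hour, per the issue
const jobKey = (jobId: string): string => `ocr:job:${jobId}`; // hypothetical key scheme

// Forward-only transitions; completed and failed are terminal.
const TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  pending: ['processing', 'failed'],
  processing: ['completed', 'failed'],
  completed: [],
  failed: [],
};

export async function createJob(redis: RedisLike, jobId: string): Promise<JobRecord> {
  const job: JobRecord = { jobId, status: 'pending', progress: 0 };
  await redis.set(jobKey(jobId), JSON.stringify(job), 'EX', JOB_TTL_SECONDS);
  return job;
}

export async function updateJob(
  redis: RedisLike,
  jobId: string,
  status: JobStatus,
  progress: number,
  error?: string,
): Promise<JobRecord> {
  const raw = await redis.get(jobKey(jobId));
  if (raw === null) throw new Error(`job ${jobId} not found or expired`);
  const current = JSON.parse(raw) as JobRecord;
  // Terminal jobs are immutable; others may stay put (progress ticks) or move forward.
  const terminal = TRANSITIONS[current.status].length === 0;
  if (terminal || (status !== current.status && !TRANSITIONS[current.status].includes(status))) {
    throw new Error(`invalid transition ${current.status} -> ${status}`);
  }
  const next: JobRecord = { ...current, status, progress: Math.min(100, Math.max(0, progress)) };
  if (error !== undefined) next.error = error;
  // SET ... EX resets the TTL on every write.
  await redis.set(jobKey(jobId), JSON.stringify(next), 'EX', JOB_TTL_SECONDS);
  return next;
}
```

Re-issuing `SET ... EX` on each update restarts the one-hour window, so a long-running job does not expire mid-processing. If the hour should instead count from submission, Redis 6.0+ `SET ... KEEPTTL` preserves the existing expiry.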

Response Format

interface OcrResponse {
  success: boolean;
  documentType: 'vin' | 'receipt' | 'manual' | 'unknown';
  rawText: string;
  confidence: number;
  extractedFields: Record<string, {
    value: string;
    confidence: number;
  }>;
  processingTimeMs: number;
}

interface JobResponse {
  jobId: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  progress?: number;
  result?: OcrResponse;
  error?: string;
}
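Since the backend receives `OcrResponse` as JSON from a separate process, it can check the payload at runtime before passing it on. A minimal hand-written type guard, repeating the interface so the example stands alone; `isOcrResponse` is a hypothetical name, and confidence values are only checked to be finite numbers because the issue does not fix their scale.

```typescript
// Mirrors the OcrResponse interface from the issue so the example stands alone.
interface OcrResponse {
  success: boolean;
  documentType: 'vin' | 'receipt' | 'manual' | 'unknown';
  rawText: string;
  confidence: number;
  extractedFields: Record<string, { value: string; confidence: number }>;
  processingTimeMs: number;
}

const DOCUMENT_TYPES: readonly string[] = ['vin', 'receipt', 'manual', 'unknown'];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Only checks for a finite number: the issue does not fix the confidence scale.
function isScore(v: unknown): v is number {
  return typeof v === 'number' && Number.isFinite(v);
}

/** Runtime check for payloads returned by the OCR service. */
export function isOcrResponse(v: unknown): v is OcrResponse {
  if (!isRecord(v)) return false;
  const { success, documentType, rawText, confidence, extractedFields, processingTimeMs } = v;
  return (
    typeof success === 'boolean' &&
    typeof documentType === 'string' &&
    DOCUMENT_TYPES.includes(documentType) &&
    typeof rawText === 'string' &&
    isScore(confidence) &&
    isScore(processingTimeMs) &&
    processingTimeMs >= 0 &&
    isRecord(extractedFields) &&
    Object.values(extractedFields).every(
      (field) => isRecord(field) && typeof field.value === 'string' && isScore(field.confidence),
    )
  );
}
```

A JSON Schema attached to the Fastify route's response would be an alternative to hand-written checks; the guard above just makes the expected shape explicit.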

Directory Structure

OCR Service (Python)

ocr/app/
├── main.py
├── config.py
├── routers/
│   ├── __init__.py
│   ├── extract.py      # Extraction endpoints
│   └── jobs.py         # Job queue endpoints
├── services/
│   ├── __init__.py
│   ├── ocr_service.py  # Core OCR logic
│   ├── preprocessor.py # Image preprocessing
│   └── job_queue.py    # Redis job management
└── models/
    ├── __init__.py
    └── schemas.py      # Pydantic models

Backend (Fastify)

backend/src/features/ocr/
├── README.md
├── index.ts
├── api/
│   ├── ocr.controller.ts
│   ├── ocr.routes.ts
│   └── ocr.validation.ts
├── domain/
│   ├── ocr.service.ts
│   └── ocr.types.ts
└── external/
    └── ocr-client.ts   # HTTP client to OCR service

Acceptance Criteria

POST /api/ocr/extract accepts image upload and returns OCR result
Supports HEIC, JPEG, PNG input formats
HEIC files converted server-side via pillow-heif
Image preprocessing applied (deskew, denoise)
Response includes raw text and confidence score
Async job submission works for large files
Job polling returns status and progress
Processing time < 3 seconds for typical photos
Authentication required (JWT)
Error handling for invalid files and OCR failures
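Verifying the two async criteria (job submission and job polling) requires a caller that polls `GET /api/ocr/jobs/:jobId` until the job settles. A transport-agnostic sketch for such a test or consumer, assuming the HTTP call and the delay are injected; `GetJob`, `Sleep`, and `waitForJob` are hypothetical names, and a real caller would wrap `fetch` around the endpoint.

```typescript
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

// Shape of JobResponse from the issue; `result` is left as unknown here.
interface JobResponse {
  jobId: string;
  status: JobStatus;
  progress?: number;
  result?: unknown;
  error?: string;
}

// Injected so the sketch stays transport-agnostic (hypothetical signatures).
type GetJob = (jobId: string) => Promise<JobResponse>;
type Sleep = (ms: number) => Promise<void>;

interface PollOptions {
  intervalMs?: number;
  maxAttempts?: number;
  onProgress?: (progress: number) => void;
}

/** Polls until the job completes; throws if it fails or never settles. */
export async function waitForJob(
  getJob: GetJob,
  sleep: Sleep,
  jobId: string,
  { intervalMs = 1000, maxAttempts = 60, onProgress }: PollOptions = {},
): Promise<JobResponse> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (job.progress !== undefined) onProgress?.(job.progress);
    if (job.status === 'completed') return job;
    if (job.status === 'failed') throw new Error(job.error ?? `job ${jobId} failed`);
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`job ${jobId} still running after ${maxAttempts} polls`);
}
```

Because job records expire after one hour, `maxAttempts * intervalMs` should stay well under that TTL, and a 404 from the polling endpoint is best treated as an expired job.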

Technical Notes

OCR service communicates via internal Docker network (no external exposure)
Backend acts as an authenticated proxy
File size limits: 10MB for synchronous extraction, 200MB for async jobs
Follow the existing storage patterns from the documents feature
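The two size limits map naturally onto the two upload endpoints. A sketch of a per-endpoint check that returns HTTP 413 (Payload Too Large); `checkUploadSize` is a hypothetical helper, and reading "MB" as MiB (1024 × 1024 bytes) is an assumption the issue does not settle.

```typescript
// Limits from the issue; interpreting "MB" as MiB (1024 * 1024 bytes) is an assumption.
const MIB = 1024 * 1024;

type OcrEndpoint = 'extract' | 'jobs';

const SIZE_LIMITS: Record<OcrEndpoint, number> = {
  extract: 10 * MIB, // synchronous extraction
  jobs: 200 * MIB, // asynchronous jobs
};

type SizeCheck = { ok: true } | { ok: false; statusCode: 413; message: string };

/** Checks an upload against the per-endpoint limit; 413 is "Payload Too Large". */
export function checkUploadSize(endpoint: OcrEndpoint, sizeBytes: number): SizeCheck {
  const limit = SIZE_LIMITS[endpoint];
  if (sizeBytes <= limit) return { ok: true };
  const hint = endpoint === 'extract' ? ' Submit large files as an async job instead.' : '';
  return {
    ok: false,
    statusCode: 413,
    message: `File is ${(sizeBytes / MIB).toFixed(1)} MiB; /${endpoint} accepts at most ${limit / MIB} MiB.${hint}`,
  };
}
```

A declared size can be missing or wrong, so in Fastify the `@fastify/multipart` plugin's `limits.fileSize` option would still be worth setting to enforce the same ceiling while the body is parsed.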

Out of Scope

VIN-specific extraction logic (see #12d)
Receipt-specific extraction logic (see #12f)
Owner's manual table extraction (see #12h)
Frontend integration (see #12c, #12e, #12g)

egullickson added labels · 2026-02-01 18:48:34 +00:00

egullickson referenced this issue in #12 (feat: OCR-powered smart capture for VIN, receipts, and owner's manuals) · 2026-02-01 18:49:00 +00:00

egullickson added status: in-progress