feat: Core OCR API Integration #65

Closed
opened 2026-02-01 18:46:19 +00:00 by egullickson · 0 comments
Owner

Overview

Create the API layer connecting the Fastify backend to the OCR service, including a job queue for async processing.

Parent Issue: #12 (OCR-powered smart capture)
Priority: P0 - Foundation
Dependencies: OCR Service Container Setup

Scope

OCR Service Endpoints (Python/FastAPI)

POST /extract          # Generic OCR extraction
POST /extract/vin      # VIN-specific extraction (future)
POST /extract/receipt  # Receipt-specific extraction (future)
POST /jobs             # Submit async job (for large files)
GET  /jobs/{job_id}    # Poll job status

Backend Proxy Routes (Fastify)

POST /api/ocr/extract         # Proxy to OCR service
POST /api/ocr/jobs            # Submit async job
GET  /api/ocr/jobs/:jobId     # Poll job status
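
The relationship between the two route tables above can be sketched as a small pure function. This is only an illustration, not the actual proxy: `toUpstream`, `UpstreamTarget`, and the job ID character set are assumptions introduced here. The VIN and receipt endpoints are left unmapped because the issue marks them as future work.

```typescript
type HttpMethod = 'GET' | 'POST';

interface UpstreamTarget {
  method: HttpMethod;
  path: string; // path on the internal OCR service
}

// Assumption: job IDs are URL-safe tokens (letters, digits, "_" and "-").
const JOB_ROUTE = /^\/api\/ocr\/jobs\/([A-Za-z0-9_-]+)$/;

// Maps a public backend route to the OCR service route it proxies to.
// Returns null for anything the backend does not expose.
function toUpstream(method: HttpMethod, path: string): UpstreamTarget | null {
  if (method === 'POST' && path === '/api/ocr/extract') {
    return { method, path: '/extract' };
  }
  if (method === 'POST' && path === '/api/ocr/jobs') {
    return { method, path: '/jobs' };
  }
  if (method === 'GET') {
    const match = JOB_ROUTE.exec(path);
    if (match) {
      return { method, path: `/jobs/${match[1]}` };
    }
  }
  return null;
}
```

Keeping the mapping explicit (rather than forwarding any `/api/ocr/*` path) means an endpoint added to the OCR service later is not exposed until a route is added for it.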

File Upload Handling

  • Accept multipart/form-data with image file
  • Validate file type (HEIC, JPEG, PNG, PDF)
  • Stream file to OCR service
  • Return structured JSON response
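
One possible way to validate the file type is to check the leading bytes of the upload instead of trusting the client-supplied MIME type. The sketch below assumes that approach; the issue does not say how validation is done. `sniffUploadKind`, `UploadKind`, and the list of HEIF brands (common ones, not exhaustive) are hypothetical names and values chosen for illustration.

```typescript
type UploadKind = 'jpeg' | 'png' | 'pdf' | 'heic';

// Assumption: a non-exhaustive set of major brands seen in HEIC/HEIF files.
const HEIF_BRANDS = ['heic', 'heix', 'hevc', 'hevx', 'mif1', 'msf1'];

// True when `data` contains `prefix` starting at `offset`.
function hasBytesAt(data: Uint8Array, prefix: number[], offset: number): boolean {
  if (data.length < offset + prefix.length) return false;
  return prefix.every((byte, i) => data[offset + i] === byte);
}

// Returns the detected kind, or null when the bytes match none of the accepted formats.
function sniffUploadKind(data: Uint8Array): UploadKind | null {
  if (hasBytesAt(data, [0xff, 0xd8, 0xff], 0)) return 'jpeg';
  if (hasBytesAt(data, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], 0)) return 'png';
  if (hasBytesAt(data, [0x25, 0x50, 0x44, 0x46, 0x2d], 0)) return 'pdf'; // "%PDF-"

  // ISO base media files: 4-byte box size, then "ftyp", then a 4-byte major brand.
  if (hasBytesAt(data, [0x66, 0x74, 0x79, 0x70], 4) && data.length >= 12) {
    const brand = String.fromCharCode(data[8], data[9], data[10], data[11]);
    if (HEIF_BRANDS.indexOf(brand) !== -1) return 'heic';
  }
  return null;
}
```

Only the first 12 bytes are needed, so the check can run on the first chunk of the stream before the rest of the file is forwarded to the OCR service.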

Async Job Queue (for large files like PDFs)

  • Use existing Redis (mvp-redis) as queue backend
  • Simple job status: pending, processing, completed, failed
  • Job metadata stored in Redis with TTL (1 hour)
  • Polling endpoint returns progress percentage
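
The job lifecycle described above can be illustrated with an in-memory stand-in for Redis. This is a minimal sketch under stated assumptions: the real service is Python and stores metadata in Redis, while `InMemoryJobStore`, the transition table, and the choice to fix the TTL at creation time are all hypothetical and exist only to show the intended behaviour.

```typescript
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface JobRecord {
  jobId: string;
  status: JobStatus;
  progress: number;  // percentage, 0-100
  expiresAt: number; // epoch milliseconds
}

const JOB_TTL_MS = 60 * 60 * 1000; // 1 hour, as stated in the issue

// Assumed lifecycle: pending -> processing -> completed | failed.
const NEXT_STATUSES: Record<JobStatus, JobStatus[]> = {
  pending: ['processing', 'failed'],
  processing: ['completed', 'failed'],
  completed: [],
  failed: [],
};

class InMemoryJobStore {
  private jobs = new Map<string, JobRecord>();
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting an hour.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  create(jobId: string): JobRecord {
    const job: JobRecord = {
      jobId,
      status: 'pending',
      progress: 0,
      expiresAt: this.now() + JOB_TTL_MS,
    };
    this.jobs.set(jobId, job);
    return job;
  }

  // Returns undefined for unknown or expired jobs, like a Redis key past its TTL.
  get(jobId: string): JobRecord | undefined {
    const job = this.jobs.get(jobId);
    if (job && job.expiresAt <= this.now()) {
      this.jobs.delete(jobId);
      return undefined;
    }
    return job;
  }

  // Updates status and progress; repeating the current status is allowed for progress ticks.
  update(jobId: string, status: JobStatus, progress: number): JobRecord {
    const job = this.get(jobId);
    if (!job) throw new Error(`Unknown or expired job: ${jobId}`);
    const allowed = status === job.status || NEXT_STATUSES[job.status].indexOf(status) !== -1;
    if (!allowed) throw new Error(`Invalid transition ${job.status} -> ${status}`);
    job.status = status;
    job.progress = Math.max(0, Math.min(100, progress));
    return job;
  }
}
```

A client polling an expired job sees it as missing, so the polling endpoint would need to distinguish "not found" from "failed" in its response.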

Response Format

interface OcrResponse {
  success: boolean;
  documentType: 'vin' | 'receipt' | 'manual' | 'unknown';
  rawText: string;
  confidence: number;
  extractedFields: Record<string, {
    value: string;
    confidence: number;
  }>;
  processingTimeMs: number;
}

interface JobResponse {
  jobId: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  progress?: number;
  result?: OcrResponse;
  error?: string;
}
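
To show how a caller might consume `JobResponse`, here is a hedged sketch of a polling loop. `waitForJob`, `FetchJob`, `PollOptions`, and the default interval and attempt count are assumptions made for this example; the issue only specifies the response shapes. The fetcher is injected so the loop does not depend on any particular HTTP client.

```typescript
interface OcrResponse {
  success: boolean;
  documentType: 'vin' | 'receipt' | 'manual' | 'unknown';
  rawText: string;
  confidence: number;
  extractedFields: Record<string, { value: string; confidence: number }>;
  processingTimeMs: number;
}

interface JobResponse {
  jobId: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  progress?: number;
  result?: OcrResponse;
  error?: string;
}

// Hypothetical fetcher, e.g. a wrapper around GET /api/ocr/jobs/:jobId.
type FetchJob = (jobId: string) => Promise<JobResponse>;

interface PollOptions {
  intervalMs?: number;   // assumed default: 1000
  maxAttempts?: number;  // assumed default: 60
  sleep?: (ms: number) => Promise<void>;
  onProgress?: (percent: number) => void;
}

// Polls until the job completes or fails, or until maxAttempts polls have been made.
async function waitForJob(
  jobId: string,
  fetchJob: FetchJob,
  options: PollOptions = {},
): Promise<OcrResponse> {
  const intervalMs = options.intervalMs ?? 1000;
  const maxAttempts = options.maxAttempts ?? 60;
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    if (job.status === 'completed') {
      if (!job.result) throw new Error(`Job ${jobId} completed without a result`);
      return job.result;
    }
    if (job.status === 'failed') {
      throw new Error(job.error ?? `Job ${jobId} failed`);
    }
    if (job.progress !== undefined && options.onProgress) {
      options.onProgress(job.progress);
    }
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

Because `result` and `error` are both optional, the loop checks `status` first and treats a `completed` job without a `result` as an error instead of returning `undefined`.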

Directory Structure

OCR Service (Python)

ocr/app/
├── main.py
├── config.py
├── routers/
│   ├── __init__.py
│   ├── extract.py      # Extraction endpoints
│   └── jobs.py         # Job queue endpoints
├── services/
│   ├── __init__.py
│   ├── ocr_service.py  # Core OCR logic
│   ├── preprocessor.py # Image preprocessing
│   └── job_queue.py    # Redis job management
└── models/
    ├── __init__.py
    └── schemas.py      # Pydantic models

Backend (Fastify)

backend/src/features/ocr/
├── README.md
├── index.ts
├── api/
│   ├── ocr.controller.ts
│   ├── ocr.routes.ts
│   └── ocr.validation.ts
├── domain/
│   ├── ocr.service.ts
│   └── ocr.types.ts
└── external/
    └── ocr-client.ts   # HTTP client to OCR service

Acceptance Criteria

  • POST /api/ocr/extract accepts image upload and returns OCR result
  • Supports HEIC, JPEG, PNG input formats
  • HEIC files converted server-side via pillow-heif
  • Image preprocessing applied (deskew, denoise)
  • Response includes raw text and confidence score
  • Async job submission works for large files
  • Job polling returns status and progress
  • Processing time < 3 seconds for typical photos
  • Authentication required (JWT)
  • Error handling for invalid files, OCR failures

Technical Notes

  • OCR service is reachable only over the internal Docker network (no external exposure)
  • Backend acts as authenticated proxy
  • File size limit: 10MB for sync, 200MB for async
  • Reference existing storage patterns from documents feature
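
The size limits above suggest a simple routing decision, sketched below. The issue does not define how the limits are enforced, so `decideUpload`, the use of binary megabytes, and the 400/413 status codes are assumptions for illustration.

```typescript
const MB = 1024 * 1024; // assumption: the limits are binary megabytes
const SYNC_LIMIT_BYTES = 10 * MB;
const ASYNC_LIMIT_BYTES = 200 * MB;

type UploadDecision =
  | { kind: 'sync'; endpoint: '/api/ocr/extract' }
  | { kind: 'async'; endpoint: '/api/ocr/jobs' }
  | { kind: 'reject'; statusCode: 400 | 413; reason: string };

// Picks the endpoint for a file of the given size, or rejects it.
function decideUpload(sizeBytes: number): UploadDecision {
  // Also catches NaN, because any comparison with NaN is false.
  if (!(sizeBytes > 0)) {
    return { kind: 'reject', statusCode: 400, reason: 'File is empty or its size is unknown' };
  }
  if (sizeBytes <= SYNC_LIMIT_BYTES) {
    return { kind: 'sync', endpoint: '/api/ocr/extract' };
  }
  if (sizeBytes <= ASYNC_LIMIT_BYTES) {
    return { kind: 'async', endpoint: '/api/ocr/jobs' };
  }
  return { kind: 'reject', statusCode: 413, reason: 'File exceeds the 200MB async limit' };
}
```

A file at exactly 10MB is treated as sync here; whether the boundary is inclusive is a detail the issue leaves open.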

Out of Scope

  • VIN-specific extraction logic (see #12d)
  • Receipt-specific extraction logic (see #12f)
  • Owner's manual table extraction (see #12h)
  • Frontend integration (see #12c, #12e, #12g)
egullickson added the status/backlog and type/feature labels 2026-02-01 18:48:34 +00:00
egullickson added status/in-progress and removed status/backlog labels 2026-02-01 21:54:09 +00:00
egullickson added status/review and removed status/in-progress labels 2026-02-01 22:02:37 +00:00
Reference: egullickson/motovaultpro#65