feat: Core OCR API Integration (#65) #74
Summary
Changes
OCR Service (Python/FastAPI)
- ocr/app/models/schemas.py - Pydantic models for request/response validation
- ocr/app/services/preprocessor.py - Image preprocessing (deskew, denoise)
- ocr/app/services/ocr_service.py - Core OCR logic with Tesseract and HEIC support
- ocr/app/services/job_queue.py - Redis-based async job management
- ocr/app/routers/extract.py - Sync extraction endpoint
- ocr/app/routers/jobs.py - Async job submission and polling
Backend (Fastify)
- backend/src/features/ocr/ - Complete feature capsule following project patterns
Infrastructure
Test Plan
Fixes #65
OCR Service (Python/FastAPI):
- POST /extract for synchronous OCR extraction
- POST /jobs and GET /jobs/{job_id} for async processing
- Image preprocessing (deskew, denoise) for accuracy
- HEIC conversion via pillow-heif
- Redis job queue for async processing

Backend (Fastify):
- POST /api/ocr/extract - authenticated proxy to OCR
- POST /api/ocr/jobs - async job submission
- GET /api/ocr/jobs/:jobId - job polling
- Multipart file upload handling
- JWT authentication required

File size limits: 10MB sync, 200MB async
Processing time target: <3 seconds for typical photos

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
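
For reviewers, the file size limits stated above (10MB sync, 200MB async) can be read as a simple routing rule. The following is a minimal sketch of that rule, not the code in this PR: `choose_route`, `Route`, and the constant names are hypothetical, "MB" is assumed to mean 2**20 bytes, and the limits are assumed to be inclusive. None of those details is stated in the PR.

```python
from enum import Enum

# Assumption: "MB" is read as 2**20 bytes; the PR does not say which convention it uses.
MB = 1024 * 1024
SYNC_LIMIT_BYTES = 10 * MB    # "10MB sync" from the commit message
ASYNC_LIMIT_BYTES = 200 * MB  # "200MB async" from the commit message


class Route(Enum):
    """The two OCR service endpoints named in the commit message."""
    SYNC = "POST /extract"
    ASYNC = "POST /jobs"


def choose_route(size_bytes: int) -> Route:
    """Return the endpoint a file of this size could be sent to (hypothetical helper)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= SYNC_LIMIT_BYTES:  # assumption: limits are inclusive
        return Route.SYNC
    if size_bytes <= ASYNC_LIMIT_BYTES:
        return Route.ASYNC
    raise ValueError("file exceeds the 200MB async limit")
```

Under these assumptions a 3 MB photo would go to the synchronous endpoint, a 50 MB scan would go to the job queue, and anything above 200 MB would be rejected before upload.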