OCR Service (Python/FastAPI):
- POST /extract for synchronous OCR extraction
- POST /jobs and GET /jobs/{job_id} for async processing
- Image preprocessing (deskew, denoise) to improve OCR accuracy
- HEIC conversion via pillow-heif
- Redis job queue for async processing
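
The async flow above (submit a job, then poll it by ID) can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `InMemoryJobStore` is a hypothetical in-memory stand-in for the Redis queue, and the status names (`pending`, `processing`, `completed`, `failed`) are assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Job:
    job_id: str
    status: str = "pending"  # assumed lifecycle: pending -> processing -> completed | failed
    result: Optional[str] = None
    error: Optional[str] = None


class InMemoryJobStore:
    """Hypothetical stand-in for the Redis-backed job queue."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def submit(self) -> str:
        # Roughly what POST /jobs would do: register a job and return its ID.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job(job_id=job_id)
        return job_id

    def get(self, job_id: str) -> Optional[Job]:
        # Roughly what GET /jobs/{job_id} would do: return the job, or None if unknown.
        return self._jobs.get(job_id)

    def run(self, job_id: str, extract_fn: Callable[[bytes], str], image: bytes) -> Job:
        # What a worker would do: run OCR and record the outcome on the job.
        job = self._jobs[job_id]
        job.status = "processing"
        try:
            job.result = extract_fn(image)
            job.status = "completed"
        except Exception as exc:
            job.error = str(exc)
            job.status = "failed"
        return job
```

A client would call `submit()`, keep the returned ID, and poll `get()` until the status is no longer `pending` or `processing`.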
Backend (Fastify):
- POST /api/ocr/extract - authenticated proxy to the OCR service
- POST /api/ocr/jobs - async job submission
- GET /api/ocr/jobs/:jobId - job polling
- Multipart file upload handling
- JWT authentication required
File size limits: 10 MB for sync requests, 200 MB for async jobs
Processing time target: <3 seconds for typical photos
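
The size limits above could be enforced with a check along these lines. This is a sketch under assumptions: the function and constant names are hypothetical, and 1 MB is taken to mean 1024 * 1024 bytes, which the commit does not specify.

```python
# Assumed interpretation: 1 MB = 1024 * 1024 bytes.
SYNC_LIMIT_BYTES = 10 * 1024 * 1024
ASYNC_LIMIT_BYTES = 200 * 1024 * 1024


def is_within_limit(size_bytes: int, mode: str) -> bool:
    """Return True if an upload of size_bytes is allowed for the given mode.

    mode is "sync" (the extract endpoint) or "async" (the jobs endpoint).
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if mode == "sync":
        return size_bytes <= SYNC_LIMIT_BYTES
    if mode == "async":
        return size_bytes <= ASYNC_LIMIT_BYTES
    raise ValueError(f"unknown mode: {mode!r}")
```

Checking the size before forwarding the upload lets the proxy reject oversized files without sending them to the OCR service.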
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Python-based OCR service container (mvp-ocr) as the 6th service:
- Python 3.11-slim with FastAPI/uvicorn
- Tesseract OCR with English language pack
- pillow-heif for HEIC image support
- opencv-python-headless for image preprocessing
- Health endpoint at /health
- Unit tests for health, HEIC support, and Tesseract availability
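
A health check that reports Tesseract availability might look roughly like this. It is a hedged sketch, not the container's actual handler: the function name and the response shape (`status`, `tesseract_available`) are assumptions, and the lookup is injectable so a unit test can run without Tesseract installed.

```python
import shutil
from typing import Callable, Dict, Optional


def health_status(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, object]:
    """Build a health payload based on whether the tesseract binary is on PATH."""
    # shutil.which returns the executable's path, or None if it is not found.
    available = which("tesseract") is not None
    return {
        "status": "healthy" if available else "degraded",  # assumed status values
        "tesseract_available": available,
    }
```

In a test, passing `which=lambda name: None` simulates a container where Tesseract is missing.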
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>