feat: add core OCR API integration (refs #65)

OCR Service (Python/FastAPI):
- POST /extract for synchronous OCR extraction
- POST /jobs and GET /jobs/{job_id} for async processing
- Image preprocessing (deskew, denoise) for accuracy
- HEIC conversion via pillow-heif
- Redis job queue for async processing

Backend (Fastify):
- POST /api/ocr/extract - authenticated proxy to OCR
- POST /api/ocr/jobs - async job submission
- GET /api/ocr/jobs/:jobId - job polling
- Multipart file upload handling
- JWT authentication required

File size limits: 10MB sync, 200MB async
Processing time target: <3 seconds for typical photos

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
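
Illustrative note on the size limits above: a minimal sketch of how a client might pick between the sync and async backend routes by upload size. The function name `choose_ocr_route` is hypothetical and not part of this commit. Treating "MB" as 1024 * 1024 bytes and treating the limits as inclusive are both assumptions, as is the idea that small files should default to the sync route.

```python
# Limits come from the commit message; binary megabytes are an assumption.
MB = 1024 * 1024
SYNC_LIMIT_BYTES = 10 * MB
ASYNC_LIMIT_BYTES = 200 * MB


def choose_ocr_route(size_bytes: int) -> str:
    """Return the backend route a client might call for a file of this size.

    Hypothetical helper: the routes are those listed in the commit message,
    but the selection policy here is an assumption, not the commit's logic.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= SYNC_LIMIT_BYTES:
        # Small enough for the synchronous proxy endpoint.
        return "/api/ocr/extract"
    if size_bytes <= ASYNC_LIMIT_BYTES:
        # Too large for sync; submit as a job and poll /api/ocr/jobs/:jobId.
        return "/api/ocr/jobs"
    raise ValueError("file exceeds the 200MB async limit")
```

A client would call `choose_ocr_route(len(file_bytes))` before uploading. The server is still expected to enforce its own limits.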
2026-02-01 16:02:11 -06:00
parent 94e49306dc
commit 852c9013b5
25 changed files with 1931 additions and 3 deletions
--- a/ocr/requirements.txt
+++ b/ocr/requirements.txt
@@ -2,6 +2,7 @@
 fastapi>=0.100.0
 uvicorn[standard]>=0.23.0
 python-multipart>=0.0.6
+pydantic>=2.0.0

 # File Detection & Handling
 python-magic>=0.4.27
@@ -15,6 +16,12 @@ numpy>=1.24.0
 # OCR Engines
 pytesseract>=0.3.10

+# Redis for job queue
+redis>=5.0.0
+
+# HTTP client for callbacks
+httpx>=0.24.0
+
 # Testing
 pytest>=7.4.0
-httpx>=0.24.0
+pytest-asyncio>=0.21.0