feat: add OCR engine abstraction layer (refs #116)

Introduce a pluggable OcrEngine ABC, with PaddleOCR PP-OCRv4 as the
primary engine and a Tesseract wrapper for backward compatibility. The
engine factory reads the OCR_PRIMARY_ENGINE config to instantiate the
matching engine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eric Gullickson
2026-02-07 10:47:40 -06:00
parent 6b0c18a41c
commit ebc633fb36
7 changed files with 422 additions and 0 deletions

@@ -15,6 +15,8 @@ numpy>=1.24.0
# OCR Engines
pytesseract>=0.3.10
paddlepaddle>=2.6.0
paddleocr>=2.8.0
# PDF Processing
PyMuPDF>=1.23.0
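
The abstraction described in the commit message could be sketched roughly as below. This is an illustration only, not the code from this commit: the `recognize` method, the `PaddleOcrEngine` and `TesseractEngine` class names, the `create_engine` function, and the `"paddleocr"` / `"tesseract"` config values are all hypothetical, and the sketch assumes OCR_PRIMARY_ENGINE is read from an environment variable. The engine bodies are stubs so the example runs without either OCR library installed.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class OcrEngine(ABC):
    """Common interface that every OCR backend implements (hypothetical shape)."""

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> str:
        """Return the text found in an encoded image."""


class PaddleOcrEngine(OcrEngine):
    """Stub standing in for a PaddleOCR-backed engine."""

    def recognize(self, image_bytes: bytes) -> str:
        # A real wrapper would run PaddleOCR here; omitted in this sketch.
        raise NotImplementedError("PaddleOCR call omitted in this sketch")


class TesseractEngine(OcrEngine):
    """Stub standing in for the Tesseract compatibility wrapper."""

    def recognize(self, image_bytes: bytes) -> str:
        # A real wrapper would run pytesseract here; omitted in this sketch.
        raise NotImplementedError("Tesseract call omitted in this sketch")


# Assumed mapping from config value to engine class.
_ENGINES = {
    "paddleocr": PaddleOcrEngine,
    "tesseract": TesseractEngine,
}


def create_engine(name: Optional[str] = None) -> OcrEngine:
    """Instantiate the engine named by `name` or by OCR_PRIMARY_ENGINE."""
    # Assumption: the config is an environment variable defaulting to PaddleOCR.
    key = (name or os.environ.get("OCR_PRIMARY_ENGINE", "paddleocr")).lower()
    try:
        return _ENGINES[key]()
    except KeyError:
        raise ValueError(f"Unknown OCR engine: {key!r}") from None
```

Keeping the lookup in a single dictionary means a further backend only needs a new `OcrEngine` subclass and one extra entry, while callers depend on the abstract interface alone.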