feat: OCR engine abstraction layer and PaddleOCR integration (#115) #116
Relates to #115
Create an OCR engine abstraction layer in `ocr/app/engines/` to decouple extractors from Tesseract. Implement PaddleOCR as the primary engine using PP-OCRv4 models.

Changes
- `ocr/app/engines/base_engine.py` - `OcrEngine` ABC with a `recognize(image_bytes, config)` method
- `ocr/app/engines/paddle_engine.py` - PaddleOCR wrapper using PP-OCRv4 models, angle classification, CPU only
- `ocr/app/engines/engine_factory.py` - factory that instantiates the configured engine
- `ocr/app/config.py` - add `OCR_PRIMARY_ENGINE` and `OCR_CONFIDENCE_THRESHOLD` env vars
- `ocr/requirements.txt` - add `paddlepaddle` and `paddleocr` dependencies

Acceptance Criteria
- `recognize()` returns a structured `OcrEngineResult` (text, confidence, word boxes)