feat: add OCR engine abstraction layer (refs #116)

Introduce a pluggable OcrEngine ABC, with PaddleOCR PP-OCRv4 as the
primary engine and a Tesseract wrapper for backward compatibility. The
engine factory reads the OCR_PRIMARY_ENGINE config setting to
instantiate the matching engine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eric Gullickson
2026-02-07 10:47:40 -06:00
parent 6b0c18a41c
commit ebc633fb36
7 changed files with 422 additions and 0 deletions
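
The commit message says the factory chooses an engine from the OCR_PRIMARY_ENGINE setting. The sketch below shows one way such a factory could dispatch. It is not the code in engine_factory, which this hunk does not show. It assumes the following: the setting is read from an environment variable, its accepted values are "paddleocr" and "tesseract", "paddleocr" is the default, and the ABC exposes a recognize() method. The registry, DEFAULT_ENGINE, PaddleOcrEngine, TesseractEngine, and recognize() are hypothetical names.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod


class EngineError(Exception):
    """Base error for OCR engines (name taken from the commit)."""


class EngineUnavailableError(EngineError):
    """Raised when a requested engine cannot be provided (assumed semantics)."""


class OcrEngine(ABC):
    """Minimal stand-in for the OcrEngine ABC; the real interface is not shown."""

    name = "base"

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> str:
        """Return recognized text for an encoded image (hypothetical)."""


class PaddleOcrEngine(OcrEngine):
    name = "paddleocr"

    def recognize(self, image_bytes: bytes) -> str:
        # A real implementation would run PaddleOCR PP-OCRv4 here.
        return ""


class TesseractEngine(OcrEngine):
    name = "tesseract"

    def recognize(self, image_bytes: bytes) -> str:
        # A real implementation would call Tesseract here.
        return ""


# Hypothetical registry: config value -> engine class.
_ENGINES: dict[str, type[OcrEngine]] = {
    "paddleocr": PaddleOcrEngine,
    "tesseract": TesseractEngine,
}
DEFAULT_ENGINE = "paddleocr"  # assumed default


def create_engine(engine_name: str | None = None) -> OcrEngine:
    """Build the engine named explicitly or by OCR_PRIMARY_ENGINE."""
    # An explicit argument wins; otherwise read the (assumed) env var.
    raw = engine_name or os.environ.get("OCR_PRIMARY_ENGINE", DEFAULT_ENGINE)
    key = raw.strip().lower()
    try:
        engine_cls = _ENGINES[key]
    except KeyError:
        # Fail fast instead of silently picking a different engine.
        raise EngineUnavailableError(f"unknown OCR engine: {raw!r}") from None
    return engine_cls()
```

With a dict registry, adding an engine takes a one-line registration. In this sketch, an unrecognized value raises EngineUnavailableError instead of falling back quietly. That behavior is an assumption, not something the commit states.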


@@ -0,0 +1,27 @@
"""OCR engine abstraction layer.

Provides a pluggable engine interface for OCR processing,
decoupling extractors from specific OCR libraries.
"""

from app.engines.base_engine import (
    EngineError,
    EngineProcessingError,
    EngineUnavailableError,
    OcrConfig,
    OcrEngine,
    OcrEngineResult,
    WordBox,
)
from app.engines.engine_factory import create_engine

__all__ = [
    "OcrEngine",
    "OcrConfig",
    "OcrEngineResult",
    "WordBox",
    "EngineError",
    "EngineUnavailableError",
    "EngineProcessingError",
    "create_engine",
]
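
The package docstring says the goal is to decouple extractors from specific OCR libraries. The following minimal sketch shows what that buys. The extractor is written only against the OcrEngine interface and the EngineError base class, so a test double can stand in for PaddleOCR or Tesseract. Only the class and exception names come from this commit. The WordBox and OcrEngineResult fields, the recognize() signature, FakeEngine, and ConfidentTextExtractor are hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class EngineError(Exception):
    """Base class for engine failures (name from the commit)."""


class EngineProcessingError(EngineError):
    """Raised when an engine fails on a given input (assumed semantics)."""


@dataclass
class WordBox:
    text: str
    confidence: float  # assumed 0.0-1.0 scale
    bbox: tuple[int, int, int, int]  # assumed (x0, y0, x1, y1)


@dataclass
class OcrEngineResult:
    words: list[WordBox] = field(default_factory=list)


class OcrEngine(ABC):
    @abstractmethod
    def recognize(self, image_bytes: bytes) -> OcrEngineResult:
        """Run OCR on an encoded image (hypothetical signature)."""


class FakeEngine(OcrEngine):
    """Test double: canned words, or a processing error on empty input."""

    def recognize(self, image_bytes: bytes) -> OcrEngineResult:
        if not image_bytes:
            raise EngineProcessingError("empty image")
        return OcrEngineResult(words=[
            WordBox("Total", 0.98, (10, 10, 60, 24)),
            WordBox("42.00", 0.91, (70, 10, 120, 24)),
            WordBox("x7", 0.20, (130, 10, 145, 24)),  # low-confidence noise
        ])


class ConfidentTextExtractor:
    """Hypothetical extractor written only against OcrEngine."""

    def __init__(self, engine: OcrEngine, min_confidence: float = 0.5) -> None:
        self.engine = engine
        self.min_confidence = min_confidence

    def extract(self, image_bytes: bytes) -> str | None:
        try:
            result = self.engine.recognize(image_bytes)
        except EngineError:
            # Same handling for every backend's failures.
            return None
        # Keep only words at or above the confidence threshold.
        kept = [w.text for w in result.words if w.confidence >= self.min_confidence]
        return " ".join(kept)
```

Replacing PaddleOCR with Tesseract then means passing a different OcrEngine instance; the extractor code does not change. Catching the EngineError base class covers the other engine exceptions only if they subclass it. The names suggest they do, but this diff does not confirm it.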