feat: add optional Google Vision cloud fallback engine (refs #118)

CloudEngine wraps Google Vision TEXT_DETECTION with lazy initialization.
HybridEngine runs the primary engine and falls back to the cloud engine
when the primary result's confidence is below the threshold. The fallback
is disabled by default (OCR_FALLBACK_ENGINE=none).
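
The behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this commit: `OcrResult`, `Engine`, `LazyEngine`, `HybridSketch`, `fallback_enabled`, the `recognize` method name, and the 0.0-1.0 confidence scale are all hypothetical. Only the `OCR_FALLBACK_ENGINE` variable and its `none` default come from the message.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass
class OcrResult:
    text: str
    confidence: float  # assumption: confidence is on a 0.0-1.0 scale


class Engine(Protocol):
    # Hypothetical interface; the real base engine may differ.
    def recognize(self, image: bytes) -> OcrResult: ...


class LazyEngine:
    """Builds the wrapped engine on first use (lazy init)."""

    def __init__(self, factory: Callable[[], Engine]) -> None:
        self._factory = factory
        self._engine: Optional[Engine] = None

    def recognize(self, image: bytes) -> OcrResult:
        # Construct the (possibly expensive) engine only when first needed.
        if self._engine is None:
            self._engine = self._factory()
        return self._engine.recognize(image)


class HybridSketch:
    """Runs the primary engine, then the fallback if confidence is low."""

    def __init__(
        self, primary: Engine, fallback: Optional[Engine], threshold: float
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.threshold = threshold

    def recognize(self, image: bytes) -> OcrResult:
        result = self.primary.recognize(image)
        # Keep the primary result when no fallback is configured or when
        # the primary confidence already meets the threshold.
        if self.fallback is None or result.confidence >= self.threshold:
            return result
        return self.fallback.recognize(image)


def fallback_enabled() -> bool:
    # An unset variable is treated as "none", i.e. fallback disabled.
    return os.environ.get("OCR_FALLBACK_ENGINE", "none").lower() != "none"
```

Because the fallback is wrapped lazily, a deployment that never drops below the threshold never constructs a cloud client at all, which fits the disabled-by-default intent.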

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eric Gullickson
2026-02-07 11:12:08 -06:00
parent 013fb0c67a
commit 4ef942cb9d
6 changed files with 351 additions and 18 deletions

@@ -2,6 +2,12 @@
Provides a pluggable engine interface for OCR processing,
decoupling extractors from specific OCR libraries.
Engines:
- PaddleOcrEngine: PaddleOCR PP-OCRv4 (primary, CPU-only)
- TesseractEngine: pytesseract wrapper (backward compatibility)
- CloudEngine: Google Vision TEXT_DETECTION (optional cloud fallback)
- HybridEngine: Primary + fallback with confidence threshold
"""
from app.engines.base_engine import (