motovaultpro/ocr/app/CLAUDE.md
Eric Gullickson 96e1dde7b2
docs: update CLAUDE.md references from Vertex AI to google-genai (refs #231)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:21:58 -06:00

ocr/app/

Python OCR microservice (FastAPI). The primary engine is PaddleOCR PP-OCRv4, with an optional Google Vision cloud fallback. Gemini 2.5 Flash handles maintenance manual PDF extraction as a standalone module; it is not an OcrEngine subclass.
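
The primary-plus-fallback arrangement can be pictured with a minimal sketch. Everything named here (`OcrResult`, `Engine`, `StubEngine`, `HybridEngine`, the `recognize` method, and the 0.6 confidence threshold) is a hypothetical stand-in for illustration; the real classes in engines/ may look quite different.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class OcrResult:
    text: str
    confidence: float  # 0.0 (no trust) to 1.0 (full trust)
    engine: str


class Engine(Protocol):
    """Hypothetical interface; the real OcrEngine base class may differ."""

    def recognize(self, image: bytes) -> OcrResult: ...


class StubEngine:
    """Stand-in for a real engine so the sketch runs without PaddleOCR."""

    def __init__(self, name: str, text: str, confidence: float) -> None:
        self.name, self.text, self.confidence = name, text, confidence
        self.calls = 0

    def recognize(self, image: bytes) -> OcrResult:
        self.calls += 1
        return OcrResult(self.text, self.confidence, self.name)


class HybridEngine:
    """Run the primary engine; consult the fallback only on weak results."""

    def __init__(self, primary: Engine, fallback: Optional[Engine] = None,
                 threshold: float = 0.6) -> None:
        self.primary = primary
        self.fallback = fallback
        self.threshold = threshold  # assumed cut-off, not a project value

    def recognize(self, image: bytes) -> OcrResult:
        result = self.primary.recognize(image)
        if self.fallback is None or result.confidence >= self.threshold:
            return result
        cloud = self.fallback.recognize(image)
        # Keep whichever engine reported the higher confidence.
        return cloud if cloud.confidence > result.confidence else result
```

With this shape, the cloud engine is only called (and billed) when the local result falls below the threshold, which is one plausible reason to keep the fallback optional.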

Files

| File | What | When to read |
| --- | --- | --- |
| `main.py` | FastAPI application entry point | Route registration, app setup |
| `config.py` | Configuration settings (OCR engines, Google GenAI, Redis, Vision API limits) | Environment variables, settings |
| `__init__.py` | Package init | Package structure |
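
As a rough illustration of the kind of settings `config.py` is described as holding, the sketch below reads a few values from the environment. The variable names (`OCR_PRIMARY_ENGINE`, `OCR_FALLBACK_ENGINE`, `REDIS_URL`, `VISION_MONTHLY_LIMIT`), the defaults, and the `Settings` and `load_settings` names are all assumptions made for this example, not the project's actual configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    primary_engine: str
    fallback_engine: Optional[str]  # None means no cloud fallback
    redis_url: str
    vision_monthly_limit: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from an environment-like mapping (names are hypothetical)."""
    # An empty or whitespace-only value is treated as "fallback disabled".
    fallback = env.get("OCR_FALLBACK_ENGINE", "").strip() or None
    return Settings(
        primary_engine=env.get("OCR_PRIMARY_ENGINE", "paddleocr"),
        fallback_engine=fallback,
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        vision_monthly_limit=int(env.get("VISION_MONTHLY_LIMIT", "1000")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests, for example `load_settings({"OCR_FALLBACK_ENGINE": "google_vision"})`.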

Subdirectories

| Directory | What | When to read |
| --- | --- | --- |
| `engines/` | OCR engine abstraction (PaddleOCR, Google Vision, Hybrid) and Gemini module | Engine changes, adding new engines |
| `extractors/` | Domain-specific data extraction (receipts, fuel receipts, maintenance manuals) | Adding new extraction types, modifying extraction logic |
| `models/` | Data models and schemas | Request/response types |
| `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization |
| `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR |
| `routers/` | FastAPI route handlers (`/extract`, `/extract/receipt`, `/extract/manual`, `/decode`, `/jobs`) | API endpoint changes |
| `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management |
| `table_extraction/` | Table detection and parsing | Structured data extraction from images |
| `validators/` | Input validation | Validation rules |
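
For a sense of what regex-based service name mapping in patterns/ could look like, here is a minimal sketch. The three patterns, the subtype labels, and the names `SERVICE_PATTERNS` and `map_service_name` are invented for illustration; the real module covers 27 maintenance subtypes and may be organized differently.

```python
import re
from typing import Optional

# Ordered (pattern, subtype) pairs; the first match wins.
# Both the patterns and the subtype labels are hypothetical examples.
SERVICE_PATTERNS = [
    (re.compile(r"\boil\s+(?:and\s+filter\s+)?change\b", re.IGNORECASE), "Engine Oil"),
    (re.compile(r"\btire\s+rotation\b", re.IGNORECASE), "Tire Rotation"),
    (re.compile(r"\bbrake\s+(?:pads?|fluid)\b", re.IGNORECASE), "Brakes"),
]


def map_service_name(text: str) -> Optional[str]:
    """Return the first subtype whose pattern appears in the text, else None."""
    for pattern, subtype in SERVICE_PATTERNS:
        if pattern.search(text):
            return subtype
    return None
```

Keeping the pairs in an ordered list lets more specific patterns sit ahead of broader ones, which matters once the number of subtypes grows.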