motovaultpro/ocr/app/CLAUDE.md
Eric Gullickson f590421058 chore: remove NHTSA code and update documentation (refs #227)
Delete vehicles/external/nhtsa/ directory (3 files), remove VPICVariable
and VPICResponse from platform models. Update all documentation to
reflect Gemini VIN decode via OCR service architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:51:38 -06:00

ocr/app/

Python OCR microservice (FastAPI). Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction (standalone module, not an OcrEngine subclass).
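
The primary-plus-fallback arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the code in engines/: only the `OcrEngine` name comes from this document, while `OcrResult`, `HybridEngine`, `StubEngine`, the `recognize` method, and the `min_confidence` threshold are all hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrResult:
    """Hypothetical result type: recognized text plus a 0.0-1.0 confidence."""
    text: str
    confidence: float
    engine: str


class OcrEngine(ABC):
    """Hypothetical base class; the real interface lives in engines/."""

    name = "base"

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> OcrResult:
        ...


class StubEngine(OcrEngine):
    """Stand-in for a real engine (PaddleOCR, Google Vision) in this sketch."""

    def __init__(self, name: str, text: str, confidence: float):
        self.name = name
        self._text = text
        self._confidence = confidence

    def recognize(self, image_bytes: bytes) -> OcrResult:
        return OcrResult(self._text, self._confidence, self.name)


class HybridEngine(OcrEngine):
    """Try the primary engine first; consult the optional fallback only
    when the primary fails or is below the (assumed) confidence threshold."""

    name = "hybrid"

    def __init__(self, primary: OcrEngine,
                 fallback: Optional[OcrEngine] = None,
                 min_confidence: float = 0.6):
        self.primary = primary
        self.fallback = fallback
        self.min_confidence = min_confidence

    def recognize(self, image_bytes: bytes) -> OcrResult:
        try:
            result = self.primary.recognize(image_bytes)
        except Exception:
            # Treat any primary failure as "no result" and try the fallback.
            result = None

        if result is not None and result.confidence >= self.min_confidence:
            return result
        if self.fallback is None:
            if result is None:
                raise RuntimeError("primary engine failed and no fallback is configured")
            return result

        fallback_result = self.fallback.recognize(image_bytes)
        # Keep whichever engine reported the higher confidence.
        if result is None or fallback_result.confidence > result.confidence:
            return fallback_result
        return result
```

In this sketch the cloud fallback is only called when the local result is missing or weak, which keeps it optional and limits how often the paid API is hit.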

Files

| File | What | When to read |
| --- | --- | --- |
| `main.py` | FastAPI application entry point | Route registration, app setup |
| `config.py` | Configuration settings (OCR engines, Vertex AI, Redis, Vision API limits) | Environment variables, settings |
| `__init__.py` | Package init | Package structure |

Subdirectories

| Directory | What | When to read |
| --- | --- | --- |
| `engines/` | OCR engine abstraction (PaddleOCR, Google Vision, Hybrid) and Gemini module | Engine changes, adding new engines |
| `extractors/` | Domain-specific data extraction (receipts, fuel receipts, maintenance manuals) | Adding new extraction types, modifying extraction logic |
| `models/` | Data models and schemas | Request/response types |
| `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization |
| `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR |
| `routers/` | FastAPI route handlers (`/extract`, `/extract/receipt`, `/extract/manual`, `/decode`, `/jobs`) | API endpoint changes |
| `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management |
| `table_extraction/` | Table detection and parsing | Structured data extraction from images |
| `validators/` | Input validation | Validation rules |
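
As a rough illustration of what the regex-based service name mapping in patterns/ might look like, the sketch below maps free-text service descriptions to a subtype key. The patterns, the subtype names, and the `map_service_name` function are hypothetical examples only; the actual module defines its own rules for all 27 maintenance subtypes.

```python
import re
from typing import Optional

# Hypothetical (pattern, subtype) pairs; the first match wins.
SERVICE_PATTERNS = [
    (re.compile(r"\b(engine\s+)?oil\s+(and\s+filter\s+)?change\b", re.I), "oil_change"),
    (re.compile(r"\btire\s+rotation\b", re.I), "tire_rotation"),
    (re.compile(r"\bbrake\s+pads?\b", re.I), "brake_pads"),
]


def map_service_name(text: str) -> Optional[str]:
    """Return the first matching subtype for an OCR'd service line, or None."""
    for pattern, subtype in SERVICE_PATTERNS:
        if pattern.search(text):
            return subtype
    return None
```

For example, `map_service_name("Engine Oil and Filter Change")` yields `"oil_change"` under these example patterns, while an unrecognized line such as `"Wiper blades"` yields `None`.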