Add/update documentation across backend, Python OCR service, and frontend for receipt scanning, manual extraction, and Gemini integration. Create new CLAUDE.md files for engines/, fuel-logs/, documents/, and maintenance/ features. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ocr/app/
Python OCR microservice (FastAPI). Primary engine: PaddleOCR PP-OCRv4, with an optional Google Vision cloud fallback. Gemini 2.5 Flash handles maintenance manual PDF extraction as a standalone module, not an OcrEngine subclass.
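The primary-plus-fallback arrangement described above could be sketched roughly as follows. This is a minimal illustration, not the code in engines/: only the `OcrEngine` name comes from this document. `OcrResult`, `recognize`, `HybridEngine`, `StubEngine`, and the confidence threshold are hypothetical names chosen for the example, and the rule for when the fallback runs is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrResult:
    """Hypothetical result type: recognized text plus a confidence score."""
    text: str
    confidence: float  # 0.0 to 1.0
    engine: str


class OcrEngine(ABC):
    """Hypothetical base class; the real interface lives in engines/."""

    name: str

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> OcrResult:
        ...


class StubEngine(OcrEngine):
    """Stand-in for PaddleOCR or Google Vision so the sketch runs offline."""

    def __init__(self, name: str, text: str, confidence: float) -> None:
        self.name = name
        self.text = text
        self.confidence = confidence

    def recognize(self, image_bytes: bytes) -> OcrResult:
        return OcrResult(self.text, self.confidence, self.name)


class HybridEngine(OcrEngine):
    """Runs the primary engine and consults the fallback only on low confidence.

    The threshold-based trigger is an assumption made for this example.
    """

    name = "hybrid"

    def __init__(
        self,
        primary: OcrEngine,
        fallback: Optional[OcrEngine] = None,
        threshold: float = 0.6,
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.threshold = threshold

    def recognize(self, image_bytes: bytes) -> OcrResult:
        result = self.primary.recognize(image_bytes)
        # No fallback configured, or the primary result is good enough.
        if self.fallback is None or result.confidence >= self.threshold:
            return result
        # Otherwise ask the fallback and keep whichever result scores higher.
        fallback_result = self.fallback.recognize(image_bytes)
        return max(result, fallback_result, key=lambda r: r.confidence)
```

In this sketch the cloud engine is optional and is called only when the local result falls below the threshold, so a deployment without cloud credentials still gets the primary engine's output. A Gemini-based manual extractor would sit outside this hierarchy, matching the note that it is not an OcrEngine subclass.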
Files
| File | What | When to read |
|---|---|---|
| `main.py` | FastAPI application entry point | Route registration, app setup |
| `config.py` | Configuration settings (OCR engines, Vertex AI, Redis, Vision API limits) | Environment variables, settings |
| `__init__.py` | Package init | Package structure |
Subdirectories
| Directory | What | When to read |
|---|---|---|
| `engines/` | OCR engine abstraction (PaddleOCR, Google Vision, Hybrid) and Gemini module | Engine changes, adding new engines |
| `extractors/` | Domain-specific data extraction (receipts, fuel receipts, maintenance manuals) | Adding new extraction types, modifying extraction logic |
| `models/` | Data models and schemas | Request/response types |
| `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization |
| `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR |
| `routers/` | FastAPI route handlers (/extract, /extract/receipt, /extract/manual, /jobs) | API endpoint changes |
| `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management |
| `table_extraction/` | Table detection and parsing | Structured data extraction from images |
| `validators/` | Input validation | Validation rules |