docs: update CLAUDE.md indexes and README for OCR expansion (refs #137)

Add/update documentation across backend, Python OCR service, and frontend
for receipt scanning, manual extraction, and Gemini integration. Create
new CLAUDE.md files for engines/, fuel-logs/, documents/, and maintenance/
features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Eric Gullickson
2026-02-11 11:04:19 -06:00
parent 40df5e5b58
commit ab0d8463be
11 changed files with 385 additions and 45 deletions

View File

@@ -1,23 +1,25 @@
# ocr/app/
Python OCR microservice (FastAPI). Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction (standalone module, not an OcrEngine subclass).
## Files
| File | What | When to read |
| ---- | ---- | ------------ |
| `main.py` | FastAPI application entry point | Route registration, app setup |
| `config.py` | Configuration settings | Environment variables, settings |
| `config.py` | Configuration settings (OCR engines, Vertex AI, Redis, Vision API limits) | Environment variables, settings |
| `__init__.py` | Package init | Package structure |
## Subdirectories
| Directory | What | When to read |
| --------- | ---- | ------------ |
| `engines/` | OCR engine abstraction (PaddleOCR primary, Google Vision fallback) | Engine changes, adding new engines |
| `extractors/` | Data extraction logic | Adding new extraction types |
| `engines/` | OCR engine abstraction (PaddleOCR, Google Vision, Hybrid) and Gemini module | Engine changes, adding new engines |
| `extractors/` | Domain-specific data extraction (receipts, fuel receipts, maintenance manuals) | Adding new extraction types, modifying extraction logic |
| `models/` | Data models and schemas | Request/response types |
| `patterns/` | Regex and parsing patterns | Pattern matching rules |
| `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization |
| `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR |
| `routers/` | FastAPI route handlers | API endpoint changes |
| `services/` | Business logic services | Core OCR processing |
| `table_extraction/` | Table detection and parsing | Structured data extraction |
| `routers/` | FastAPI route handlers (/extract, /extract/receipt, /extract/manual, /jobs) | API endpoint changes |
| `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management |
| `table_extraction/` | Table detection and parsing | Structured data extraction from images |
| `validators/` | Input validation | Validation rules |