# ocr/app/

Python OCR microservice (FastAPI). Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction (standalone module, not an OcrEngine subclass).
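
A minimal sketch of how a primary engine with an optional cloud fallback could be wired together. Only the name `OcrEngine` comes from this README; the `recognize` method, the `OcrResult` fields, the `StubEngine` helper, and the low-confidence trigger with its `0.6` threshold are all assumptions made for illustration and may not match the code in `engines/`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrResult:
    """Hypothetical result type: recognized text plus a 0.0-1.0 confidence."""
    text: str
    confidence: float
    engine: str


class OcrEngine(ABC):
    """Hypothetical base class; the real OcrEngine interface may differ."""

    @abstractmethod
    def recognize(self, image: bytes) -> OcrResult:
        ...


class StubEngine(OcrEngine):
    """Stand-in for a real engine so the sketch runs without PaddleOCR or Vision."""

    def __init__(self, name: str, text: str, confidence: float) -> None:
        self.name = name
        self._text = text
        self._confidence = confidence

    def recognize(self, image: bytes) -> OcrResult:
        return OcrResult(self._text, self._confidence, self.name)


class HybridEngine(OcrEngine):
    """Runs the primary engine and consults the fallback only on low confidence.

    The confidence threshold is an assumed trigger, not taken from the service.
    """

    def __init__(
        self,
        primary: OcrEngine,
        fallback: Optional[OcrEngine] = None,
        threshold: float = 0.6,
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.threshold = threshold

    def recognize(self, image: bytes) -> OcrResult:
        primary_result = self.primary.recognize(image)
        # Keep the local result when it is good enough or no fallback is configured.
        if self.fallback is None or primary_result.confidence >= self.threshold:
            return primary_result
        fallback_result = self.fallback.recognize(image)
        # Return whichever engine reported the higher confidence.
        return max(primary_result, fallback_result, key=lambda r: r.confidence)


if __name__ == "__main__":
    local = StubEngine("paddle", "T0TAL 12.5O", 0.41)
    cloud = StubEngine("vision", "TOTAL 12.50", 0.93)
    print(HybridEngine(local, cloud).recognize(b"").engine)
```

Making the fallback optional keeps the service usable without cloud credentials: with `fallback=None` the hybrid wrapper behaves like the primary engine alone.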

## Files

| File | What | When to read |
| ---- | ---- | ------------ |
| `main.py` | FastAPI application entry point | Route registration, app setup |
| `config.py` | Configuration settings (OCR engines, Google GenAI, Redis, Vision API limits) | Environment variables, settings |
| `__init__.py` | Package init | Package structure |

## Subdirectories

| Directory | What | When to read |
| --------- | ---- | ------------ |
| `engines/` | OCR engine abstraction (PaddleOCR, Google Vision, Hybrid) and Gemini module | Engine changes, adding new engines |
| `extractors/` | Domain-specific data extraction (receipts, fuel receipts, maintenance manuals) | Adding new extraction types, modifying extraction logic |
| `models/` | Data models and schemas | Request/response types |
| `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization |
| `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR |
| `routers/` | FastAPI route handlers (`/extract`, `/extract/receipt`, `/extract/manual`, `/decode`, `/jobs`) | API endpoint changes |
| `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management |
| `table_extraction/` | Table detection and parsing | Structured data extraction from images |
| `validators/` | Input validation | Validation rules |