From ab0d8463be97b6f991ae225660819f6b52ad66ac Mon Sep 17 00:00:00 2001 From: Eric Gullickson <16152721+ericgullickson@users.noreply.github.com> Date: Wed, 11 Feb 2026 11:04:19 -0600 Subject: [PATCH] docs: update CLAUDE.md indexes and README for OCR expansion (refs #137) Add/update documentation across backend, Python OCR service, and frontend for receipt scanning, manual extraction, and Gemini integration. Create new CLAUDE.md files for engines/, fuel-logs/, documents/, and maintenance/ features. Co-Authored-By: Claude Opus 4.6 --- backend/src/core/CLAUDE.md | 2 +- backend/src/features/CLAUDE.md | 2 +- backend/src/features/ocr/CLAUDE.md | 39 ++++- backend/src/features/ocr/README.md | 180 +++++++++++++++++--- frontend/src/features/CLAUDE.md | 6 +- frontend/src/features/documents/CLAUDE.md | 49 ++++++ frontend/src/features/fuel-logs/CLAUDE.md | 48 ++++++ frontend/src/features/maintenance/CLAUDE.md | 51 ++++++ ocr/CLAUDE.md | 4 +- ocr/app/CLAUDE.md | 16 +- ocr/app/engines/CLAUDE.md | 33 ++++ 11 files changed, 385 insertions(+), 45 deletions(-) create mode 100644 frontend/src/features/documents/CLAUDE.md create mode 100644 frontend/src/features/fuel-logs/CLAUDE.md create mode 100644 frontend/src/features/maintenance/CLAUDE.md create mode 100644 ocr/app/engines/CLAUDE.md diff --git a/backend/src/core/CLAUDE.md b/backend/src/core/CLAUDE.md index cf75124..8b25831 100644 --- a/backend/src/core/CLAUDE.md +++ b/backend/src/core/CLAUDE.md @@ -14,7 +14,7 @@ | `config/` | Configuration loading (env, database, redis) | Environment setup, connection pools | | `logging/` | Winston structured logging | Log configuration, debugging | | `middleware/` | Fastify middleware | Request processing, user extraction | -| `plugins/` | Fastify plugins (auth, error, logging) | Plugin registration, hooks | +| `plugins/` | Fastify plugins (auth, error, logging, tier guard) | Plugin registration, hooks, tier gating | | `scheduler/` | Job scheduling infrastructure | Scheduled tasks, cron jobs | | 
`storage/` | Storage abstraction and adapters | File storage, S3/filesystem | | `user-preferences/` | User preferences data and migrations | User settings storage | diff --git a/backend/src/features/CLAUDE.md b/backend/src/features/CLAUDE.md index da31caf..1576a6d 100644 --- a/backend/src/features/CLAUDE.md +++ b/backend/src/features/CLAUDE.md @@ -12,7 +12,7 @@ | `fuel-logs/` | Fuel consumption tracking | Fuel log CRUD, statistics | | `maintenance/` | Maintenance record management | Service records, reminders | | `notifications/` | Email and push notifications | Alert system, email templates | -| `ocr/` | OCR proxy to mvp-ocr service | Image text extraction, async jobs | +| `ocr/` | OCR proxy to mvp-ocr service (VIN, receipt, manual extraction) | Image text extraction, receipt scanning, manual PDF extraction, async jobs | | `onboarding/` | User onboarding flow | First-time user setup | | `ownership-costs/` | Ownership cost tracking and reports | Cost aggregation, expense analysis | | `platform/` | Vehicle data and VIN decoding | Make/model lookup, VIN validation | diff --git a/backend/src/features/ocr/CLAUDE.md b/backend/src/features/ocr/CLAUDE.md index 9c0d6cd..e57bce8 100644 --- a/backend/src/features/ocr/CLAUDE.md +++ b/backend/src/features/ocr/CLAUDE.md @@ -1,16 +1,47 @@ # ocr/ +Backend proxy for the Python OCR microservice. Handles authentication, tier gating, file validation, and request forwarding for VIN extraction, fuel receipt scanning, and maintenance manual extraction. 
+ ## Files | File | What | When to read | | ---- | ---- | ------------ | -| `README.md` | Feature documentation | Understanding OCR proxy | +| `README.md` | Feature documentation with architecture diagrams | Understanding OCR proxy, data flows | | `index.ts` | Feature barrel export | Importing OCR services | ## Subdirectories | Directory | What | When to read | | --------- | ---- | ------------ | -| `api/` | HTTP endpoints and routes | API changes | -| `domain/` | Business logic, types | Core OCR proxy logic | -| `external/` | External OCR service client | OCR service integration | +| `api/` | HTTP endpoints, routes, request validation | API changes, adding endpoints | +| `domain/` | Business logic, TypeScript types | Core OCR proxy logic, type definitions | +| `external/` | HTTP client to Python OCR service | OCR service integration, error handling | +| `tests/` | Unit tests for receipt and manual extraction | Test changes, adding test coverage | + +## api/ + +| File | What | When to read | +| ---- | ---- | ------------ | +| `ocr.controller.ts` | Request handlers for all OCR endpoints (extract, extractVin, extractReceipt, extractManual, submitJob, getJobStatus) | Adding/modifying endpoint behavior | +| `ocr.routes.ts` | Fastify route registration with auth and tier guard preHandlers | Route configuration, middleware changes | +| `ocr.validation.ts` | Request/response type definitions for route schemas | Changing request/response shapes | + +## domain/ + +| File | What | When to read | +| ---- | ---- | ------------ | +| `ocr.service.ts` | Business logic layer: file validation, size limits (10MB sync, 200MB async), content type checks, service delegation | Core logic changes, validation rules | +| `ocr.types.ts` | TypeScript types: OcrResponse, VinExtractionResponse, ReceiptExtractionResponse, ManualExtractionResult, JobResponse, ManualJobResponse | Type changes, adding new response shapes | + +## external/ + +| File | What | When to read | +| ---- | ---- | 
------------ | +| `ocr-client.ts` | HTTP client to mvp-ocr Python service (extract, extractVin, extractReceipt, submitJob, submitManualJob, getJobStatus, isHealthy) | OCR service communication, error handling | + +## tests/ + +| File | What | When to read | +| ---- | ---- | ------------ | +| `unit/ocr-receipt.test.ts` | Receipt extraction tests with mock client | Receipt flow changes | +| `unit/ocr-manual.test.ts` | Manual PDF extraction tests | Manual extraction flow changes | diff --git a/backend/src/features/ocr/README.md b/backend/src/features/ocr/README.md index 20442a4..83b4d65 100644 --- a/backend/src/features/ocr/README.md +++ b/backend/src/features/ocr/README.md @@ -1,54 +1,180 @@ # OCR Feature -Backend proxy for OCR service communication. Handles authentication, validation, and file streaming to the OCR container. +Backend proxy for the Python OCR microservice. Handles authentication, tier gating, file validation, and request forwarding for three extraction types: VIN decoding, fuel receipt scanning, and maintenance manual extraction. 
## API Endpoints -| Method | Endpoint | Description | -|--------|----------|-------------| -| POST | `/api/ocr/extract` | Synchronous OCR extraction (max 10MB) | -| POST | `/api/ocr/jobs` | Submit async OCR job (max 200MB) | -| GET | `/api/ocr/jobs/:jobId` | Poll async job status | +| Method | Endpoint | Description | Auth | Tier | Max Size | +|--------|----------|-------------|------|------|----------| +| POST | `/api/ocr/extract` | Synchronous general OCR extraction | Required | - | 10MB | +| POST | `/api/ocr/extract/vin` | VIN-specific extraction | Required | - | 10MB | +| POST | `/api/ocr/extract/receipt` | Fuel receipt extraction | Required | - | 10MB | +| POST | `/api/ocr/extract/manual` | Async maintenance manual extraction | Required | Pro | 200MB | +| POST | `/api/ocr/jobs` | Submit async OCR job | Required | - | 200MB | +| GET | `/api/ocr/jobs/:jobId` | Poll async job status | Required | - | - | ## Architecture ``` -api/ - ocr.controller.ts # Request handlers - ocr.routes.ts # Route registration - ocr.validation.ts # Request validation types -domain/ - ocr.service.ts # Business logic - ocr.types.ts # TypeScript types -external/ - ocr-client.ts # HTTP client to OCR service +Frontend + | + v +Backend Proxy (this feature) + | + +-- ocr.routes.ts --------> Route registration (auth + tier preHandlers) + | + +-- ocr.controller.ts ----> Request handlers (file validation, size checks) + | + +-- ocr.service.ts -------> Business logic (content type validation, delegation) + | + +-- ocr-client.ts --------> HTTP client to mvp-ocr:8000 + | + v + Python OCR Service ``` +## Receipt OCR Flow + +``` +Mobile Camera / File Upload + | + v +POST /api/ocr/extract/receipt (multipart/form-data) + | + v +OcrController.extractReceipt() + - Validates file size (<= 10MB) + - Validates content type (JPEG, PNG, HEIC) + | + v +OcrService.extractReceipt() + | + v +OcrClient.extractReceipt() --> HTTP POST --> Python /extract/receipt + | | + v v +ReceiptExtractionResponse ReceiptExtractor 
+ HybridEngine + | (Vision API / PaddleOCR fallback) + v +Frontend receives extractedFields: + merchantName, transactionDate, totalAmount, + fuelQuantity, pricePerUnit, fuelGrade +``` + +After receipt extraction, the frontend calls `POST /api/stations/match` with the `merchantName` to auto-match a gas station via Google Places API. The station match is a separate request handled by the stations feature. + +## Manual Extraction Flow + +``` +PDF Upload + "Scan for Maintenance Schedule" + | + v +POST /api/ocr/extract/manual (multipart/form-data) + - Requires Pro tier (document.scanMaintenanceSchedule) + - Validates file size (<= 200MB) + - Validates content type (application/pdf) + - Validates PDF magic bytes (%PDF header) + | + v +OcrService.submitManualJob() + | + v +OcrClient.submitManualJob() --> HTTP POST --> Python /extract/manual + | | + v v +{ jobId, status: 'pending' } GeminiEngine (Vertex AI) + Gemini 2.5 Flash + Frontend polls: (structured JSON output) + GET /api/ocr/jobs/:jobId | + (progress: 10% -> 50% -> 95% -> 100%) v + | ManualExtractionResult + v { vehicleInfo, maintenanceSchedules[] } +ManualJobResponse with result + | + v +Frontend displays MaintenanceScheduleReviewScreen + - User selects/edits items + - Batch creates maintenance schedules +``` + +Jobs expire after 2 hours (Redis TTL). Expired job polling returns HTTP 410 Gone. 
+ ## Supported File Types +### Sync Endpoints (extract, extractVin, extractReceipt) - HEIC (converted server-side) - JPEG - PNG -- PDF (first page only) -## Response Format +### Async Endpoints (extractManual) +- PDF (validated via magic bytes) +## Response Types + +### ReceiptExtractionResponse ```typescript -interface OcrResponse { +{ success: boolean; - documentType: 'vin' | 'receipt' | 'manual' | 'unknown'; + receiptType: string; + extractedFields: { + merchantName: { value: string; confidence: number }; + transactionDate: { value: string; confidence: number }; + totalAmount: { value: string; confidence: number }; + fuelQuantity: { value: string; confidence: number }; + pricePerUnit: { value: string; confidence: number }; + fuelGrade: { value: string; confidence: number }; + }; rawText: string; - confidence: number; // 0.0 - 1.0 - extractedFields: Record; processingTimeMs: number; } ``` -## Async Job Flow +### ManualJobResponse +```typescript +{ + jobId: string; + status: 'pending' | 'processing' | 'completed' | 'failed'; + progress?: { percent: number; message: string }; + estimatedSeconds?: number; + result?: ManualExtractionResult; + error?: string; +} +``` -1. POST `/api/ocr/jobs` with file -2. Receive `{ jobId, status: 'pending' }` -3. Poll GET `/api/ocr/jobs/:jobId` -4. When `status: 'completed'`, result contains OCR data +### ManualExtractionResult +```typescript +{ + success: boolean; + vehicleInfo?: { make: string; model: string; year: number }; + maintenanceSchedules: Array<{ + serviceName: string; + intervalMiles: number | null; + intervalMonths: number | null; + details: string; + confidence: number; + subtypes: string[]; + }>; + rawTables: any[]; + processingTimeMs: number; + totalPages: number; + pagesProcessed: number; +} +``` -Jobs expire after 1 hour. 
+## Error Handling + +The backend proxy translates Python service error codes: + +| Python Status | Backend Status | Meaning | +|---------------|----------------|---------| +| 413 | 413 | File too large | +| 415 | 415 | Unsupported media type | +| 422 | 422 | Extraction failed | +| 410 | 410 | Job expired (TTL) | +| Other | 500 | Internal server error | + +## Tier Gating + +Manual extraction requires Pro tier. The tier guard middleware (`requireTier` plugin) validates the user's subscription tier before processing. Free-tier users receive HTTP 403 with `TIER_REQUIRED` error code and an upgrade prompt. + +Receipt and VIN extraction are available to all tiers. diff --git a/frontend/src/features/CLAUDE.md b/frontend/src/features/CLAUDE.md index d0b3c7e..2480029 100644 --- a/frontend/src/features/CLAUDE.md +++ b/frontend/src/features/CLAUDE.md @@ -7,9 +7,9 @@ | `admin/` | Admin panel and catalog management | Admin UI, user management | | `auth/` | Authentication pages and components | Login, logout, auth flows | | `dashboard/` | Dashboard and fleet overview | Home page, summary widgets | -| `documents/` | Document management UI | File upload, document viewer | -| `fuel-logs/` | Fuel log tracking UI | Fuel entry forms, statistics | -| `maintenance/` | Maintenance record UI | Service tracking, reminders | +| `documents/` | Document management UI with maintenance manual extraction | File upload, document viewer, manual OCR extraction | +| `fuel-logs/` | Fuel log tracking UI with receipt OCR scanning | Fuel entry forms, receipt scanning, statistics | +| `maintenance/` | Maintenance record and schedule UI with OCR batch creation | Service tracking, extraction review, schedule management | | `notifications/` | Notification display | Alert UI, notification center | | `onboarding/` | Onboarding wizard | First-time user experience | | `ownership-costs/` | Ownership cost tracking UI | Cost displays, expense forms | diff --git a/frontend/src/features/documents/CLAUDE.md 
b/frontend/src/features/documents/CLAUDE.md new file mode 100644 index 0000000..b3b0e9c --- /dev/null +++ b/frontend/src/features/documents/CLAUDE.md @@ -0,0 +1,49 @@ +# documents/ + +Document management UI with maintenance manual extraction. Handles file uploads, document viewing, and PDF-based maintenance schedule extraction via Gemini. + +## Subdirectories + +| Directory | What | When to read | +| --------- | ---- | ------------ | +| `api/` | Document API endpoints | API integration | +| `components/` | Document forms, dialogs, preview, metadata display | UI changes | +| `hooks/` | Document CRUD, manual extraction, upload progress | Business logic | +| `mobile/` | Mobile-specific document layout | Mobile UI | +| `pages/` | DocumentsPage, DocumentDetailPage | Page layout | +| `types/` | TypeScript type definitions | Type changes | +| `utils/` | Utility functions (vehicle label formatting) | Helper logic | + +## Key Files + +| File | What | When to read | +| ---- | ---- | ------------ | +| `hooks/useManualExtraction.ts` | Manual extraction orchestration: submit PDF to /ocr/extract/manual, poll job status via /ocr/jobs/:jobId, return extraction results | Manual extraction flow, job polling | +| `components/DocumentForm.tsx` | Document metadata form with "Scan for Maintenance Schedule" checkbox (Pro tier) | Document upload, extraction trigger | +| `components/AddDocumentDialog.tsx` | Add document dialog integrating DocumentForm, upload progress, and manual extraction trigger | Document creation flow | +| `hooks/useDocuments.ts` | CRUD operations for documents | Document data management | +| `hooks/useUploadWithProgress.ts` | File upload with progress tracking | Upload UI | +| `components/DocumentPreview.tsx` | Document viewer/preview | Document display | +| `components/EditDocumentDialog.tsx` | Edit document metadata | Document editing | +| `types/documents.types.ts` | DocumentType, DocumentRecord, CreateDocumentRequest | Type definitions | + +## Manual Extraction 
Flow + +``` +DocumentForm ("Scan for Maintenance Schedule" checkbox, Pro tier) + | + v +AddDocumentDialog -> useManualExtraction.submit(file, vehicleId) + | + v +POST /api/ocr/extract/manual (async job) + | + v +Poll GET /api/ocr/jobs/:jobId (progress: 10% -> 50% -> 95% -> 100%) + | + v +Job completed -> MaintenanceScheduleReviewScreen (in maintenance/ feature) + | + v +User selects/edits items -> Batch create maintenance schedules +``` diff --git a/frontend/src/features/fuel-logs/CLAUDE.md b/frontend/src/features/fuel-logs/CLAUDE.md new file mode 100644 index 0000000..3bcbc3f --- /dev/null +++ b/frontend/src/features/fuel-logs/CLAUDE.md @@ -0,0 +1,48 @@ +# fuel-logs/ + +Fuel log tracking UI with receipt OCR scanning. Captures fuel purchases, calculates statistics, and supports camera-based receipt scanning that auto-extracts fields and matches gas stations. + +## Subdirectories + +| Directory | What | When to read | +| --------- | ---- | ------------ | +| `api/` | Fuel log API endpoints | API integration | +| `components/` | Form components, receipt OCR UI, stats display | UI changes | +| `hooks/` | Data fetching, receipt OCR orchestration, user settings | Business logic | +| `pages/` | FuelLogsPage | Page layout | +| `types/` | TypeScript type definitions | Type changes | + +## Key Files + +| File | What | When to read | +| ---- | ---- | ------------ | +| `hooks/useReceiptOcr.ts` | Receipt OCR orchestration: camera capture, OCR extraction via /ocr/extract/receipt, station matching via /stations/match, field mapping | Receipt scanning flow, OCR integration | +| `components/ReceiptOcrReviewModal.tsx` | Modal for reviewing OCR-extracted receipt fields with confidence indicators, inline editing, station match display | Receipt review UI, field editing | +| `components/ReceiptCameraButton.tsx` | Button to trigger receipt camera capture (tier-gated) | Receipt capture entry point | +| `components/FuelLogForm.tsx` | Main fuel log form with OCR integration (setValue from 
accepted receipt) | Form fields, OCR field mapping | +| `components/ReceiptPreview.tsx` | Receipt image preview | Receipt display | +| `components/StationPicker.tsx` | Gas station selection with search | Station selection UI | +| `components/FuelLogsList.tsx` | Fuel log list display | Log listing | +| `components/FuelStatsCard.tsx` | Fuel statistics summary | Statistics display | +| `hooks/useFuelLogs.tsx` | CRUD operations for fuel logs | Data management | +| `types/fuel-logs.types.ts` | FuelLogResponse, CreateFuelLogRequest, LocationData, UnitSystem | Type definitions | + +## Receipt OCR Flow + +``` +ReceiptCameraButton (tier check) + | + v +useReceiptOcr.startCapture() -> CameraCapture (shared component) + | + v +useReceiptOcr.processImage() -> POST /api/ocr/extract/receipt + | + v +ReceiptOcrReviewModal (display extracted fields, confidence indicators) + | + +-- POST /api/stations/match (merchantName -> station match) + | + v +useReceiptOcr.acceptResult() -> FuelLogForm.setValue() (pre-fill form) +``` diff --git a/frontend/src/features/maintenance/CLAUDE.md b/frontend/src/features/maintenance/CLAUDE.md new file mode 100644 index 0000000..cf70b9f --- /dev/null +++ b/frontend/src/features/maintenance/CLAUDE.md @@ -0,0 +1,51 @@ +# maintenance/ + +Maintenance record and schedule management UI. Supports manual schedule creation and batch creation from OCR-extracted maintenance data. Three categories: routine maintenance, repair, performance upgrade. 
+ +## Subdirectories + +| Directory | What | When to read | +| --------- | ---- | ------------ | +| `api/` | Maintenance API endpoints | API integration | +| `components/` | Forms, lists, review screen, subtype selection | UI changes | +| `hooks/` | Data fetching, batch schedule creation from extraction | Business logic | +| `mobile/` | Mobile-specific maintenance layout | Mobile UI | +| `pages/` | MaintenancePage (tabs: records, schedules) | Page layout | +| `types/` | TypeScript type definitions (categories, subtypes, schedules) | Type changes | + +## Key Files + +| File | What | When to read | +| ---- | ---- | ------------ | +| `hooks/useCreateSchedulesFromExtraction.ts` | Batch-creates maintenance schedules from OCR extraction results, maps MaintenanceScheduleItem to CreateScheduleRequest | OCR-to-schedule creation flow | +| `components/MaintenanceScheduleReviewScreen.tsx` | Dialog for reviewing OCR-extracted maintenance items: checkboxes for selection, confidence indicators, inline editing, batch create action | Extraction review UI, item editing | +| `components/MaintenanceScheduleForm.tsx` | Form for manual schedule creation | Schedule creation UI | +| `components/MaintenanceRecordForm.tsx` | Form for manual record creation | Record creation UI | +| `components/MaintenanceSchedulesList.tsx` | Schedule list with edit/delete | Schedule display | +| `components/MaintenanceRecordsList.tsx` | Record list display | Record display | +| `components/SubtypeCheckboxGroup.tsx` | Multi-select checkbox group for maintenance subtypes (27 routine, repair, performance) | Subtype selection UI | +| `hooks/useMaintenanceRecords.ts` | CRUD operations for maintenance records and schedules | Data management | +| `types/maintenance.types.ts` | MaintenanceCategory, ScheduleType, ROUTINE_MAINTENANCE_SUBTYPES, MaintenanceSchedule | Type definitions, subtype constants | +| `components/MaintenanceScheduleReviewScreen.test.tsx` | Tests for extraction review screen | Test changes | + +## 
Extraction Review Flow + +``` +ManualExtractionResult (from documents/ feature useManualExtraction) + | + v +MaintenanceScheduleReviewScreen + - Displays extracted items with confidence scores + - Checkboxes for select/deselect + - Inline editing of service name, intervals, details + - Touch targets >= 44px for mobile + | + v +useCreateSchedulesFromExtraction.mutate(selectedItems) + | + v +POST /api/maintenance/schedules (batch create) + | + v +Query invalidation -> MaintenanceSchedulesList refreshes +``` diff --git a/ocr/CLAUDE.md b/ocr/CLAUDE.md index 1f3988d..e25dc65 100644 --- a/ocr/CLAUDE.md +++ b/ocr/CLAUDE.md @@ -1,6 +1,6 @@ # ocr/ -Python OCR microservice. Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Pluggable engine abstraction in `app/engines/`. +Python OCR microservice. Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction. Pluggable engine abstraction in `app/engines/`. ## Files @@ -14,5 +14,5 @@ Python OCR microservice. Primary engine: PaddleOCR PP-OCRv4 with optional Google | Directory | What | When to read | | --------- | ---- | ------------ | | `app/` | FastAPI application source | OCR endpoint development | -| `app/engines/` | Engine abstraction layer (OcrEngine ABC, factory, hybrid) | Adding or changing OCR engines | +| `app/engines/` | Engine abstraction layer (OcrEngine ABC, factory, hybrid) and Gemini module | Adding or changing OCR engines, Gemini integration | | `tests/` | Test suite | Adding or modifying tests | diff --git a/ocr/app/CLAUDE.md b/ocr/app/CLAUDE.md index 7d0441b..a91a2be 100644 --- a/ocr/app/CLAUDE.md +++ b/ocr/app/CLAUDE.md @@ -1,23 +1,25 @@ # ocr/app/ +Python OCR microservice (FastAPI). Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction (standalone module, not an OcrEngine subclass). 
+ ## Files | File | What | When to read | | ---- | ---- | ------------ | | `main.py` | FastAPI application entry point | Route registration, app setup | -| `config.py` | Configuration settings | Environment variables, settings | +| `config.py` | Configuration settings (OCR engines, Vertex AI, Redis, Vision API limits) | Environment variables, settings | | `__init__.py` | Package init | Package structure | ## Subdirectories | Directory | What | When to read | | --------- | ---- | ------------ | -| `engines/` | OCR engine abstraction (PaddleOCR primary, Google Vision fallback) | Engine changes, adding new engines | -| `extractors/` | Data extraction logic | Adding new extraction types | +| `engines/` | OCR engine abstraction (PaddleOCR, Google Vision, Hybrid) and Gemini module | Engine changes, adding new engines | +| `extractors/` | Domain-specific data extraction (receipts, fuel receipts, maintenance manuals) | Adding new extraction types, modifying extraction logic | | `models/` | Data models and schemas | Request/response types | -| `patterns/` | Regex and parsing patterns | Pattern matching rules | +| `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization | | `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR | -| `routers/` | FastAPI route handlers | API endpoint changes | -| `services/` | Business logic services | Core OCR processing | -| `table_extraction/` | Table detection and parsing | Structured data extraction | +| `routers/` | FastAPI route handlers (/extract, /extract/receipt, /extract/manual, /jobs) | API endpoint changes | +| `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management | +| `table_extraction/` | Table detection and parsing | Structured data extraction from images | | `validators/` | Input validation | Validation rules | diff --git a/ocr/app/engines/CLAUDE.md b/ocr/app/engines/CLAUDE.md new 
file mode 100644 index 0000000..7df7de1 --- /dev/null +++ b/ocr/app/engines/CLAUDE.md @@ -0,0 +1,33 @@ +# ocr/app/engines/ + +OCR engine abstraction layer. Two categories of engines: + +1. **OcrEngine subclasses** (image-to-text): PaddleOCR, Google Vision, Hybrid. Accept image bytes, return text + confidence + word boxes. +2. **GeminiEngine** (PDF-to-structured-data): Standalone module for maintenance schedule extraction via Vertex AI. Accepts PDF bytes, returns structured JSON. Not an OcrEngine subclass because the interface signatures differ. + +## Files + +| File | What | When to read | +| ---- | ---- | ------------ | +| `__init__.py` | Public engine API exports (OcrEngine, create_engine, exceptions) | Importing engine interfaces | +| `base_engine.py` | OcrEngine ABC, OcrConfig, OcrEngineResult, WordBox, exception hierarchy | Engine interface contract, adding new engines | +| `paddle_engine.py` | PaddleOCR PP-OCRv4 primary engine | Local OCR debugging, accuracy tuning | +| `cloud_engine.py` | Google Vision TEXT_DETECTION fallback engine (WIF authentication) | Cloud OCR configuration, API quota | +| `hybrid_engine.py` | Combines primary + fallback engine with confidence threshold switching | Engine selection logic, fallback behavior | +| `engine_factory.py` | Factory function and engine registry for instantiation | Adding new engine types | +| `gemini_engine.py` | Gemini 2.5 Flash integration for maintenance schedule extraction (Vertex AI SDK, 20MB PDF limit, structured JSON output) | Manual extraction debugging, Gemini configuration | + +## Engine Selection + +``` +create_engine(config) + | + +-- Primary: PaddleOCR (local, fast, no API limits) + | + +-- Fallback: Google Vision (cloud, 1000/month limit) + | + v +HybridEngine (tries primary, falls back if confidence < threshold) +``` + +GeminiEngine is created independently by ManualExtractor, not through the engine factory.
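
The fallback rule in the Engine Selection diagram above (try the primary engine, switch to the fallback when confidence is below a threshold) can be sketched roughly as follows. This is a minimal illustration, not the code in `hybrid_engine.py`: `OcrResult`, `StubEngine`, `HybridSketch`, the `recognize` method name, and the `0.6` threshold are all hypothetical stand-ins, and the real `OcrEngine` interface (which also returns word boxes) may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrResult:
    """Hypothetical result shape: recognized text plus a 0.0-1.0 confidence."""
    text: str
    confidence: float
    engine: str


class StubEngine:
    """Stand-in engine that returns a fixed result (illustration only)."""

    def __init__(self, name: str, text: str, confidence: float) -> None:
        self.name = name
        self.text = text
        self.confidence = confidence

    def recognize(self, image_bytes: bytes) -> OcrResult:
        return OcrResult(self.text, self.confidence, self.name)


class HybridSketch:
    """Tries the primary engine first; uses the fallback on low confidence."""

    def __init__(
        self,
        primary: StubEngine,
        fallback: Optional[StubEngine] = None,
        threshold: float = 0.6,  # assumed value, not taken from the repo
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.threshold = threshold

    def recognize(self, image_bytes: bytes) -> OcrResult:
        primary_result = self.primary.recognize(image_bytes)
        # Keep the local result when it is confident enough, or when no
        # fallback engine is configured.
        if primary_result.confidence >= self.threshold or self.fallback is None:
            return primary_result
        # Assumption: the fallback result is returned as-is. The real engine
        # may instead compare both results and keep the more confident one.
        return self.fallback.recognize(image_bytes)
```

With a low-confidence primary stub and a high-confidence fallback stub, `HybridSketch(...).recognize(b"")` returns the fallback's result; with no fallback configured it always returns the primary's result. Keeping the cloud call behind the threshold check is what lets a monthly-limited fallback be spent only on images the local engine struggles with.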