feat: Maintenance receipt extraction pipeline in OCR microservice (#16) #150
Relates to #16
Summary
Add maintenance receipt extraction to the Python OCR microservice. Uses Gemini-primary extraction (send OCR text to Gemini for all field extraction) with regex cross-validation for structured fields (dates, amounts, odometer).
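The cross-validation idea in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the regexes, the field keys (`date`, `totalAmount`, `odometer`, `shopName`), the confidence deltas, and the `cross_validate` helper are all assumptions made for the example, and the date check compares raw strings rather than normalizing formats.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; the real ones live in
# maintenance_receipt_validation.py and may differ.
DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")
AMOUNT_RE = re.compile(r"\$?\s*(\d[\d,]*\.\d{2})\b")
ODOMETER_RE = re.compile(
    r"\b(?:odometer|mileage|miles)\D{0,10}(\d[\d,]*)", re.IGNORECASE
)


@dataclass
class ExtractedField:
    value: str
    confidence: float


def _normalize_number(text: str) -> str:
    """Strip currency symbols and thousands separators before comparing."""
    return text.replace(",", "").replace("$", "").strip()


def cross_validate(
    fields: dict[str, ExtractedField], ocr_text: str
) -> dict[str, ExtractedField]:
    """Adjust confidence of structured fields based on regex matches in the OCR text."""
    candidates = {
        "date": set(DATE_RE.findall(ocr_text)),
        "totalAmount": {_normalize_number(m) for m in AMOUNT_RE.findall(ocr_text)},
        "odometer": {_normalize_number(m) for m in ODOMETER_RE.findall(ocr_text)},
    }
    checked: dict[str, ExtractedField] = {}
    for key, extracted in fields.items():
        found = candidates.get(key)
        if found is None:
            # Free-text fields (shop name, service description) have no regex check.
            checked[key] = extracted
            continue
        value = extracted.value if key == "date" else _normalize_number(extracted.value)
        # Assumed deltas: small boost on agreement, larger penalty on mismatch.
        delta = 0.1 if value in found else -0.3
        confidence = round(min(1.0, max(0.0, extracted.confidence + delta)), 2)
        checked[key] = ExtractedField(extracted.value, confidence)
    return checked


ocr_text = "Oil change 03/14/2024\nOdometer: 45,210 mi\nTotal: $189.50"
gemini_fields = {
    "date": ExtractedField("03/14/2024", 0.8),
    "totalAmount": ExtractedField("189.50", 0.8),
    "odometer": ExtractedField("54210", 0.8),  # digits transposed on purpose
    "shopName": ExtractedField("Quick Lube", 0.7),
}
validated = cross_validate(gemini_fields, ocr_text)
```

In this example the date and total agree with the OCR text and gain confidence, the transposed odometer reading is penalized, and the shop name passes through unchanged.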
Scope
- `MaintenanceReceiptExtractor` class (mirrors `FuelReceiptExtractor` pattern)
- `POST /extract/maintenance-receipt`
- `ReceiptExtractionResult` dataclass and `ReceiptExtractionResponse` Pydantic model
Files
- `ocr/app/extractors/maintenance_receipt_extractor.py` (NEW)
- `ocr/app/patterns/maintenance_receipt_validation.py` (NEW - cross-validation patterns)
- `ocr/app/routers/extract.py` (MODIFY - add new route)
Technical Notes
- `FuelReceiptExtractor` pattern: extract first (via Gemini), cross-validate second (via regex)
- `manual_extractor.py` pattern
- `ReceiptExtractionResult` dataclass (generic `dict[str, ExtractedField]` supports arbitrary field keys)
- `ReceiptExtractionResponse` Pydantic model (same response shape, `receiptType: "maintenance"`)
- `maintenance_receipt_validation.py` (NOT `maintenance_receipt_patterns.py`, to avoid collision with existing `maintenance_patterns.py`)
Acceptance Criteria
Milestone: Implementation Complete
Phase: Execution | Agent: Developer | Status: PASS
Summary
All three milestones implemented in a single commit (90401dc):
Milestone 1: Cross-validation patterns
- `ocr/app/patterns/maintenance_receipt_validation.py` (NEW)
- `MaintenanceReceiptValidator` class with per-field validation
- `date_matcher` (existing patterns)
- `currency_matcher` (existing patterns)
- `MaintenanceReceiptValidation` with per-field confidence adjustments
Milestone 2: Maintenance receipt extractor
- `ocr/app/extractors/maintenance_receipt_extractor.py` (NEW)
- `MaintenanceReceiptExtractor` class following `FuelReceiptExtractor` pattern
Milestone 3: New endpoint
- `ocr/app/routers/extract.py` (MODIFIED)
- `POST /extract/maintenance-receipt` endpoint
- `ReceiptExtractionResponse` Pydantic model (`receiptType: "maintenance"`)
- `/receipt` endpoint validation pattern (400/413/422 errors)
- `extractors/__init__.py` and `patterns/__init__.py` exports
Technical Decisions
- `GeminiEngine`)
- `receipt_extractor.extract()` for OCR, then replaces fields with Gemini results
Verdict: PASS | Next: QR post-implementation review
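For reviewers, the "OCR first, then replace fields with Gemini results" decision noted under Technical Decisions might look roughly like the sketch below. Only the class names `ReceiptExtractionResult` and `ExtractedField` and the generic `dict[str, ExtractedField]` shape come from this PR; the attribute names, the `replace_with_gemini_fields` helper, and the `(value, confidence)` tuple input are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedField:
    value: str
    confidence: float


@dataclass
class ReceiptExtractionResult:
    # Attribute names are assumed for illustration; only the generic
    # dict[str, ExtractedField] shape is stated in the PR description.
    success: bool
    receipt_type: str
    raw_text: str
    extracted_fields: dict[str, ExtractedField] = field(default_factory=dict)


def replace_with_gemini_fields(
    ocr_result: ReceiptExtractionResult,
    gemini_fields: dict[str, tuple[str, float]],
) -> ReceiptExtractionResult:
    """Keep the OCR text, but swap the extracted fields for Gemini's output."""
    return ReceiptExtractionResult(
        success=ocr_result.success and bool(gemini_fields),
        receipt_type="maintenance",
        raw_text=ocr_result.raw_text,
        extracted_fields={
            key: ExtractedField(value, confidence)
            for key, (value, confidence) in gemini_fields.items()
        },
    )


# Stand-in for the result of receipt_extractor.extract() on an uploaded image.
ocr_result = ReceiptExtractionResult(
    success=True,
    receipt_type="fuel",
    raw_text="Brake pads replaced\nTotal: $240.00",
    extracted_fields={"merchantName": ExtractedField("unknown", 0.4)},
)
merged = replace_with_gemini_fields(
    ocr_result,
    {"serviceName": ("Brake pad replacement", 0.9), "totalAmount": ("240.00", 0.85)},
)
```

Because the field mapping is keyed by arbitrary strings, the same result type can carry maintenance-specific keys without a schema change.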