feat: add receipt OCR pipeline (refs #69)

Implement receipt-specific OCR extraction for fuel receipts:

- Pattern matching modules for date, currency, and fuel data extraction
- Receipt-optimized image preprocessing for thermal receipts
- POST /extract/receipt endpoint with field extraction
- Confidence scoring per extracted field
- Cross-validation of fuel receipt data
- Unit tests for all pattern matchers

Extracted fields: merchantName, transactionDate, totalAmount,
fuelQuantity, pricePerUnit, fuelGrade

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 20:43:30 -06:00
parent a2f0abb14c
commit 6319d50fb1
16 changed files with 2845 additions and 2 deletions
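
The commit message lists "Cross-validation of fuel receipt data" without showing the check in this part of the diff. One plausible reading is that the total should roughly equal quantity times unit price. The sketch below illustrates that idea only; the class name `FuelReceiptFields`, the function `cross_validate`, and the 0.05 absolute tolerance are all hypothetical and are not taken from the commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FuelReceiptFields:
    """Hypothetical container for three numeric fields named in the commit message."""
    total_amount: Optional[float] = None
    fuel_quantity: Optional[float] = None
    price_per_unit: Optional[float] = None


def cross_validate(fields: FuelReceiptFields, tolerance: float = 0.05) -> Optional[bool]:
    """Check that total_amount is close to fuel_quantity * price_per_unit.

    Returns None when any of the three values is missing, since no check
    is possible in that case. The tolerance is an assumed absolute amount
    in currency units, meant to absorb rounding on the printed receipt.
    """
    values = (fields.total_amount, fields.fuel_quantity, fields.price_per_unit)
    if any(v is None for v in values):
        return None
    expected = fields.fuel_quantity * fields.price_per_unit
    return abs(expected - fields.total_amount) <= tolerance
```

A result of `False` would suggest that at least one of the three fields was misread, which could in turn be used to lower the per-field confidence scores.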
--- a/ocr/app/preprocessors/__init__.py
+++ b/ocr/app/preprocessors/__init__.py
@@ -1,10 +1,16 @@
 """Image preprocessors for OCR optimization."""
 from app.services.preprocessor import ImagePreprocessor, preprocessor
 from app.preprocessors.vin_preprocessor import VinPreprocessor, vin_preprocessor
+from app.preprocessors.receipt_preprocessor import (
+    ReceiptPreprocessor,
+    receipt_preprocessor,
+)

 __all__ = [
    "ImagePreprocessor",
    "preprocessor",
    "VinPreprocessor",
    "vin_preprocessor",
+    "ReceiptPreprocessor",
+    "receipt_preprocessor",
 ]