motovaultpro

Author	SHA1	Message	Date
Eric Gullickson	5e4515da7c	fix: use PyMuPDF instead of pdf2image for PDF-to-image conversion (refs #182) pdf2image requires poppler-utils which is not installed in the OCR container. PyMuPDF is already in requirements.txt and can render PDF pages to PNG at 300 DPI natively without extra system dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 21:34:17 -06:00
Eric Gullickson	653c535165	chore: add PDF support to receipt OCR pipeline (refs #182) The receipt extractor only accepted image MIME types, rejecting PDFs at the OCR layer. Added application/pdf to supported types and PDF-to-image conversion (first page at 300 DPI) before OCR preprocessing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 21:22:40 -06:00
Eric Gullickson	013fb0c67a	feat: migrate VIN/receipt extractors and OCR service to engine abstraction (refs #117) Replace direct pytesseract calls with OcrEngine interface in vin_extractor.py, receipt_extractor.py, and ocr_service.py. PSM mode fallbacks replaced with engine-agnostic single-line/single-word configs. Dead _process_ocr_data removed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 10:56:27 -06:00
Eric Gullickson	6319d50fb1	feat: add receipt OCR pipeline (refs #69) Implement receipt-specific OCR extraction for fuel receipts: - Pattern matching modules for date, currency, and fuel data extraction - Receipt-optimized image preprocessing for thermal receipts - POST /extract/receipt endpoint with field extraction - Confidence scoring per extracted field - Cross-validation of fuel receipt data - Unit tests for all pattern matchers Extracted fields: merchantName, transactionDate, totalAmount, fuelQuantity, pricePerUnit, fuelGrade Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-01 20:43:30 -06:00

4 Commits