chore: add PDF support to receipt OCR pipeline (refs #182)

The receipt extractor only accepted image MIME types, rejecting PDFs at the OCR layer. Added application/pdf to supported types and PDF-to-image conversion (first page at 300 DPI) before OCR preprocessing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:22:40 -06:00
parent 83bacf0e2f
commit 653c535165
3 changed files with 33 additions and 4 deletions
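
Reviewer note: the hunk shown here only covers the docstring change, so the conversion step itself is not visible. As a rough sketch of what "first page at 300 DPI before OCR preprocessing" could look like, the snippet below gates on MIME type and rasterizes page 1 of a PDF. The names SUPPORTED_MIME_TYPES, pdf_first_page_to_png, and prepare_for_ocr are hypothetical, and the use of pdf2image (which needs poppler installed) is an assumption; the commit may use a different library.

```python
import io
from typing import Callable

# Hypothetical allow-list; the real set lives somewhere in the OCR service.
SUPPORTED_MIME_TYPES = {"image/heic", "image/jpeg", "image/png", "application/pdf"}
PDF_DPI = 300  # resolution named in the commit message


def is_supported(mime_type: str) -> bool:
    """Return True if the upload's MIME type is accepted by the extractor."""
    return mime_type.lower() in SUPPORTED_MIME_TYPES


def pdf_first_page_to_png(pdf_bytes: bytes, dpi: int = PDF_DPI) -> bytes:
    """Rasterize only the first PDF page and return it as PNG bytes."""
    # Imported lazily so image-only callers do not need pdf2image/poppler.
    from pdf2image import convert_from_bytes

    pages = convert_from_bytes(pdf_bytes, dpi=dpi, first_page=1, last_page=1)
    if not pages:
        raise ValueError("PDF contains no pages")
    buf = io.BytesIO()
    pages[0].save(buf, format="PNG")
    return buf.getvalue()


def prepare_for_ocr(
    data: bytes,
    mime_type: str,
    convert_pdf: Callable[[bytes], bytes] = pdf_first_page_to_png,
) -> bytes:
    """Reject unsupported types; convert PDFs to an image, pass images through."""
    if not is_supported(mime_type):
        raise ValueError(f"Unsupported file type: {mime_type}")
    if mime_type.lower() == "application/pdf":
        return convert_pdf(data)
    return data
```

Passing the converter as a parameter keeps the MIME gate testable without poppler; only the first page is rendered, so multi-page receipts would lose later pages under this sketch.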
--- a/ocr/app/routers/extract.py
+++ b/ocr/app/routers/extract.py
@@ -281,9 +281,9 @@ async def extract_maintenance_receipt(
    - Gemini semantic field extraction from OCR text
    - Regex cross-validation for dates, amounts, odometer

-    Supports HEIC, JPEG, PNG formats.
+    Supports HEIC, JPEG, PNG, and PDF formats.

-    - **file**: Maintenance receipt image file (max 10MB)
+    - **file**: Maintenance receipt image or PDF file (max 10MB)

    Returns:
    - **receiptType**: "maintenance"