chore: add PDF support to receipt OCR pipeline (refs #182)

The receipt extractor accepted only image MIME types, so PDFs were
rejected at the OCR layer. Added application/pdf to the supported types
and a PDF-to-image conversion step (first page, 300 DPI) that runs
before OCR preprocessing.
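
A minimal sketch of the flow described above, not the code from this
commit: check the MIME type against the supported set, and for PDFs
render the first page to an image at 300 DPI before the bytes reach OCR
preprocessing. All names here (SUPPORTED_TYPES, detect_content_type,
prepare_for_ocr, render_first_page) are hypothetical, and the PDF
renderer is passed in as a callable because the commit message does not
say which library performs the conversion.

```python
from typing import Callable, Optional

# Hypothetical names; the real identifiers are not shown in this diff.
SUPPORTED_TYPES = {"image/heic", "image/jpeg", "image/png", "application/pdf"}
PDF_DPI = 300  # DPI stated in the commit message

# A renderer takes (pdf_bytes, dpi) and returns image bytes for page 1.
Renderer = Callable[[bytes, int], bytes]


def detect_content_type(data: bytes) -> Optional[str]:
    """Guess a MIME type from leading magic bytes (HEIC omitted for brevity)."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return None


def prepare_for_ocr(
    data: bytes,
    content_type: Optional[str] = None,
    render_first_page: Optional[Renderer] = None,
) -> bytes:
    """Return image bytes ready for OCR preprocessing."""
    # Auto-detect the type when the caller does not provide one.
    content_type = content_type or detect_content_type(data)
    if content_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported content type: {content_type}")
    if content_type == "application/pdf":
        if render_first_page is None:
            raise RuntimeError("No PDF renderer configured")
        # Only the first page is converted, at the fixed DPI.
        return render_first_page(data, PDF_DPI)
    # Images pass through unchanged.
    return data
```

In a real pipeline the render_first_page argument would wrap a PDF
rasterizer such as pdf2image or PyMuPDF; which one this project uses is
not visible here.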

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eric Gullickson
2026-02-13 21:22:40 -06:00
parent 83bacf0e2f
commit 653c535165
3 changed files with 33 additions and 4 deletions

@@ -98,7 +98,7 @@ class MaintenanceReceiptExtractor:
"""Extract maintenance receipt fields from an image.
Args:
-image_bytes: Raw image bytes (HEIC, JPEG, PNG).
+image_bytes: Raw image or PDF bytes (HEIC, JPEG, PNG, PDF).
content_type: MIME type (auto-detected if not provided).
Returns: