feat: Migrate VIN extractor to engine abstraction (#115) #117

Closed
opened 2026-02-07 16:12:52 +00:00 by egullickson · 0 comments
Owner

Relates to #115

Migrate vin_extractor.py and ocr_service.py to use the new OcrEngine abstraction instead of calling pytesseract directly. PaddleOCR becomes the primary engine.

Changes

  • Modify ocr/app/extractors/vin_extractor.py - Replace pytesseract.image_to_data() calls with engine.recognize()
  • Modify ocr/app/services/ocr_service.py - Replace pytesseract calls with engine interface
  • Update ocr/app/extractors/receipt_extractor.py - Same migration (if it uses Tesseract directly)
  • Preserve VIN preprocessing pipeline (vin_preprocessor.py unchanged)
  • Preserve VIN validation pipeline (vin_validator.py unchanged)
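A minimal sketch of what the call-site change could look like. The actual OcrEngine interface from #115 is not shown in this issue, so the `OcrWord` and `OcrResult` types, the `recognize(image)` signature, and the `StubEngine` class below are all hypothetical stand-ins used only to illustrate the shape of the migration:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class OcrWord:
    """One recognized token (hypothetical type, not the project's real one)."""
    text: str
    confidence: float  # assumed here to be normalized to the range 0.0-1.0


@dataclass
class OcrResult:
    """Engine-neutral result container (hypothetical)."""
    words: list[OcrWord]

    @property
    def text(self) -> str:
        # Join tokens in the order the engine returned them.
        return " ".join(w.text for w in self.words)


class OcrEngine(Protocol):
    """Assumed interface: extractors depend on this, not on pytesseract."""

    def recognize(self, image: bytes) -> OcrResult: ...


class StubEngine:
    """Canned engine for illustration; a real one would wrap PaddleOCR or Tesseract."""

    def recognize(self, image: bytes) -> OcrResult:
        return OcrResult(words=[OcrWord("1HGCM82633A004352", 0.97)])


def extract_raw_text(engine: OcrEngine, image: bytes) -> tuple[str, float]:
    """Return the recognized text and the mean per-word confidence."""
    result = engine.recognize(image)
    if not result.words:
        return "", 0.0
    mean_conf = sum(w.confidence for w in result.words) / len(result.words)
    return result.text, mean_conf
```

With this shape, the extractor only sees `engine.recognize()`, so swapping the primary engine should not require touching the preprocessing or validation steps.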

Acceptance Criteria

  • VIN extractor uses engine.recognize() instead of pytesseract directly
  • Generic OCR service uses engine interface
  • PSM mode fallback strategy adapted for PaddleOCR (angle detection replaces PSM modes)
  • VIN character whitelist equivalent implemented for PaddleOCR
  • Confidence scoring works with PaddleOCR output format
  • Receipt and manual extraction endpoints still function (no regression)
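The whitelist and confidence criteria could be approached roughly as below. This is a sketch under two assumptions: that the whitelist is applied as a post-filter on recognized text rather than as an engine option, and that confidence is normalized to 0.0-1.0 regardless of which engine produced it. The function names `filter_vin_chars` and `normalize_confidence` are hypothetical. The allowed set follows the standard VIN alphabet, which excludes I, O, and Q.

```python
# VIN alphabet: digits plus uppercase letters, excluding I, O, and Q.
VIN_ALLOWED = frozenset("ABCDEFGHJKLMNPRSTUVWXYZ0123456789")


def filter_vin_chars(text: str) -> str:
    """Uppercase the text and drop every character outside the VIN alphabet.

    Post-filter stand-in for an engine-level character whitelist
    (hypothetical helper, not part of any OCR library).
    """
    return "".join(ch for ch in text.upper() if ch in VIN_ALLOWED)


def normalize_confidence(raw: float, scale: float = 1.0) -> float:
    """Map an engine-specific score onto 0.0-1.0.

    `scale` is the engine's maximum score: 100 for engines that report
    percentages, 1.0 for engines that already report fractions. Negative
    values (used by some engines to mark non-text boxes) become 0.0.
    """
    if raw < 0:
        return 0.0
    return min(raw / scale, 1.0)
```

For example, `filter_vin_chars("1hgcm-82633 a004352")` yields `"1HGCM82633A004352"`, and `normalize_confidence(97, scale=100)` yields `0.97`. Note that a post-filter drops disallowed characters instead of steering recognition away from them, so look-alike substitutions (such as O read in place of 0) would still need to be handled elsewhere.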
egullickson added the status/backlog and type/feature labels 2026-02-07 16:13:23 +00:00
egullickson added this to the Sprint 2026-02-02 milestone 2026-02-07 16:13:31 +00:00
egullickson added the status/in-progress label and removed the status/backlog label 2026-02-07 17:30:56 +00:00

Reference: egullickson/motovaultpro#117