motovaultpro/ocr/app/preprocessors/vin_preprocessor.py at 9ce08cbb8917ca89fb67b421105105a5705aa97a

egullickson/motovaultpro

Eric Gullickson 6a4c2137f7

Deploy to Staging / Build Images (pull_request) Successful in 35s

Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s

Deploy to Staging / Verify Staging (pull_request) Successful in 2m31s

Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s

Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

fix: resolve VIN OCR scanning failures on all images (refs #113)

Root cause: Tesseract splits VINs into multiple words, but candidate
extraction required a continuous 17-character sequence, so every
result was rejected.
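
A minimal sketch of the fragment-joining idea, assuming the OCR words
arrive as a list of strings in reading order. It strips punctuation,
joins the fragments, and slides a 17-character window over the result,
keeping windows that use only legal VIN characters (digits and capital
letters except I, O and Q). `vin_candidates` and `VIN_RE` are
hypothetical names for illustration, not the code in vin_preprocessor.py.

```python
import re
from typing import List

# Legal VIN alphabet: digits plus capital letters, excluding I, O and Q.
VIN_RE = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


def vin_candidates(fragments: List[str]) -> List[str]:
    """Join adjacent OCR fragments and return every 17-char window that looks like a VIN."""
    # Drop non-alphanumeric characters (e.g. "VIN:" -> "VIN") and uppercase each fragment.
    joined = "".join(re.sub(r"[^A-Za-z0-9]", "", f).upper() for f in fragments)
    candidates: List[str] = []
    # Slide a 17-character window across the concatenated text.
    for i in range(len(joined) - 16):
        window = joined[i:i + 17]
        if VIN_RE.fullmatch(window) and window not in candidates:
            candidates.append(window)
    return candidates


# Tesseract-style output where the VIN was split into three words.
print(vin_candidates(["1M8GDM9", "AXKP", "042788"]))
```

Joining everything can also produce overlapping windows such as
N1M8GDM9AXKP04278 when a label like "VIN:" precedes the number, so a
further filter (for example a check-digit test) is still needed to pick
the real VIN.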

Changes:
- Fix candidate extraction to concatenate adjacent OCR fragments
- Disable Tesseract dictionaries (VINs are not dictionary words)
- Set OEM 1 (LSTM engine) for better accuracy
- Add PSM 11 (sparse text) and PSM 13 (raw line) fallback modes
- Add Otsu's thresholding as alternative preprocessing pipeline
- Upscale small images toward Tesseract's recommended minimum of 300 DPI
- Remove incorrect B->8 and S->5 transliterations (B and S are valid
  VIN characters)
- Fix pre-existing test bug in check digit expected value
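
The transliteration and check-digit items can be illustrated with a
small sketch. It assumes the North American check-digit scheme
(49 CFR Part 565): each character is mapped to a number, multiplied by a
position weight, and the sum mod 11 must equal position 9, with 10
written as X. VINs from other markets are not required to carry a valid
check digit. Mapping I, O and Q to digits is an assumption made here
because those letters never occur in a VIN; B and S are left alone.
`OCR_FIXES`, `check_digit` and `is_valid_vin` are hypothetical names,
not the module's actual API.

```python
# Letters I, O and Q never appear in a VIN, so OCR output containing them
# is assumed (hypothetically) to be a misread digit. B and S are valid VIN
# letters and are intentionally left untouched.
OCR_FIXES = str.maketrans({"I": "1", "O": "0", "Q": "0"})

DIGITS = "0123456789"

# Letter values from the North American check-digit transliteration table.
LETTER_VALUES = {
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}

# Position weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def char_value(c: str) -> int:
    """Numeric value of one VIN character."""
    return int(c) if c in DIGITS else LETTER_VALUES[c]


def check_digit(vin: str) -> str:
    """Expected character at position 9: weighted sum mod 11, with 10 written as X."""
    total = sum(char_value(c) * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(raw: str) -> bool:
    """Normalise an OCR candidate, then verify its length, alphabet and check digit."""
    vin = raw.upper().translate(OCR_FIXES)
    if len(vin) != 17:
        return False
    if not all(c in DIGITS or c in LETTER_VALUES for c in vin):
        return False
    return vin[8] == check_digit(vin)


print(is_valid_vin("1M8GDM9AXKPO42788"))  # "O" is read back as "0"
```

Because fragment concatenation can yield several overlapping
17-character windows, a check-digit filter of this kind is one way to
choose among them, at least for North American VINs.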

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-06 15:57:14 -06:00

12 KiB