fix: VIN OCR scanning fails with "No VIN Pattern found" on all images (#113) #114

Merged
egullickson merged 15 commits from issue-113-fix-vin-ocr-scanning into main 2026-02-07 15:47:37 +00:00
Owner

Fixes #113

Summary

VIN scanning from the "Add Vehicle" screen failed on all images with "No VIN Pattern found in image". Root cause: Tesseract fragments VINs into multiple words (e.g., "1HGBH 41JXMN 109186") but candidate extraction required a continuous 17-char sequence, rejecting everything.

Changes

  • Fix candidate extraction (vin_validator.py): Two-strategy approach -- continuous run matching + sliding window concatenation of adjacent OCR fragments
  • Fix Tesseract config (vin_extractor.py): Disable dictionaries, set LSTM engine (OEM 1)
  • Add PSM 11/13 fallbacks (vin_extractor.py): Sparse text and raw line modes for difficult images
  • Add DPI upscaling (vin_preprocessor.py): Upscale images below 600px width for Tesseract accuracy
  • Add Otsu's thresholding (vin_preprocessor.py): Alternative preprocessing fallback when adaptive thresholding fails
  • Fix transliterations (vin_validator.py): Remove incorrect B->8 and S->5 mappings (valid VIN chars)

Test Plan

  • 41 VIN-related tests pass (25 validator + 16 preprocessor)
  • New tests for fragmented VINs, Otsu preprocessing, resolution upscaling
  • No regressions in unrelated OCR tests (8 pre-existing failures unrelated to VIN)
  • End-to-end test on iPhone Safari with door jamb VIN photo
  • End-to-end test on desktop Chrome with VIN plate photo
Fixes #113 ## Summary VIN scanning from the "Add Vehicle" screen failed on **all images** with "No VIN Pattern found in image". Root cause: Tesseract fragments VINs into multiple words (e.g., "1HGBH 41JXMN 109186") but candidate extraction required a continuous 17-char sequence, rejecting everything. ## Changes - **Fix candidate extraction** (`vin_validator.py`): Two-strategy approach -- continuous run matching + sliding window concatenation of adjacent OCR fragments - **Fix Tesseract config** (`vin_extractor.py`): Disable dictionaries, set LSTM engine (OEM 1) - **Add PSM 11/13 fallbacks** (`vin_extractor.py`): Sparse text and raw line modes for difficult images - **Add DPI upscaling** (`vin_preprocessor.py`): Upscale images below 600px width for Tesseract accuracy - **Add Otsu's thresholding** (`vin_preprocessor.py`): Alternative preprocessing fallback when adaptive thresholding fails - **Fix transliterations** (`vin_validator.py`): Remove incorrect B->8 and S->5 mappings (valid VIN chars) ## Test Plan - [x] 41 VIN-related tests pass (25 validator + 16 preprocessor) - [x] New tests for fragmented VINs, Otsu preprocessing, resolution upscaling - [x] No regressions in unrelated OCR tests (8 pre-existing failures unrelated to VIN) - [ ] End-to-end test on iPhone Safari with door jamb VIN photo - [ ] End-to-end test on desktop Chrome with VIN plate photo
egullickson added 2 commits 2026-02-06 21:57:54 +00:00
fix: resolve VIN OCR scanning failures on all images (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m31s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
6a4c2137f7
Root cause: Tesseract fragments VINs into multiple words but candidate
extraction required continuous 17-char sequences, rejecting all results.

Changes:
- Fix candidate extraction to concatenate adjacent OCR fragments
- Disable Tesseract dictionaries (VINs are not dictionary words)
- Set OEM 1 (LSTM engine) for better accuracy
- Add PSM 11 (sparse text) and PSM 13 (raw line) fallback modes
- Add Otsu's thresholding as alternative preprocessing pipeline
- Upscale small images to meet Tesseract's 300 DPI requirement
- Remove incorrect B->8 and S->5 transliterations (valid VIN chars)
- Fix pre-existing test bug in check digit expected value

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 01:36:47 +00:00
fix: align VIN OCR logging with unified logging design (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m25s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m36s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
d5696320f1
Replace filesystem-based debug system (VIN_DEBUG_DIR) with standard
logger.debug() calls that flow through Loki when LOG_LEVEL=DEBUG.
Use .env.logging variable for OCR LOG_LEVEL. Increase image capture
quality to 0.95 for better OCR accuracy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 01:53:45 +00:00
fix: Postgres Data paths
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 19s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m30s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
3f0e243087
egullickson added 1 commit 2026-02-07 02:20:22 +00:00
fix: Fixed debug env variable.
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 50s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
488a267fc7
egullickson added 1 commit 2026-02-07 02:26:10 +00:00
fix: add debug image saving gated on LOG_LEVEL=debug (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 21s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
ff3858f750
Save original, adaptive, and Otsu preprocessed images to
/tmp/vin-debug/{timestamp}/ when LOG_LEVEL is set to debug.
No images saved at info level. Volume mount added for access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 02:42:09 +00:00
fix: Debug variables
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
9ce08cbb89
egullickson added 1 commit 2026-02-07 02:55:17 +00:00
fix: Mobile image crop fix
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m20s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
ce2a8d88f9
egullickson added 1 commit 2026-02-07 03:15:12 +00:00
fix: use best-contrast color channel for VIN preprocessing (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 1m7s
Deploy to Staging / Verify Staging (pull_request) Successful in 10s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
0de34983bb
White text on green VIN stickers has only ~12% contrast in standard
grayscale conversion because the green channel dominates luminance.
The new _best_contrast_channel method evaluates each RGB channel's
standard deviation and selects the one with highest contrast, giving
~2x improvement for green-tinted VIN stickers. Falls back to standard
grayscale for neutral-colored images.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 03:23:48 +00:00
fix: use min-channel grayscale and morphological cleanup for VIN OCR (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
a07ec324fe
Replace std-based channel selection (which incorrectly picked green for
green-tinted VIN stickers) with per-pixel min(B,G,R). White text stays
255 in all channels while colored backgrounds drop to their weakest
channel value, giving 2x contrast improvement. Add morphological
opening after thresholding to remove noise speckles from car body
surface that were confusing Tesseract's page segmentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 03:32:57 +00:00
fix: always use min-channel and add grayscale-only OCR path (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 50s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
63c027a454
Two fixes:
1. Always use min-channel for color images instead of gated comparison
   that was falling back to standard grayscale (which has only 23%
   contrast for white-on-green VIN stickers).
2. Add grayscale-only OCR path (CLAHE + denoise, no thresholding)
   between adaptive and Otsu attempts. Tesseract's LSTM engine is
   designed to handle grayscale input directly and often outperforms
   binarized input where thresholding creates artifacts.

Pipeline order: adaptive threshold → grayscale-only → Otsu threshold

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 03:39:52 +00:00
fix: invert min-channel so Tesseract gets dark-on-light text (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
ae5221c759
The min-channel correctly extracts contrast (white text=255 vs green
sticker bg=130), but Tesseract expects dark text on light background.
Without inversion, the grayscale-only path returned empty text for
every PSM mode because Tesseract couldn't see bright-on-dark text.
Invert via bitwise_not: text becomes 0 (black), sticker bg becomes
125 (gray). Fixes all three OCR paths (adaptive, grayscale, Otsu).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 03:52:12 +00:00
fix: remove char whitelist incompatible with Tesseract LSTM (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
432b3bda36
tessedit_char_whitelist does not work with OEM 1 (LSTM engine) and
causes empty/erratic output. This was the root cause of Tesseract
returning empty text despite clear, well-preprocessed images.
Character filtering is already handled post-OCR by the VIN validator's
correct_ocr_errors() method (I->1, O->0, Q->0, etc).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 04:00:11 +00:00
fix: extract VIN from noisy OCR via sliding window + char deletion (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 37s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
e4336ce9da
When OCR reads extra characters (e.g. sticker border as 'C', spurious
'Z' insertion), the raw text exceeds 17 chars and the old first-17
trim produced wrong VINs. New strategy tries all 17-char sliding
windows and single/double character deletions, validating each via
check digit. For 'CWVGGNPE2Z4NP069500', this finds the correct VIN
'WVGGNPE24NP069500' (valid check digit) instead of 'CWVGGNPE2Z4NP0695'
(invalid).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-07 04:15:53 +00:00
chore: Change crop to remove locked aspect ratio
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m21s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 22s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
75ce316aa5
egullickson merged commit 6b0c18a41c into main 2026-02-07 15:47:37 +00:00
egullickson deleted branch issue-113-fix-vin-ocr-scanning 2026-02-07 15:47:37 +00:00
Sign in to join this conversation.
No Reviewers
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: egullickson/motovaultpro#114