fix: VIN OCR scanning fails with "No VIN Pattern found" on all images (#113) #114

egullickson · 2026-02-06T21:57:53Z

Fixes #113

Summary

VIN scanning from the "Add Vehicle" screen failed on all images with "No VIN Pattern found in image". Root cause: Tesseract fragments VINs into multiple words (e.g., "1HGBH 41JXMN 109186") but candidate extraction required a continuous 17-char sequence, rejecting everything.

Changes

Fix candidate extraction (vin_validator.py): Two-strategy approach -- continuous run matching + sliding window concatenation of adjacent OCR fragments
Fix Tesseract config (vin_extractor.py): Disable dictionaries, set LSTM engine (OEM 1)
Add PSM 11/13 fallbacks (vin_extractor.py): Sparse text and raw line modes for difficult images
Add DPI upscaling (vin_preprocessor.py): Upscale images below 600px width for Tesseract accuracy
Add Otsu's thresholding (vin_preprocessor.py): Alternative preprocessing fallback when adaptive thresholding fails
Fix transliterations (vin_validator.py): Remove incorrect B->8 and S->5 mappings (valid VIN chars)
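
The two-strategy candidate extraction can be sketched roughly as follows. This is an illustration only, not the code in vin_validator.py: the function name `extract_candidates` and the exact joining rule (only accept joins that land on exactly 17 characters) are assumptions.

```python
import re

VIN_LENGTH = 17
# VINs use digits and uppercase letters except I, O and Q.
VIN_RUN = re.compile(r"[A-HJ-NPR-Z0-9]+")


def extract_candidates(ocr_text: str) -> list[str]:
    """Return 17-char VIN candidates from raw OCR text (hypothetical helper)."""
    fragments = VIN_RUN.findall(ocr_text.upper())
    candidates: list[str] = []

    def add(candidate: str) -> None:
        if candidate not in candidates:
            candidates.append(candidate)

    # Strategy 1: continuous runs that already contain 17 or more characters.
    for frag in fragments:
        for i in range(len(frag) - VIN_LENGTH + 1):
            add(frag[i:i + VIN_LENGTH])

    # Strategy 2: slide over adjacent fragments and concatenate them until
    # the joined text reaches 17 characters; keep only exact-length joins.
    for start in range(len(fragments)):
        joined = ""
        for frag in fragments[start:]:
            joined += frag
            if len(joined) >= VIN_LENGTH:
                break
        if len(joined) == VIN_LENGTH:
            add(joined)

    return candidates


print(extract_candidates("1HGBH 41JXMN 109186"))
```

With the fragmented example from the summary, strategy 1 finds nothing and strategy 2 joins the three fragments into one candidate.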

Test Plan

[x] 41 VIN-related tests pass (25 validator + 16 preprocessor)
[x] New tests for fragmented VINs, Otsu preprocessing, resolution upscaling
[x] No regressions in unrelated OCR tests (8 pre-existing failures unrelated to VIN)
[ ] End-to-end test on iPhone Safari with door jamb VIN photo
[ ] End-to-end test on desktop Chrome with VIN plate photo

egullickson added 2 commits 2026-02-06 21:57:54 +00:00

chore: update context.json 45aaeab973

fix: resolve VIN OCR scanning failures on all images (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m31s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

6a4c2137f7

Root cause: Tesseract fragments VINs into multiple words but candidate
extraction required continuous 17-char sequences, rejecting all results.

Changes:
- Fix candidate extraction to concatenate adjacent OCR fragments
- Disable Tesseract dictionaries (VINs are not dictionary words)
- Set OEM 1 (LSTM engine) for better accuracy
- Add PSM 11 (sparse text) and PSM 13 (raw line) fallback modes
- Add Otsu's thresholding as alternative preprocessing pipeline
- Upscale small images (below 600px width) toward Tesseract's recommended ~300 DPI
- Remove incorrect B->8 and S->5 transliterations (valid VIN chars)
- Fix pre-existing test bug in check digit expected value

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
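
For reference, the Tesseract settings listed above could be assembled into config strings along these lines. This is a sketch under assumptions: the helper name is hypothetical, and PSM 7 (single text line) as the primary mode is a guess, since the commit only names PSM 11 and 13 as fallbacks. `load_system_dawg` and `load_freq_dawg` are the Tesseract variables that control dictionary loading.

```python
def build_tesseract_config(psm: int) -> str:
    """Build a Tesseract CLI-style config string (hypothetical helper)."""
    return " ".join([
        "--oem 1",                # LSTM engine only
        f"--psm {psm}",           # page segmentation mode
        "-c load_system_dawg=0",  # disable the main word dictionary
        "-c load_freq_dawg=0",    # disable the frequent-word dictionary
    ])


# Assumed primary mode first, then the fallbacks named in the commit:
# 7 = single text line (assumption), 11 = sparse text, 13 = raw line.
PSM_ATTEMPTS = (7, 11, 13)

for psm in PSM_ATTEMPTS:
    print(build_tesseract_config(psm))
```

If the OCR call goes through pytesseract (an assumption), each string can be passed as the `config` argument of `pytesseract.image_to_string`.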

egullickson referenced this pull request

2026-02-06 22:00:09 +00:00

fix: VIN OCR scanning fails with "No VIN Pattern found in image" on all images #113

egullickson added 1 commit 2026-02-07 01:36:47 +00:00

fix: align VIN OCR logging with unified logging design (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 3m25s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m36s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

d5696320f1

Replace filesystem-based debug system (VIN_DEBUG_DIR) with standard
logger.debug() calls that flow through Loki when LOG_LEVEL=DEBUG.
Use .env.logging variable for OCR LOG_LEVEL. Increase image capture
quality to 0.95 for better OCR accuracy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

egullickson added 1 commit 2026-02-07 01:53:45 +00:00

fix: Postgres Data paths

Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 19s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m30s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

3f0e243087

egullickson added 1 commit 2026-02-07 02:20:22 +00:00

fix: Fixed debug env variable.

Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 50s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

488a267fc7

egullickson added 1 commit 2026-02-07 02:26:10 +00:00

fix: add debug image saving gated on LOG_LEVEL=debug (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 21s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

ff3858f750

Save original, adaptive, and Otsu preprocessed images to
/tmp/vin-debug/{timestamp}/ when LOG_LEVEL is set to debug.
No images saved at info level. Volume mount added for access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

egullickson added 1 commit 2026-02-07 02:42:09 +00:00

fix: Debug variables

Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

9ce08cbb89

egullickson added 1 commit 2026-02-07 02:55:17 +00:00

fix: Mobile image crop fix

Deploy to Staging / Build Images (pull_request) Successful in 3m20s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

ce2a8d88f9

egullickson added 1 commit 2026-02-07 03:15:12 +00:00

fix: use best-contrast color channel for VIN preprocessing (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 1m7s
Deploy to Staging / Verify Staging (pull_request) Successful in 10s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

0de34983bb

White text on green VIN stickers has only ~12% contrast in standard
grayscale conversion because the green channel dominates luminance.
The new _best_contrast_channel method evaluates each RGB channel's
standard deviation and selects the one with highest contrast, giving
~2x improvement for green-tinted VIN stickers. Falls back to standard
grayscale for neutral-colored images.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

egullickson added 1 commit 2026-02-07 03:23:48 +00:00

fix: use min-channel grayscale and morphological cleanup for VIN OCR (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

a07ec324fe

Replace std-based channel selection (which incorrectly picked green for
green-tinted VIN stickers) with per-pixel min(B,G,R). White text stays
255 in all channels while colored backgrounds drop to their weakest
channel value, giving 2x contrast improvement. Add morphological
opening after thresholding to remove noise speckles from car body
surface that were confusing Tesseract's page segmentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
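
A small NumPy sketch of why per-pixel min(B,G,R) helps on tinted stickers. The pixel values are made up for illustration (a white text pixel next to a light-green background pixel), the helper names are hypothetical, and the morphological opening step is not shown.

```python
import numpy as np


def min_channel(bgr: np.ndarray) -> np.ndarray:
    """Per-pixel minimum over the B, G, R channels."""
    return bgr.min(axis=2)


def luminance(bgr: np.ndarray) -> np.ndarray:
    """Standard weighted grayscale (BGR order: 0.114, 0.587, 0.299)."""
    weights = np.array([0.114, 0.587, 0.299])
    return np.round(bgr.astype(np.float64) @ weights).astype(np.uint8)


def contrast(gray: np.ndarray) -> int:
    """Difference between the brightest and darkest pixel."""
    return int(gray.max()) - int(gray.min())


# 1x2 synthetic image: white text pixel, light-green background pixel.
image = np.array([[[255, 255, 255], [130, 200, 140]]], dtype=np.uint8)

print("luminance contrast:  ", contrast(luminance(image)))
print("min-channel contrast:", contrast(min_channel(image)))
```

White stays at 255 in both conversions, while the green background falls to its weakest channel under min-channel, so the gap between text and background widens.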

egullickson added 1 commit 2026-02-07 03:32:57 +00:00

fix: always use min-channel and add grayscale-only OCR path (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 50s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

63c027a454

Two fixes:
1. Always use min-channel for color images instead of gated comparison
   that was falling back to standard grayscale (which has only 23%
   contrast for white-on-green VIN stickers).
2. Add grayscale-only OCR path (CLAHE + denoise, no thresholding)
   between adaptive and Otsu attempts. Tesseract's LSTM engine is
   designed to handle grayscale input directly and often outperforms
   binarized input where thresholding creates artifacts.

Pipeline order: adaptive threshold → grayscale-only → Otsu threshold

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
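
The pipeline order above amounts to a first-success fallback chain. A minimal sketch, assuming each preprocessing path is a callable and that a `recognize` callable returns a validated VIN or None; all names here are hypothetical and do not mirror vin_extractor.py.

```python
from typing import Callable, Optional, Sequence, Tuple

Preprocess = Callable[[bytes], bytes]
Recognize = Callable[[bytes], Optional[str]]


def first_successful(
    image: bytes,
    steps: Sequence[Tuple[str, Preprocess]],
    recognize: Recognize,
) -> Optional[Tuple[str, str]]:
    """Try each preprocessing path in order; stop at the first VIN found.

    Returns (path_name, vin), or None when every path fails.
    """
    for name, preprocess in steps:
        vin = recognize(preprocess(image))
        if vin:
            return name, vin
    return None
```

The `steps` sequence would list the adaptive threshold, grayscale-only, and Otsu paths in that order, so the cheaper or more reliable path is tried before the later fallbacks.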

egullickson added 1 commit 2026-02-07 03:39:52 +00:00

fix: invert min-channel so Tesseract gets dark-on-light text (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

ae5221c759

The min-channel correctly extracts contrast (white text=255 vs green
sticker bg=130), but Tesseract expects dark text on light background.
Without inversion, the grayscale-only path returned empty text for
every PSM mode because Tesseract couldn't see bright-on-dark text.
Invert via bitwise_not: text becomes 0 (black), sticker bg becomes
125 (gray). Fixes all three OCR paths (adaptive, grayscale, Otsu).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
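
The inversion arithmetic from this commit, reproduced with NumPy. For 8-bit images `np.bitwise_not` gives the same result as `cv2.bitwise_not` (each value v becomes 255 - v); the two-pixel array is a stand-in for a real min-channel image.

```python
import numpy as np

# Min-channel values from the commit message: white text = 255,
# green sticker background = 130.
min_channel_pixels = np.array([255, 130], dtype=np.uint8)

# Bitwise NOT on uint8 maps v to 255 - v, giving dark text on a lighter
# background, which is the polarity Tesseract expects.
inverted = np.bitwise_not(min_channel_pixels)

print(inverted.tolist())
```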

egullickson added 1 commit 2026-02-07 03:52:12 +00:00

fix: remove char whitelist incompatible with Tesseract LSTM (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

432b3bda36

tessedit_char_whitelist does not work with OEM 1 (LSTM engine) and
causes empty/erratic output. This was the root cause of Tesseract
returning empty text despite clear, well-preprocessed images.
Character filtering is already handled post-OCR by the VIN validator's
correct_ocr_errors() method (I->1, O->0, Q->0, etc).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
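
Post-OCR character filtering of the kind described above might look like this. It is a sketch, not the project's correct_ocr_errors(): the function name `clean_ocr_text` is hypothetical and only the three mappings named in the commit (I->1, O->0, Q->0) are included.

```python
import re

# I, O and Q never appear in a VIN, so they can be mapped to the digits
# they are usually misread from. B and S are valid VIN characters and
# are left untouched.
OCR_FIXES = str.maketrans({"I": "1", "O": "0", "Q": "0"})


def clean_ocr_text(raw: str) -> str:
    """Uppercase, drop non-alphanumerics, then apply the I/O/Q fixes."""
    alnum = re.sub(r"[^A-Z0-9]", "", raw.upper())
    return alnum.translate(OCR_FIXES)


print(clean_ocr_text("wvg-gnpe24 npO695OO"))
```

Filtering after recognition keeps the LSTM engine unconstrained while still removing characters that cannot be part of a VIN.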

egullickson added 1 commit 2026-02-07 04:00:11 +00:00

fix: extract VIN from noisy OCR via sliding window + char deletion (refs #113)

Deploy to Staging / Build Images (pull_request) Successful in 37s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

e4336ce9da

When OCR reads extra characters (e.g. sticker border as 'C', spurious
'Z' insertion), the raw text exceeds 17 chars and the old first-17
trim produced wrong VINs. New strategy tries all 17-char sliding
windows and single/double character deletions, validating each via
check digit. For 'CWVGGNPE2Z4NP069500', this finds the correct VIN
'WVGGNPE24NP069500' (valid check digit) instead of 'CWVGGNPE2Z4NP0695'
(invalid).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
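
A sketch of the sliding-window plus deletion search, validated with the standard VIN check digit (position 9, weighted sum mod 11). The function names are hypothetical, and how the real validator ranks multiple passing candidates is not covered here; this version simply returns all of them.

```python
from itertools import combinations

# Standard VIN transliteration values and position weights.
TRANSLIT = {str(d): d for d in range(10)}
TRANSLIT.update({
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
})
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def has_valid_check_digit(vin: str) -> bool:
    """True when the 9th character matches the computed check digit."""
    if len(vin) != 17 or any(c not in TRANSLIT for c in vin):
        return False
    remainder = sum(TRANSLIT[c] * w for c, w in zip(vin, WEIGHTS)) % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def recover_vins(raw: str, max_deletions: int = 2) -> list[str]:
    """Try 17-char windows, then windows with 1..max_deletions chars removed."""
    found: list[str] = []
    for extra in range(max_deletions + 1):
        width = 17 + extra
        for start in range(len(raw) - width + 1):
            window = raw[start:start + width]
            for drop in combinations(range(width), extra):
                candidate = "".join(
                    c for i, c in enumerate(window) if i not in drop
                )
                if has_valid_check_digit(candidate) and candidate not in found:
                    found.append(candidate)
    return found


print("WVGGNPE24NP069500" in recover_vins("CWVGGNPE2Z4NP069500"))
```

Because a random 17-character string passes the check digit about one time in eleven, more than one candidate can validate on noisy input, so some preference order (for example fewest deletions first, as in the loop above) is still needed.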

egullickson added 1 commit 2026-02-07 04:15:53 +00:00

chore: Change crop to remove locked aspect ratio

Deploy to Staging / Build Images (pull_request) Successful in 3m21s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 22s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped