Replace direct pytesseract calls with OcrEngine interface in vin_extractor.py,
receipt_extractor.py, and ocr_service.py. PSM mode fallbacks replaced with
engine-agnostic single-line/single-word configs. Dead _process_ocr_data removed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tessedit_char_whitelist does not work with OEM 1 (LSTM engine) and
causes empty/erratic output. This was the root cause of Tesseract
returning empty text despite clear, well-preprocessed images.
Character filtering is already handled post-OCR by the VIN validator's
correct_ocr_errors() method (I->1, O->0, Q->0, etc).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes:
1. Always use min-channel for color images instead of gated comparison
that was falling back to standard grayscale (which has only 23%
contrast for white-on-green VIN stickers).
2. Add grayscale-only OCR path (CLAHE + denoise, no thresholding)
between adaptive and Otsu attempts. Tesseract's LSTM engine is
designed to handle grayscale input directly and often outperforms
binarized input where thresholding creates artifacts.
Pipeline order: adaptive threshold → grayscale-only → Otsu threshold
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Save original, adaptive, and Otsu preprocessed images to
/tmp/vin-debug/{timestamp}/ when LOG_LEVEL is set to debug.
No images saved at info level. Volume mount added for access.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace filesystem-based debug system (VIN_DEBUG_DIR) with standard
logger.debug() calls that flow through Loki when LOG_LEVEL=DEBUG.
Use .env.logging variable for OCR LOG_LEVEL. Increase image capture
quality to 0.95 for better OCR accuracy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>