Commit Graph

58 Commits

Author SHA1 Message Date
Eric Gullickson
1add6c8240 fix: remove unsupported AutomaticFunctionCallingConfig parameter (refs #231)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 39s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 53s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
The installed google-genai version does not support max_remote_calls on
AutomaticFunctionCallingConfig, causing a pydantic validation error that
broke VIN decode on staging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 12:59:04 -06:00
Eric Gullickson
936753fac2 fix: VIN Decoding timeouts and logic errors
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m33s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 52s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
2026-02-28 12:02:26 -06:00
Eric Gullickson
96e1dde7b2 docs: update CLAUDE.md references from Vertex AI to google-genai (refs #231)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 8m4s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 24s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:21:58 -06:00
Eric Gullickson
1464a0e1af feat: update test mocks for google-genai SDK (refs #235)
Replace engine._model/engine._generation_config mocks with
engine._client/engine._model_name. Update sys.modules patches
from vertexai to google.genai. Remove dead if-False branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:21:10 -06:00
Eric Gullickson
9f51e62b94 feat: migrate MaintenanceReceiptExtractor to google-genai SDK (refs #234)
Replace vertexai.generative_models with google.genai client pattern.
Fix pre-existing bug: raise GeminiUnavailableError instead of bare
RuntimeError for missing credentials. Add proper try/except blocks
matching GeminiEngine error handling pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:17:14 -06:00
Eric Gullickson
b7f472b3e8 feat: migrate GeminiEngine to google-genai SDK with Google Search grounding (refs #233)
Replace vertexai.generative_models with google.genai client pattern.
Add Google Search grounding tool to VIN decode for improved accuracy.
Convert response schema types to uppercase per Vertex AI Schema spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:16:18 -06:00
Eric Gullickson
398d67304f feat: replace google-cloud-aiplatform with google-genai dependency (refs #232)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:13:54 -06:00
Eric Gullickson
0055d9f0f3 fix: VIN decoding year fixes
All checks were successful
Deploy to Staging / Build Images (push) Successful in 35s
Deploy to Staging / Deploy to Staging (push) Successful in 53s
Deploy to Staging / Verify Staging (push) Successful in 9s
Deploy to Staging / Notify Staging Ready (push) Successful in 9s
Deploy to Staging / Notify Staging Failure (push) Has been skipped
Mirror Base Images / Mirror Base Images (push) Successful in 1m2s
2026-02-28 11:09:46 -06:00
Eric Gullickson
7d90f4b25a fix: add VIN year code table to Gemini decode prompt (refs #229)
All checks were successful
Deploy to Staging / Build Images (push) Successful in 37s
Deploy to Staging / Deploy to Staging (push) Successful in 51s
Deploy to Staging / Verify Staging (push) Successful in 8s
Deploy to Staging / Notify Staging Ready (push) Successful in 7s
Deploy to Staging / Notify Staging Failure (push) Has been skipped
gemini-3-flash-preview was hallucinating year (e.g., returning 1993
instead of 2023 for position-10 code P). Prompt now includes the full
1980-2039 year code table and position-7 disambiguation rule.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:55:21 -06:00
Eric Gullickson
781241966c chore: change google region
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 38s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
2026-02-19 20:59:40 -06:00
Eric Gullickson
f590421058 chore: remove NHTSA code and update documentation (refs #227)
Delete vehicles/external/nhtsa/ directory (3 files), remove VPICVariable
and VPICResponse from platform models. Update all documentation to
reflect Gemini VIN decode via OCR service architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:51:38 -06:00
Eric Gullickson
a75f7b5583 feat: add VIN decode endpoint to OCR Python service (refs #224)
Add POST /decode/vin endpoint using Gemini 2.5 Flash for VIN string
decoding. Returns structured vehicle data (year, make, model, trim,
body/drive/fuel type, engine, transmission) with confidence score.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:40:10 -06:00
Eric Gullickson
220f8ea3ac fix: increase hybrid engine cloud timeout for WIF token exchange (refs #182)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 37s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
The 5s cloud timeout was too tight for the initial WIF authentication
which requires 3 HTTP round-trips (STS, IAM credentials, resource
manager). First call took 5.5s and was discarded, falling back to slow
CPU-based PaddleOCR. Increased to 10s to accommodate cold-start auth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:38:05 -06:00
Eric Gullickson
5e4515da7c fix: use PyMuPDF instead of pdf2image for PDF-to-image conversion (refs #182)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 37s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 52s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
pdf2image requires poppler-utils which is not installed in the OCR
container. PyMuPDF is already in requirements.txt and can render PDF
pages to PNG at 300 DPI natively without extra system dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:34:17 -06:00
Eric Gullickson
653c535165 chore: add PDF support to receipt OCR pipeline (refs #182)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 38s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 22s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
The receipt extractor only accepted image MIME types, rejecting PDFs at
the OCR layer. Added application/pdf to supported types and PDF-to-image
conversion (first page at 300 DPI) before OCR preprocessing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:22:40 -06:00
Eric Gullickson
90401dc1ba feat: add maintenance receipt extraction pipeline with Gemini + regex (refs #150)
- New MaintenanceReceiptExtractor: Gemini-primary extraction with regex
  cross-validation for dates, amounts, and odometer readings
- New maintenance_receipt_validation.py: cross-validation patterns for
  structured field confidence adjustment
- New POST /extract/maintenance-receipt endpoint reusing
  ReceiptExtractionResponse model
- Per-field confidence scores (0.0-1.0) with Gemini base 0.85,
  boosted/reduced by regex agreement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 21:14:13 -06:00
Eric Gullickson
55a7bcc874 fix: Manual polling typo
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
2026-02-11 20:06:03 -06:00
Eric Gullickson
a078962d3f fix: Manual scanning
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
2026-02-11 19:57:32 -06:00
Eric Gullickson
209425a908 feat: rewrite ManualExtractor progress to spec-aligned 10/50/95/100 pattern (refs #143)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 14:40:11 -06:00
Eric Gullickson
f9a650a4d7 feat: add traceback logging and spec-aligned error message to GeminiEngine (refs #142)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 14:35:06 -06:00
Eric Gullickson
ab0d8463be docs: update CLAUDE.md indexes and README for OCR expansion (refs #137)
Add/update documentation across backend, Python OCR service, and frontend
for receipt scanning, manual extraction, and Gemini integration. Create
new CLAUDE.md files for engines/, fuel-logs/, documents/, and maintenance/
features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 11:04:19 -06:00
Eric Gullickson
57ed04d955 feat: rewrite ManualExtractor to use Gemini engine (refs #134)
Replace traditional OCR pipeline (table_detector, table_parser,
maintenance_patterns) with GeminiEngine for semantic PDF extraction.
Map Gemini serviceName values to 27 maintenance subtypes via
ServiceMapper fuzzy matching. Add 8 unit tests covering normal
extraction, unusual names, empty response, and error handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:24:11 -06:00
Eric Gullickson
3705e63fde feat: add Gemini engine module and configuration (refs #133)
Add standalone GeminiEngine class for maintenance schedule extraction
from PDF owners manuals using Vertex AI Gemini 2.5 Flash with structured
JSON output enforcement, 20MB size limit, and lazy initialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:00:47 -06:00
Eric Gullickson
91dc847f56 fix: use correct Auth0 US region domain in WIF token script (refs #127)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 34s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Domain was motovaultpro.auth0.com (404) instead of
motovaultpro.us.auth0.com.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 18:44:30 -06:00
Eric Gullickson
7bba28154d fix: capture Auth0 error response in WIF token script (refs #127)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
The set -e + curl --fail-with-body inside $() caused the script to exit
with code 22 and empty stderr, hiding the actual Auth0 error. Switch to
writing the body to a temp file and checking HTTP status manually so the
error response is visible in logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 18:41:34 -06:00
Eric Gullickson
e6dd7492a1 test: add monthly limit, counter, and cloud-primary engine tests (refs #127)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 8m46s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 22s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
- Update existing hybrid engine tests for new Redis counter behavior
- Add cloud-primary path tests (under/at limit, fallback, errors)
- Add Redis counter increment and TTL verification tests
- Add Redis failure graceful handling test
- Update cloud engine error message assertion for WIF config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 20:56:51 -06:00
Eric Gullickson
9209739e75 feat: add Auth0 WIF token script and update Dockerfile (refs #127)
- Create fetch-auth0-token.sh for Auth0 M2M -> GCP WIF token exchange
- Add jq to Dockerfile system dependencies
- Ensure script is executable in container image

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 20:51:30 -06:00
Eric Gullickson
4abd7d8d5b feat: add Vision monthly cap, WIF auth, and cloud-primary hybrid engine (refs #127)
- Add VISION_MONTHLY_LIMIT config setting (default 1000)
- Update CloudEngine to use WIF credential config via ADC
- Rewrite HybridEngine to support cloud-primary with Redis counter
- Pass monthly_limit through engine factory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 20:50:02 -06:00
Eric Gullickson
e7471d5c27 fix: Python Image Pinning
All checks were successful
Deploy to Staging / Build Images (push) Successful in 8m28s
Deploy to Staging / Deploy to Staging (push) Successful in 22s
Deploy to Staging / Verify Staging (push) Successful in 8s
Deploy to Staging / Notify Staging Ready (push) Successful in 8s
Deploy to Staging / Notify Staging Failure (push) Has been skipped
2026-02-08 19:11:13 -06:00
Eric Gullickson
2c3e432fcf fix: Build errors with python3.13
All checks were successful
Deploy to Staging / Build Images (push) Successful in 8m50s
Deploy to Staging / Deploy to Staging (push) Successful in 23s
Deploy to Staging / Verify Staging (push) Successful in 8s
Deploy to Staging / Notify Staging Ready (push) Successful in 7s
Deploy to Staging / Notify Staging Failure (push) Has been skipped
2026-02-08 18:54:49 -06:00
Eric Gullickson
9a2b12c5dc fix: No matches
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 37s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 22s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
2026-02-07 16:35:28 -06:00
Eric Gullickson
9d2d4e57b7 fix: PaddleOCR error
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 52s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
2026-02-07 16:12:07 -06:00
Eric Gullickson
dab4a3bdf3 fix: PaddleOCR error
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m46s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
2026-02-07 15:51:04 -06:00
Eric Gullickson
639ca117f1 fix: Update PaddleOCR API
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 5m6s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
2026-02-07 14:44:06 -06:00
Eric Gullickson
b9fe222f12 fix: Build errors and tesseract removal
Some checks failed
Deploy to Staging / Build Images (pull_request) Failing after 4m14s
Deploy to Staging / Deploy to Staging (pull_request) Has been skipped
Deploy to Staging / Verify Staging (pull_request) Has been skipped
Deploy to Staging / Notify Staging Ready (pull_request) Has been skipped
Deploy to Staging / Notify Staging Failure (pull_request) Successful in 8s
2026-02-07 12:12:04 -06:00
Eric Gullickson
cf114fad3c fix: build errors for OpenCV
Some checks failed
Deploy to Staging / Build Images (pull_request) Failing after 3m16s
Deploy to Staging / Deploy to Staging (pull_request) Has been skipped
Deploy to Staging / Verify Staging (pull_request) Has been skipped
Deploy to Staging / Notify Staging Ready (pull_request) Has been skipped
Deploy to Staging / Notify Staging Failure (pull_request) Successful in 8s
2026-02-07 11:58:00 -06:00
Eric Gullickson
47c5676498 chore: update OCR tests and documentation (refs #121)
Some checks failed
Deploy to Staging / Build Images (pull_request) Failing after 7m4s
Deploy to Staging / Deploy to Staging (pull_request) Has been skipped
Deploy to Staging / Verify Staging (pull_request) Has been skipped
Deploy to Staging / Notify Staging Ready (pull_request) Has been skipped
Deploy to Staging / Notify Staging Failure (pull_request) Successful in 7s
Add engine abstraction tests and update docs to reflect PaddleOCR primary
architecture with optional Google Vision cloud fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 11:42:51 -06:00
Eric Gullickson
9b6417379b chore: update Docker and compose files for PaddleOCR engine (refs #119)
- Replace libtesseract-dev with libgomp1 (OpenMP for PaddlePaddle)
- Pre-download PP-OCRv4 models during Docker build
- Add OCR engine env vars to all compose files (base, staging, prod)
- Add optional Google Vision secret mount (commented, enable on demand)
- Create google-vision-key.json.example placeholder

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 11:17:44 -06:00
Eric Gullickson
4ef942cb9d feat: add optional Google Vision cloud fallback engine (refs #118)
CloudEngine wraps Google Vision TEXT_DETECTION with lazy init.
HybridEngine runs primary engine, falls back to cloud when confidence
is below threshold. Disabled by default (OCR_FALLBACK_ENGINE=none).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 11:12:08 -06:00
Eric Gullickson
013fb0c67a feat: migrate VIN/receipt extractors and OCR service to engine abstraction (refs #117)
Replace direct pytesseract calls with OcrEngine interface in vin_extractor.py,
receipt_extractor.py, and ocr_service.py. PSM mode fallbacks replaced with
engine-agnostic single-line/single-word configs. Dead _process_ocr_data removed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 10:56:27 -06:00
Eric Gullickson
ebc633fb36 feat: add OCR engine abstraction layer (refs #116)
Introduce pluggable OcrEngine ABC with PaddleOCR PP-OCRv4 as primary
engine and Tesseract wrapper for backward compatibility. Engine factory
reads OCR_PRIMARY_ENGINE config to instantiate the correct engine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 10:47:40 -06:00
Eric Gullickson
e4336ce9da fix: extract VIN from noisy OCR via sliding window + char deletion (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 37s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
When OCR reads extra characters (e.g. sticker border as 'C', spurious
'Z' insertion), the raw text exceeds 17 chars and the old first-17
trim produced wrong VINs. New strategy tries all 17-char sliding
windows and single/double character deletions, validating each via
check digit. For 'CWVGGNPE2Z4NP069500', this finds the correct VIN
'WVGGNPE24NP069500' (valid check digit) instead of 'CWVGGNPE2Z4NP0695'
(invalid).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:00:07 -06:00
Eric Gullickson
432b3bda36 fix: remove char whitelist incompatible with Tesseract LSTM (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
tessedit_char_whitelist does not work with OEM 1 (LSTM engine) and
causes empty/erratic output. This was the root cause of Tesseract
returning empty text despite clear, well-preprocessed images.
Character filtering is already handled post-OCR by the VIN validator's
correct_ocr_errors() method (I->1, O->0, Q->0, etc).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 21:52:08 -06:00
Eric Gullickson
ae5221c759 fix: invert min-channel so Tesseract gets dark-on-light text (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
The min-channel correctly extracts contrast (white text=255 vs green
sticker bg=130), but Tesseract expects dark text on light background.
Without inversion, the grayscale-only path returned empty text for
every PSM mode because Tesseract couldn't see bright-on-dark text.
Invert via bitwise_not: text becomes 0 (black), sticker bg becomes
125 (gray). Fixes all three OCR paths (adaptive, grayscale, Otsu).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 21:39:48 -06:00
Eric Gullickson
63c027a454 fix: always use min-channel and add grayscale-only OCR path (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 50s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Two fixes:
1. Always use min-channel for color images instead of gated comparison
   that was falling back to standard grayscale (which has only 23%
   contrast for white-on-green VIN stickers).
2. Add grayscale-only OCR path (CLAHE + denoise, no thresholding)
   between adaptive and Otsu attempts. Tesseract's LSTM engine is
   designed to handle grayscale input directly and often outperforms
   binarized input where thresholding creates artifacts.

Pipeline order: adaptive threshold → grayscale-only → Otsu threshold

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 21:32:52 -06:00
Eric Gullickson
a07ec324fe fix: use min-channel grayscale and morphological cleanup for VIN OCR (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Replace std-based channel selection (which incorrectly picked green for
green-tinted VIN stickers) with per-pixel min(B,G,R). White text stays
255 in all channels while colored backgrounds drop to their weakest
channel value, giving 2x contrast improvement. Add morphological
opening after thresholding to remove noise speckles from car body
surface that were confusing Tesseract's page segmentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 21:23:43 -06:00
Eric Gullickson
0de34983bb fix: use best-contrast color channel for VIN preprocessing (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 1m7s
Deploy to Staging / Verify Staging (pull_request) Successful in 10s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
White text on green VIN stickers has only ~12% contrast in standard
grayscale conversion because the green channel dominates luminance.
The new _best_contrast_channel method evaluates each RGB channel's
standard deviation and selects the one with highest contrast, giving
~2x improvement for green-tinted VIN stickers. Falls back to standard
grayscale for neutral-colored images.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 21:14:56 -06:00
Eric Gullickson
ff3858f750 fix: add debug image saving gated on LOG_LEVEL=debug (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 21s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Save original, adaptive, and Otsu preprocessed images to
/tmp/vin-debug/{timestamp}/ when LOG_LEVEL is set to debug.
No images saved at info level. Volume mount added for access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 20:26:06 -06:00
Eric Gullickson
d5696320f1 fix: align VIN OCR logging with unified logging design (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m25s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m36s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Replace filesystem-based debug system (VIN_DEBUG_DIR) with standard
logger.debug() calls that flow through Loki when LOG_LEVEL=DEBUG.
Use .env.logging variable for OCR LOG_LEVEL. Increase image capture
quality to 0.95 for better OCR accuracy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 19:36:35 -06:00
Eric Gullickson
6a4c2137f7 fix: resolve VIN OCR scanning failures on all images (refs #113)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m31s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Root cause: Tesseract fragments VINs into multiple words but candidate
extraction required continuous 17-char sequences, rejecting all results.

Changes:
- Fix candidate extraction to concatenate adjacent OCR fragments
- Disable Tesseract dictionaries (VINs are not dictionary words)
- Set OEM 1 (LSTM engine) for better accuracy
- Add PSM 11 (sparse text) and PSM 13 (raw line) fallback modes
- Add Otsu's thresholding as alternative preprocessing pipeline
- Upscale small images to meet Tesseract's 300 DPI requirement
- Remove incorrect B->8 and S->5 transliterations (valid VIN chars)
- Fix pre-existing test bug in check digit expected value

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 15:57:14 -06:00