chore: remove NHTSA code and update documentation (refs #227)

Delete vehicles/external/nhtsa/ directory (3 files), remove VPICVariable and VPICResponse from platform models. Update all documentation to reflect Gemini VIN decode via OCR service architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 21:51:38 -06:00
parent 5cbf9c764d
commit f590421058
16 changed files with 35 additions and 408 deletions
--- a/ocr/CLAUDE.md
+++ b/ocr/CLAUDE.md
@@ -1,6 +1,6 @@
 # ocr/

-Python OCR microservice. Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction. Pluggable engine abstraction in `app/engines/`.
+Python OCR microservice. Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction and VIN decode. Pluggable engine abstraction in `app/engines/`.

 ## Files

--- a/ocr/app/CLAUDE.md
+++ b/ocr/app/CLAUDE.md
@@ -19,7 +19,7 @@ Python OCR microservice (FastAPI). Primary engine: PaddleOCR PP-OCRv4 with optio
 | `models/` | Data models and schemas | Request/response types |
 | `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization |
 | `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR |
-| `routers/` | FastAPI route handlers (/extract, /extract/receipt, /extract/manual, /jobs) | API endpoint changes |
+| `routers/` | FastAPI route handlers (/extract, /extract/receipt, /extract/manual, /decode, /jobs) | API endpoint changes |
 | `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management |
 | `table_extraction/` | Table detection and parsing | Structured data extraction from images |
 | `validators/` | Input validation | Validation rules |
--- a/ocr/app/engines/CLAUDE.md
+++ b/ocr/app/engines/CLAUDE.md
@@ -3,7 +3,7 @@
 OCR engine abstraction layer. Two categories of engines:

 1. **OcrEngine subclasses** (image-to-text): PaddleOCR, Google Vision, Hybrid. Accept image bytes, return text + confidence + word boxes.
-2. **GeminiEngine** (PDF-to-structured-data): Standalone module for maintenance schedule extraction via Vertex AI. Accepts PDF bytes, returns structured JSON. Not an OcrEngine subclass because the interface signatures differ.
+2. **GeminiEngine** (PDF-to-structured-data and VIN decode): Standalone module for maintenance schedule extraction and VIN decoding via Vertex AI. Accepts PDF bytes or VIN strings, returns structured JSON. Not an OcrEngine subclass because the interface signatures differ.

 ## Files

@@ -15,7 +15,7 @@ OCR engine abstraction layer. Two categories of engines:
 | `cloud_engine.py` | Google Vision TEXT_DETECTION fallback engine (WIF authentication) | Cloud OCR configuration, API quota |
 | `hybrid_engine.py` | Combines primary + fallback engine with confidence threshold switching | Engine selection logic, fallback behavior |
 | `engine_factory.py` | Factory function and engine registry for instantiation | Adding new engine types |
-| `gemini_engine.py` | Gemini 2.5 Flash integration for maintenance schedule extraction (Vertex AI SDK, 20MB PDF limit, structured JSON output) | Manual extraction debugging, Gemini configuration |
+| `gemini_engine.py` | Gemini 2.5 Flash integration for maintenance schedule extraction and VIN decoding (Vertex AI SDK, 20MB PDF limit, structured JSON output) | Manual extraction debugging, VIN decode, Gemini configuration |

 ## Engine Selection

@@ -30,4 +30,4 @@ create_engine(config)
 HybridEngine (tries primary, falls back if confidence < threshold)
 ```

-GeminiEngine is created independently by ManualExtractor, not through the engine factory.
+GeminiEngine is created independently by ManualExtractor and the VIN decode router, not through the engine factory.