chore: remove NHTSA code and update documentation (refs #227)

Delete vehicles/external/nhtsa/ directory (3 files), remove VPICVariable
and VPICResponse from platform models. Update all documentation to
reflect Gemini VIN decode via OCR service architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eric Gullickson
2026-02-18 21:51:38 -06:00
parent 5cbf9c764d
commit f590421058
16 changed files with 35 additions and 408 deletions

@@ -37,7 +37,7 @@ Backend proxy for the Python OCR microservice. Handles authentication, tier gati
| File | What | When to read | | File | What | When to read |
| ---- | ---- | ------------ | | ---- | ---- | ------------ |
| `ocr-client.ts` | HTTP client to mvp-ocr Python service (extract, extractVin, extractReceipt, submitJob, submitManualJob, getJobStatus, isHealthy) | OCR service communication, error handling | | `ocr-client.ts` | HTTP client to mvp-ocr Python service (extract, extractVin, extractReceipt, decodeVin, submitJob, submitManualJob, getJobStatus, isHealthy) | OCR service communication, error handling |
## tests/ ## tests/

@@ -117,7 +117,7 @@ platform/
When implemented, VIN decoding will use: When implemented, VIN decoding will use:
1. **Cache First**: Check Redis (7-day TTL for success, 1-hour for failures) 1. **Cache First**: Check Redis (7-day TTL for success, 1-hour for failures)
2. **PostgreSQL**: Database function for high-confidence decode 2. **PostgreSQL**: Database function for high-confidence decode
3. **vPIC Fallback**: NHTSA vPIC API with circuit breaker protection 3. **OCR Service Fallback**: Gemini VIN decode via OCR service
4. **Graceful Degradation**: Return meaningful errors when all sources fail 4. **Graceful Degradation**: Return meaningful errors when all sources fail
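The planned lookup order in the list above (cache, then PostgreSQL, then the OCR service, then a meaningful error) could be sketched roughly as follows. This is only an illustration under stated assumptions: `DecodeSource`, `decodeWithFallback`, and `VinDecodeUnavailableError` are hypothetical names that do not appear in the repository, and the real tiers would wrap Redis, a database function, and the mvp-ocr client.

```typescript
// Hypothetical shapes for illustration; not taken from the codebase.
type DecodeResult = { make: string; model: string; year: number };

type DecodeSource = {
  name: string;
  // Resolves to null when this tier has no answer for the VIN.
  lookup: (vin: string) => Promise<DecodeResult | null>;
};

class VinDecodeUnavailableError extends Error {
  readonly attempted: string[];

  constructor(vin: string, attempted: string[]) {
    super(`Unable to decode VIN ${vin}; tried: ${attempted.join(', ')}`);
    this.name = 'VinDecodeUnavailableError';
    this.attempted = attempted;
  }
}

// Try each tier in order and return the first non-null result.
async function decodeWithFallback(
  vin: string,
  sources: DecodeSource[],
): Promise<{ source: string; result: DecodeResult }> {
  const attempted: string[] = [];
  for (const source of sources) {
    attempted.push(source.name);
    try {
      const result = await source.lookup(vin);
      if (result !== null) {
        return { source: source.name, result };
      }
    } catch {
      // A failing tier does not abort the chain; fall through to the next one.
    }
  }
  // Graceful degradation: report which tiers were tried instead of a bare failure.
  throw new VinDecodeUnavailableError(vin, attempted);
}
```

Passing the tiers in as an ordered array keeps the order a configuration detail, so swapping the last tier (as this commit does in the docs) would not touch the loop.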
### Database Schema ### Database Schema
@@ -164,7 +164,7 @@ When VIN decoding is implemented:
### External APIs (Planned/Future) ### External APIs (Planned/Future)
When VIN decoding is implemented: When VIN decoding is implemented:
- **NHTSA vPIC**: https://vpic.nhtsa.dot.gov/api (VIN decoding fallback) - **OCR Service**: Gemini VIN decode via mvp-ocr (VIN decoding fallback)
### Database Tables ### Database Tables
- **vehicle_options** - Hierarchical vehicle data (years, makes, models, trims, engines, transmissions) - **vehicle_options** - Hierarchical vehicle data (years, makes, models, trims, engines, transmissions)
@@ -269,7 +269,7 @@ npm run lint
## Future Considerations ## Future Considerations
### Planned Features ### Planned Features
- VIN decoding endpoint with PostgreSQL + vPIC fallback - VIN decoding endpoint with PostgreSQL + Gemini/OCR service fallback
- Circuit breaker pattern for external API resilience - Circuit breaker pattern for external API resilience
### Potential Enhancements ### Potential Enhancements

@@ -61,19 +61,3 @@ export interface VINDecodeResponse {
error?: string; error?: string;
} }
/**
* vPIC API response structure (NHTSA)
*/
export interface VPICVariable {
Variable: string;
Value: string | null;
ValueId: string | null;
VariableId: number;
}
export interface VPICResponse {
Count: number;
Message: string;
SearchCriteria: string;
Results: VPICVariable[];
}

@@ -16,6 +16,6 @@
| `data/` | Repository, database queries | Database operations | | `data/` | Repository, database queries | Database operations |
| `docs/` | Feature-specific documentation | Vehicle design details | | `docs/` | Feature-specific documentation | Vehicle design details |
| `events/` | Event handlers and emitters | Cross-feature event integration | | `events/` | Event handlers and emitters | Cross-feature event integration |
| `external/` | External service integrations (NHTSA) | VIN decoding, third-party APIs | | `external/` | External service integrations | VIN decoding, third-party APIs |
| `migrations/` | Database schema | Schema changes | | `migrations/` | Database schema | Schema changes |
| `tests/` | Unit and integration tests | Adding or modifying tests | | `tests/` | Unit and integration tests | Adding or modifying tests |

@@ -13,7 +13,7 @@ Primary entity for vehicle management consuming MVP Platform Vehicles Service. H
- `DELETE /api/vehicles/:id` - Soft delete vehicle - `DELETE /api/vehicles/:id` - Soft delete vehicle
### VIN Decoding (Pro/Enterprise Only) ### VIN Decoding (Pro/Enterprise Only)
- `POST /api/vehicles/decode-vin` - Decode VIN using NHTSA vPIC API - `POST /api/vehicles/decode-vin` - Decode VIN using Gemini via OCR service
### Hierarchical Vehicle Dropdowns ### Hierarchical Vehicle Dropdowns
**Status**: Vehicles service now proxies the platform vehicle catalog to provide fully dynamic dropdowns. Each selection step filters the next list, ensuring only valid combinations are shown. **Status**: Vehicles service now proxies the platform vehicle catalog to provide fully dynamic dropdowns. Each selection step filters the next list, ensuring only valid combinations are shown.
@@ -104,11 +104,7 @@ vehicles/
├── data/ # Database layer ├── data/ # Database layer
│ └── vehicles.repository.ts │ └── vehicles.repository.ts
├── external/ # External service integrations ├── external/ # External service integrations
│ ├── CLAUDE.md # Integration pattern docs │ └── CLAUDE.md # Integration pattern docs
│ └── nhtsa/ # NHTSA vPIC API client
│ ├── nhtsa.client.ts
│ ├── nhtsa.types.ts
│ └── index.ts
├── migrations/ # Feature schema ├── migrations/ # Feature schema
│ └── 001_create_vehicles_tables.sql │ └── 001_create_vehicles_tables.sql
├── tests/ # All tests ├── tests/ # All tests
@@ -121,14 +117,14 @@ vehicles/
## Key Features ## Key Features
### 🔍 VIN Decoding (NHTSA vPIC API) ### VIN Decoding (Gemini via OCR Service)
- **Tier Gating**: Pro and Enterprise users only (`vehicle.vinDecode` feature key) - **Tier Gating**: Pro and Enterprise users only (`vehicle.vinDecode` feature key)
- **NHTSA API**: Calls official NHTSA vPIC API for authoritative vehicle data - **Gemini**: Calls OCR service Gemini VIN decode for authoritative vehicle data
- **Caching**: Results cached in `vin_cache` table (1-year TTL, VIN data is static) - **Caching**: Results cached in `vin_cache` table (1-year TTL, VIN data is static)
- **Validation**: 17-character VIN format, excludes I/O/Q characters - **Validation**: 17-character VIN format, excludes I/O/Q characters
- **Matching**: Case-insensitive exact match against dropdown options - **Matching**: Case-insensitive exact match against dropdown options
- **Confidence Levels**: High (exact match), Medium (normalized match), None (hint only) - **Confidence Levels**: High (exact match), Medium (normalized match), None (hint only)
- **Timeout**: 5-second timeout for NHTSA API calls - **Timeout**: 5-second timeout for OCR service calls
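The matching and confidence rules listed above could be sketched as follows. The `MatchedField` shape mirrors the response fields shown in this README (`value`, `decodedValue`, `confidence`); the `matchField` helper and the exact meaning of "normalized" (here: ignoring case and all non-alphanumeric characters) are assumptions made for illustration, not the service's confirmed behavior.

```typescript
type MatchConfidence = 'high' | 'medium' | 'none';

interface MatchedField<T> {
  value: T | null;             // dropdown option that was matched, if any
  decodedValue: string | null; // raw value returned by the decoder
  confidence: MatchConfidence;
}

// Assumption: normalization drops case and every non-alphanumeric character.
const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, '');

// Hypothetical helper: match one decoded value against the dropdown options.
function matchField(decodedValue: string | null, options: string[]): MatchedField<string> {
  if (!decodedValue) {
    return { value: null, decodedValue: null, confidence: 'none' };
  }

  // High: case-insensitive exact match.
  const wanted = decodedValue.trim().toLowerCase();
  const exact = options.find((o) => o.toLowerCase() === wanted);
  if (exact !== undefined) {
    return { value: exact, decodedValue, confidence: 'high' };
  }

  // Medium: match after normalization (e.g. "F150" vs "F-150").
  const loose = options.find((o) => normalize(o) === normalize(decodedValue));
  if (loose !== undefined) {
    return { value: loose, decodedValue, confidence: 'medium' };
  }

  // None: keep the decoded value as a hint only.
  return { value: null, decodedValue, confidence: 'none' };
}
```

With these assumptions, `matchField('HONDA', ['Honda', 'Toyota'])` selects the `Honda` option with `high` confidence, while a value with no matching option keeps `value: null` and carries the decoded text as a hint.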
#### Decode VIN Request #### Decode VIN Request
```json ```json
@@ -140,15 +136,15 @@ Authorization: Bearer <jwt>
Response (200): Response (200):
{ {
"year": { "value": 2021, "nhtsaValue": "2021", "confidence": "high" }, "year": { "value": 2021, "decodedValue": "2021", "confidence": "high" },
"make": { "value": "Honda", "nhtsaValue": "HONDA", "confidence": "high" }, "make": { "value": "Honda", "decodedValue": "HONDA", "confidence": "high" },
"model": { "value": "Civic", "nhtsaValue": "Civic", "confidence": "high" }, "model": { "value": "Civic", "decodedValue": "Civic", "confidence": "high" },
"trimLevel": { "value": "EX", "nhtsaValue": "EX", "confidence": "high" }, "trimLevel": { "value": "EX", "decodedValue": "EX", "confidence": "high" },
"engine": { "value": null, "nhtsaValue": "2.0L L4 DOHC 16V", "confidence": "none" }, "engine": { "value": null, "decodedValue": "2.0L L4 DOHC 16V", "confidence": "none" },
"transmission": { "value": null, "nhtsaValue": "CVT", "confidence": "none" }, "transmission": { "value": null, "decodedValue": "CVT", "confidence": "none" },
"bodyType": { "value": null, "nhtsaValue": "Sedan", "confidence": "none" }, "bodyType": { "value": null, "decodedValue": "Sedan", "confidence": "none" },
"driveType": { "value": null, "nhtsaValue": "FWD", "confidence": "none" }, "driveType": { "value": null, "decodedValue": "FWD", "confidence": "none" },
"fuelType": { "value": null, "nhtsaValue": "Gasoline", "confidence": "none" } "fuelType": { "value": null, "decodedValue": "Gasoline", "confidence": "none" }
} }
Error (400 - Invalid VIN): Error (400 - Invalid VIN):
@@ -157,7 +153,7 @@ Error (400 - Invalid VIN):
Error (403 - Tier Required): Error (403 - Tier Required):
{ "error": "TIER_REQUIRED", "requiredTier": "pro", "currentTier": "free", ... } { "error": "TIER_REQUIRED", "requiredTier": "pro", "currentTier": "free", ... }
Error (502 - NHTSA Failure): Error (502 - OCR Service Failure):
{ "error": "VIN_DECODE_FAILED", "message": "Unable to decode VIN from external service" } { "error": "VIN_DECODE_FAILED", "message": "Unable to decode VIN from external service" }
``` ```
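A caller could branch on the error codes shown in the responses above along these lines. The code-to-status table follows the 400/403/502 examples; pairing `INVALID_VIN` with the 400 case is inferred from the `VinDecodeError` type rather than shown in the example body, and the `retryable` flag and hint strings are assumptions added purely for illustration.

```typescript
// Error codes as they appear in the VinDecodeError type.
type VinDecodeErrorCode = 'INVALID_VIN' | 'VIN_DECODE_FAILED' | 'TIER_REQUIRED';

interface VinDecodeErrorBody {
  error: VinDecodeErrorCode;
  message?: string;
}

// Assumption: one HTTP status per code, following the examples above.
const STATUS_BY_ERROR: Record<VinDecodeErrorCode, number> = {
  INVALID_VIN: 400,
  TIER_REQUIRED: 403,
  VIN_DECODE_FAILED: 502,
};

// Hypothetical helper: summarize what a client might do with a failed decode.
function describeDecodeFailure(body: VinDecodeErrorBody): {
  status: number;
  retryable: boolean;
  hint: string;
} {
  const status = STATUS_BY_ERROR[body.error];
  switch (body.error) {
    case 'INVALID_VIN':
      return { status, retryable: false, hint: 'Ask the user to correct the VIN.' };
    case 'TIER_REQUIRED':
      return { status, retryable: false, hint: 'Show the upgrade prompt.' };
    case 'VIN_DECODE_FAILED':
      // Upstream failure: retrying later may succeed (an assumption, not documented).
      return { status, retryable: true, hint: 'Offer manual entry or retry later.' };
  }
}
```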
@@ -230,7 +226,7 @@ Error (502 - NHTSA Failure):
## Testing ## Testing
### Unit Tests ### Unit Tests
- `vehicles.service.test.ts` - Business logic with mocked dependencies (VIN decode, caching, CRUD operations) - `vehicles.service.test.ts` - Business logic with mocked dependencies (VIN decode via OCR service mock, caching, CRUD operations)
### Integration Tests ### Integration Tests
- `vehicles.integration.test.ts` - Complete API workflow with test database (create, read, update, delete vehicles) - `vehicles.integration.test.ts` - Complete API workflow with test database (create, read, update, delete vehicles)

@@ -5,9 +5,3 @@
| File | What | When to read | | File | What | When to read |
| ---- | ---- | ------------ | | ---- | ---- | ------------ |
| `README.md` | Integration patterns, adding new services | Understanding external service conventions | | `README.md` | Integration patterns, adding new services | Understanding external service conventions |
## Subdirectories
| Directory | What | When to read |
| --------- | ---- | ------------ |
| `nhtsa/` | NHTSA vPIC API client for VIN decoding | VIN decode feature work |

@@ -15,7 +15,7 @@ Each integration follows this structure:
## Adding New Integrations ## Adding New Integrations
1. Create subdirectory: `external/{service}/` 1. Create subdirectory: `external/{service}/`
2. Add client: `{service}.client.ts` following NHTSAClient pattern 2. Add client: `{service}.client.ts` following the axios-based client pattern
3. Add types: `{service}.types.ts` 3. Add types: `{service}.types.ts`
4. Update `CLAUDE.md` with new directory 4. Update `CLAUDE.md` with new directory
5. Add tests in `tests/unit/{service}.client.test.ts` 5. Add tests in `tests/unit/{service}.client.test.ts`

@@ -1,16 +0,0 @@
/**
* @ai-summary NHTSA vPIC integration exports
* @ai-context Public API for VIN decoding functionality
*/
export { NHTSAClient } from './nhtsa.client';
export type {
NHTSADecodeResponse,
NHTSAResult,
DecodedVehicleData,
MatchedField,
MatchConfidence,
VinCacheEntry,
DecodeVinRequest,
VinDecodeError,
} from './nhtsa.types';

@@ -1,235 +0,0 @@
/**
* @ai-summary NHTSA vPIC API client for VIN decoding
* @ai-context Fetches vehicle data from NHTSA and caches results
*/
import axios, { AxiosError } from 'axios';
import { logger } from '../../../../core/logging/logger';
import { NHTSADecodeResponse, VinCacheEntry } from './nhtsa.types';
import { Pool } from 'pg';
/**
* VIN validation regex
* - 17 characters
* - Excludes I, O, Q (not used in VINs)
* - Alphanumeric only
*/
const VIN_REGEX = /^[A-HJ-NPR-Z0-9]{17}$/;
/**
* Cache TTL: 1 year (VIN data is static - vehicle specs don't change)
*/
const CACHE_TTL_SECONDS = 365 * 24 * 60 * 60;
export class NHTSAClient {
private readonly baseURL = 'https://vpic.nhtsa.dot.gov/api';
private readonly timeout = 5000; // 5 seconds
constructor(private readonly pool: Pool) {}
/**
* Validate VIN format
* @throws Error if VIN format is invalid
*/
validateVin(vin: string): string {
const sanitized = vin.trim().toUpperCase();
if (!sanitized) {
throw new Error('VIN is required');
}
if (!VIN_REGEX.test(sanitized)) {
throw new Error('Invalid VIN format. VIN must be exactly 17 characters and contain only letters (except I, O, Q) and numbers.');
}
return sanitized;
}
/**
* Check cache for existing VIN data
*/
async getCached(vin: string): Promise<VinCacheEntry | null> {
try {
const result = await this.pool.query<{
vin: string;
make: string | null;
model: string | null;
year: number | null;
engine_type: string | null;
body_type: string | null;
raw_data: NHTSADecodeResponse;
cached_at: Date;
}>(
`SELECT vin, make, model, year, engine_type, body_type, raw_data, cached_at
FROM vin_cache
WHERE vin = $1
AND cached_at > NOW() - INTERVAL '${CACHE_TTL_SECONDS} seconds'`,
[vin]
);
if (result.rows.length === 0) {
return null;
}
const row = result.rows[0];
return {
vin: row.vin,
make: row.make,
model: row.model,
year: row.year,
engineType: row.engine_type,
bodyType: row.body_type,
rawData: row.raw_data,
cachedAt: row.cached_at,
};
} catch (error) {
logger.error('Failed to check VIN cache', { vin, error });
return null;
}
}
/**
* Save VIN data to cache
*/
async saveToCache(vin: string, response: NHTSADecodeResponse): Promise<void> {
try {
const findValue = (variable: string): string | null => {
const result = response.Results.find(r => r.Variable === variable);
return result?.Value || null;
};
const year = findValue('Model Year');
const make = findValue('Make');
const model = findValue('Model');
const engineType = findValue('Engine Model');
const bodyType = findValue('Body Class');
await this.pool.query(
`INSERT INTO vin_cache (vin, make, model, year, engine_type, body_type, raw_data, cached_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, NOW())
ON CONFLICT (vin) DO UPDATE SET
make = EXCLUDED.make,
model = EXCLUDED.model,
year = EXCLUDED.year,
engine_type = EXCLUDED.engine_type,
body_type = EXCLUDED.body_type,
raw_data = EXCLUDED.raw_data,
cached_at = NOW()`,
[vin, make, model, year ? parseInt(year) : null, engineType, bodyType, JSON.stringify(response)]
);
logger.debug('VIN cached', { vin });
} catch (error) {
logger.error('Failed to cache VIN data', { vin, error });
// Don't throw - caching failure shouldn't break the decode flow
}
}
/**
* Decode VIN using NHTSA vPIC API
* @param vin - 17-character VIN
* @returns Raw NHTSA decode response
* @throws Error if VIN is invalid or API call fails
*/
async decodeVin(vin: string): Promise<NHTSADecodeResponse> {
// Validate and sanitize VIN
const sanitizedVin = this.validateVin(vin);
// Check cache first
const cached = await this.getCached(sanitizedVin);
if (cached) {
logger.debug('VIN cache hit', { vin: sanitizedVin });
return cached.rawData;
}
// Call NHTSA API
logger.info('Calling NHTSA vPIC API', { vin: sanitizedVin });
try {
const response = await axios.get<NHTSADecodeResponse>(
`${this.baseURL}/vehicles/decodevin/${sanitizedVin}`,
{
params: { format: 'json' },
timeout: this.timeout,
}
);
// Check for NHTSA-level errors
if (response.data.Count === 0) {
throw new Error('NHTSA returned no results for this VIN');
}
// Check for error messages in results
const errorResult = response.data.Results.find(
r => r.Variable === 'Error Code' && r.Value && r.Value !== '0'
);
if (errorResult) {
const errorText = response.data.Results.find(r => r.Variable === 'Error Text');
throw new Error(`NHTSA error: ${errorText?.Value || 'Unknown error'}`);
}
// Cache the successful response
await this.saveToCache(sanitizedVin, response.data);
return response.data;
} catch (error) {
if (axios.isAxiosError(error)) {
const axiosError = error as AxiosError;
if (axiosError.code === 'ECONNABORTED') {
logger.error('NHTSA API timeout', { vin: sanitizedVin });
throw new Error('NHTSA API request timed out. Please try again.');
}
if (axiosError.response) {
logger.error('NHTSA API error response', {
vin: sanitizedVin,
status: axiosError.response.status,
data: axiosError.response.data,
});
throw new Error(`NHTSA API error: ${axiosError.response.status}`);
}
logger.error('NHTSA API network error', { vin: sanitizedVin, message: axiosError.message });
throw new Error('Unable to connect to NHTSA API. Please try again later.');
}
throw error;
}
}
/**
* Extract a specific value from NHTSA response
*/
static extractValue(response: NHTSADecodeResponse, variable: string): string | null {
const result = response.Results.find(r => r.Variable === variable);
return result?.Value?.trim() || null;
}
/**
* Extract year from NHTSA response
*/
static extractYear(response: NHTSADecodeResponse): number | null {
const value = NHTSAClient.extractValue(response, 'Model Year');
if (!value) return null;
const parsed = parseInt(value, 10);
return isNaN(parsed) ? null : parsed;
}
/**
* Extract engine description from NHTSA response
* Combines multiple engine-related fields
*/
static extractEngine(response: NHTSADecodeResponse): string | null {
const engineModel = NHTSAClient.extractValue(response, 'Engine Model');
if (engineModel) return engineModel;
// Build engine description from components
const cylinders = NHTSAClient.extractValue(response, 'Engine Number of Cylinders');
const displacement = NHTSAClient.extractValue(response, 'Displacement (L)');
const fuelType = NHTSAClient.extractValue(response, 'Fuel Type - Primary');
const parts: string[] = [];
if (cylinders) parts.push(`${cylinders}-Cylinder`);
if (displacement) parts.push(`${displacement}L`);
if (fuelType && fuelType !== 'Gasoline') parts.push(fuelType);
return parts.length > 0 ? parts.join(' ') : null;
}
}

@@ -1,96 +0,0 @@
/**
* @ai-summary Type definitions for NHTSA vPIC API
* @ai-context Defines request/response types for VIN decoding
*/
/**
* Individual result from NHTSA DecodeVin API
*/
export interface NHTSAResult {
Value: string | null;
ValueId: string | null;
Variable: string;
VariableId: number;
}
/**
* Raw response from NHTSA DecodeVin API
* GET https://vpic.nhtsa.dot.gov/api/vehicles/decodevin/{VIN}?format=json
*/
export interface NHTSADecodeResponse {
Count: number;
Message: string;
SearchCriteria: string;
Results: NHTSAResult[];
}
/**
* Confidence level for matched dropdown values
*/
export type MatchConfidence = 'high' | 'medium' | 'none';
/**
* Matched field with confidence indicator
*/
export interface MatchedField<T> {
value: T | null;
nhtsaValue: string | null;
confidence: MatchConfidence;
}
/**
* Decoded vehicle data with match confidence per field
* Maps NHTSA response fields to internal field names (camelCase)
*
* NHTSA Field Mappings:
* - ModelYear -> year
* - Make -> make
* - Model -> model
* - Trim -> trimLevel
* - BodyClass -> bodyType
* - DriveType -> driveType
* - FuelTypePrimary -> fuelType
* - EngineModel / EngineCylinders + EngineDisplacementL -> engine
* - TransmissionStyle -> transmission
*/
export interface DecodedVehicleData {
year: MatchedField<number>;
make: MatchedField<string>;
model: MatchedField<string>;
trimLevel: MatchedField<string>;
bodyType: MatchedField<string>;
driveType: MatchedField<string>;
fuelType: MatchedField<string>;
engine: MatchedField<string>;
transmission: MatchedField<string>;
}
/**
* Cached VIN data from vin_cache table
*/
export interface VinCacheEntry {
vin: string;
make: string | null;
model: string | null;
year: number | null;
engineType: string | null;
bodyType: string | null;
rawData: NHTSADecodeResponse;
cachedAt: Date;
}
/**
* VIN decode request body
*/
export interface DecodeVinRequest {
vin: string;
}
/**
* VIN decode error response
*/
export interface VinDecodeError {
error: 'INVALID_VIN' | 'VIN_DECODE_FAILED' | 'TIER_REQUIRED';
message: string;
details?: string;
}

@@ -35,7 +35,7 @@ The platform provides vehicle hierarchical data lookups:
VIN decoding is planned but not yet implemented. Future capabilities will include: VIN decoding is planned but not yet implemented. Future capabilities will include:
- `GET /api/platform/vehicle?vin={vin}` - Decode VIN to vehicle details - `GET /api/platform/vehicle?vin={vin}` - Decode VIN to vehicle details
- PostgreSQL-based VIN decode function - PostgreSQL-based VIN decode function
- NHTSA vPIC API fallback with circuit breaker - Gemini VIN decode via OCR service
- Redis caching (7-day TTL for successful decodes) - Redis caching (7-day TTL for successful decodes)
**Data Source**: Vehicle data from standardized sources **Data Source**: Vehicle data from standardized sources

@@ -74,7 +74,7 @@ docker compose exec mvp-frontend npm test -- --coverage
Example: `vehicles.service.test.ts` Example: `vehicles.service.test.ts`
- Tests VIN validation logic - Tests VIN validation logic
- Tests vehicle creation with mocked vPIC responses - Tests vehicle creation with mocked OCR service responses
- Tests caching behavior with mocked Redis - Tests caching behavior with mocked Redis
- Tests error handling paths - Tests error handling paths
@@ -194,7 +194,7 @@ All 15 features have test suites with unit and/or integration tests:
- `vehicles` - Unit + integration tests - `vehicles` - Unit + integration tests
### Mock Strategy ### Mock Strategy
- **External APIs**: Completely mocked (vPIC, Google Maps) - **External APIs**: Completely mocked (OCR service, Google Maps)
- **Database**: Real database with transactions - **Database**: Real database with transactions
- **Redis**: Mocked for unit tests, real for integration - **Redis**: Mocked for unit tests, real for integration
- **Auth**: Mocked JWT tokens for protected endpoints - **Auth**: Mocked JWT tokens for protected endpoints
@@ -319,8 +319,8 @@ describe('Error Handling', () => {
).rejects.toThrow('Invalid VIN format'); ).rejects.toThrow('Invalid VIN format');
}); });
it('should handle vPIC API failure', async () => { it('should handle OCR service failure', async () => {
mockVpicClient.decode.mockRejectedValue(new Error('API down')); mockOcrClient.decodeVin.mockRejectedValue(new Error('API down'));
const result = await vehicleService.create(validVehicle, 'user123'); const result = await vehicleService.create(validVehicle, 'user123');
expect(result.make).toBeNull(); // Graceful degradation expect(result.make).toBeNull(); // Graceful degradation

@@ -644,7 +644,7 @@ When you attempt to use a Pro feature on the Free tier, an **Upgrade Required**
### VIN Camera Scanning and Decode (Pro) ### VIN Camera Scanning and Decode (Pro)
**What it does:** Use your device camera to photograph your vehicle's VIN plate, and the system automatically reads the VIN using OCR (Optical Character Recognition) and decodes it from the NHTSA database. **What it does:** Use your device camera to photograph your vehicle's VIN plate, and the system automatically reads the VIN using OCR (Optical Character Recognition) and decodes it from the vehicle database.
**How to use it:** **How to use it:**
@@ -655,7 +655,7 @@ When you attempt to use a Pro feature on the Free tier, an **Upgrade Required**
5. A **VIN OCR Review modal** appears showing the detected VIN with confidence indicators 5. A **VIN OCR Review modal** appears showing the detected VIN with confidence indicators
6. Confirm or correct the VIN, then click **Accept** 6. Confirm or correct the VIN, then click **Accept**
7. Click the **Decode VIN** button 7. Click the **Decode VIN** button
8. The system queries the NHTSA database and auto-populates: Year, Make, Model, Engine, Transmission, and Trim 8. The system queries the vehicle database and auto-populates: Year, Make, Model, Engine, Transmission, and Trim
9. Review the pre-filled fields and complete the remaining details 9. Review the pre-filled fields and complete the remaining details
This eliminates manual data entry errors and ensures accurate vehicle specifications. This eliminates manual data entry errors and ensures accurate vehicle specifications.

@@ -1,6 +1,6 @@
# ocr/ # ocr/
Python OCR microservice. Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction. Pluggable engine abstraction in `app/engines/`. Python OCR microservice. Primary engine: PaddleOCR PP-OCRv4 with optional Google Vision cloud fallback. Gemini 2.5 Flash for maintenance manual PDF extraction and VIN decode. Pluggable engine abstraction in `app/engines/`.
## Files ## Files

@@ -19,7 +19,7 @@ Python OCR microservice (FastAPI). Primary engine: PaddleOCR PP-OCRv4 with optio
| `models/` | Data models and schemas | Request/response types | | `models/` | Data models and schemas | Request/response types |
| `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization | | `patterns/` | Regex patterns and service name mapping (27 maintenance subtypes) | Pattern matching rules, service categorization |
| `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR | | `preprocessors/` | Image preprocessing pipeline | Image preparation before OCR |
| `routers/` | FastAPI route handlers (/extract, /extract/receipt, /extract/manual, /jobs) | API endpoint changes | | `routers/` | FastAPI route handlers (/extract, /extract/receipt, /extract/manual, /decode, /jobs) | API endpoint changes |
| `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management | | `services/` | Business logic services (job queue with Redis) | Core OCR processing, async job management |
| `table_extraction/` | Table detection and parsing | Structured data extraction from images | | `table_extraction/` | Table detection and parsing | Structured data extraction from images |
| `validators/` | Input validation | Validation rules | | `validators/` | Input validation | Validation rules |

@@ -3,7 +3,7 @@
OCR engine abstraction layer. Two categories of engines: OCR engine abstraction layer. Two categories of engines:
1. **OcrEngine subclasses** (image-to-text): PaddleOCR, Google Vision, Hybrid. Accept image bytes, return text + confidence + word boxes. 1. **OcrEngine subclasses** (image-to-text): PaddleOCR, Google Vision, Hybrid. Accept image bytes, return text + confidence + word boxes.
2. **GeminiEngine** (PDF-to-structured-data): Standalone module for maintenance schedule extraction via Vertex AI. Accepts PDF bytes, returns structured JSON. Not an OcrEngine subclass because the interface signatures differ. 2. **GeminiEngine** (PDF-to-structured-data and VIN decode): Standalone module for maintenance schedule extraction and VIN decoding via Vertex AI. Accepts PDF bytes or VIN strings, returns structured JSON. Not an OcrEngine subclass because the interface signatures differ.
## Files ## Files
@@ -15,7 +15,7 @@ OCR engine abstraction layer. Two categories of engines:
| `cloud_engine.py` | Google Vision TEXT_DETECTION fallback engine (WIF authentication) | Cloud OCR configuration, API quota | | `cloud_engine.py` | Google Vision TEXT_DETECTION fallback engine (WIF authentication) | Cloud OCR configuration, API quota |
| `hybrid_engine.py` | Combines primary + fallback engine with confidence threshold switching | Engine selection logic, fallback behavior | | `hybrid_engine.py` | Combines primary + fallback engine with confidence threshold switching | Engine selection logic, fallback behavior |
| `engine_factory.py` | Factory function and engine registry for instantiation | Adding new engine types | | `engine_factory.py` | Factory function and engine registry for instantiation | Adding new engine types |
| `gemini_engine.py` | Gemini 2.5 Flash integration for maintenance schedule extraction (Vertex AI SDK, 20MB PDF limit, structured JSON output) | Manual extraction debugging, Gemini configuration | | `gemini_engine.py` | Gemini 2.5 Flash integration for maintenance schedule extraction and VIN decoding (Vertex AI SDK, 20MB PDF limit, structured JSON output) | Manual extraction debugging, VIN decode, Gemini configuration |
## Engine Selection ## Engine Selection
@@ -30,4 +30,4 @@ create_engine(config)
HybridEngine (tries primary, falls back if confidence < threshold) HybridEngine (tries primary, falls back if confidence < threshold)
``` ```
GeminiEngine is created independently by ManualExtractor, not through the engine factory. GeminiEngine is created independently by ManualExtractor and the VIN decode router, not through the engine factory.