Eric Gullickson 54cbd49171
feat: add VIN photo OCR pipeline (refs #67)
Implement VIN-specific OCR extraction with optimized preprocessing:

- Add POST /extract/vin endpoint for VIN extraction
- VIN preprocessor: CLAHE, deskew, denoise, adaptive threshold
- VIN validator: check digit validation, OCR error correction (I->1, O->0)
- VIN extractor: PSM modes 6/7/8, character whitelist, alternatives
- Response includes confidence, bounding box, and alternatives
- Unit tests for validator and preprocessor
- Integration tests for VIN extraction endpoint

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:31:36 -06:00
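
The validator itself is not shown on this page. As a rough illustration of the check digit validation and the I->1 / O->0 correction named in the commit message, here is a minimal sketch that assumes the standard North American scheme (position 9 holds a check digit computed as a weighted sum modulo 11). The names `correct_ocr_errors`, `compute_check_digit`, and `is_valid_vin` are hypothetical and are not taken from the commit.

```python
"""Hypothetical sketch of VIN check digit validation; not the commit's code."""

# Numeric values for each VIN character. The letters I, O and Q are never
# used in a VIN, so they are deliberately absent from this table.
_LETTER_VALUES = {
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
VALUES = {**{str(d): d for d in range(10)}, **_LETTER_VALUES}

# Positional weights; position 9 (index 8) is the check digit, weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)

# Assumed OCR confusions, as listed in the commit message (I->1, O->0).
OCR_FIXES = str.maketrans({"I": "1", "O": "0"})


def correct_ocr_errors(text: str) -> str:
    """Normalize case and replace letters that cannot occur in a VIN."""
    return text.strip().upper().translate(OCR_FIXES)


def compute_check_digit(vin: str) -> str:
    """Return the expected check character for a 17-character VIN."""
    total = sum(VALUES[char] * weight for char, weight in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(vin: str) -> bool:
    """True if the VIN has 17 legal characters and a matching check digit."""
    if len(vin) != 17 or any(char not in VALUES for char in vin):
        return False
    return compute_check_digit(vin) == vin[8]
```

Running `correct_ocr_errors` before `is_valid_vin` lets a misread such as a leading `I` be repaired first, and the check digit then confirms whether the repaired string is plausible.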

"""Base extractor class for domain-specific OCR extraction."""
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ExtractionResult:
    """Base result for extraction operations."""

    success: bool
    confidence: float
    raw_text: str
    processing_time_ms: int
    extracted_data: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


class BaseExtractor(ABC):
    """Abstract base class for domain-specific extractors."""

    @abstractmethod
    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult:
        """
        Extract domain-specific data from an image.

        Args:
            image_bytes: Raw image bytes
            content_type: MIME type of the image

        Returns:
            ExtractionResult with extracted data
        """
        pass

    @abstractmethod
    def validate(self, data: Any) -> bool:
        """
        Validate extracted data.

        Args:
            data: Extracted data to validate

        Returns:
            True if data is valid
        """
        pass
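
A minimal sketch of how a concrete extractor might subclass `BaseExtractor`. The `EchoExtractor` class is hypothetical and performs no real OCR: it decodes the input bytes as UTF-8 text so the control flow and the `ExtractionResult` fields can be followed. The base definitions are repeated in condensed form only so the block runs on its own.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ExtractionResult:
    success: bool
    confidence: float
    raw_text: str
    processing_time_ms: int
    extracted_data: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


class BaseExtractor(ABC):
    @abstractmethod
    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult:
        ...

    @abstractmethod
    def validate(self, data: Any) -> bool:
        ...


class EchoExtractor(BaseExtractor):
    """Hypothetical extractor: treats the bytes as UTF-8 text instead of running OCR."""

    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult:
        start = time.perf_counter()

        def elapsed_ms() -> int:
            return int((time.perf_counter() - start) * 1000)

        try:
            text = image_bytes.decode("utf-8").strip()
        except UnicodeDecodeError as exc:
            # Report failures through the result object rather than raising.
            return ExtractionResult(False, 0.0, "", elapsed_ms(), error=str(exc))

        valid = self.validate(text)
        return ExtractionResult(
            success=valid,
            confidence=1.0 if valid else 0.0,
            raw_text=text,
            processing_time_ms=elapsed_ms(),
            extracted_data={"text": text} if valid else {},
            error=None if valid else "validation failed",
        )

    def validate(self, data: Any) -> bool:
        # Accept any non-empty string; a real extractor would apply domain rules.
        return isinstance(data, str) and len(data) > 0
```

Because both methods are marked `@abstractmethod`, instantiating `BaseExtractor` directly raises `TypeError`; only subclasses that implement `extract` and `validate` can be created.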