Implement VIN-specific OCR extraction with optimized preprocessing:

- Add POST /extract/vin endpoint for VIN extraction
- VIN preprocessor: CLAHE, deskew, denoise, adaptive threshold
- VIN validator: check digit validation, OCR error correction (I->1, O->0)
- VIN extractor: PSM modes 6/7/8, character whitelist, alternatives
- Response includes confidence, bounding box, and alternatives
- Unit tests for validator and preprocessor
- Integration tests for VIN extraction endpoint

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
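The VIN validator named in the commit message is not part of the file shown here, so the following is only a minimal sketch of what check-digit validation with the stated OCR corrections could look like. It assumes the North American check-digit scheme (49 CFR Part 565): letters are transliterated to numbers, each position is weighted, the sum is taken modulo 11, and a remainder of 10 is written as `X` in position 9. VINs from other regions may not carry a check digit there. The names `correct_ocr_errors`, `compute_check_digit`, and `is_valid_vin` are hypothetical, not the repository's API.

```python
"""Sketch: VIN check-digit validation plus the I->1 / O->0 OCR correction."""

from typing import Optional

# Letter-to-number transliteration for the North American VIN check digit.
# I, O and Q are never allowed in a VIN, so they have no entry.
_TRANSLITERATION = {
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
_TRANSLITERATION.update({str(d): d for d in range(10)})

# Positional weights for characters 1..17; position 9 (the check digit itself) has weight 0.
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)

# The two OCR confusions listed in the commit message.
_OCR_FIXES = str.maketrans({"I": "1", "O": "0"})


def correct_ocr_errors(text: str) -> str:
    """Drop whitespace, uppercase, and map I/O to 1/0 (hypothetical helper)."""
    return "".join(text.split()).upper().translate(_OCR_FIXES)


def compute_check_digit(vin: str) -> Optional[str]:
    """Return the expected check character, or None if vin is not 17 legal characters."""
    if len(vin) != 17 or any(ch not in _TRANSLITERATION for ch in vin):
        return None
    total = sum(_TRANSLITERATION[ch] * w for ch, w in zip(vin, _WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(vin: str) -> bool:
    """True if vin is well-formed and its 9th character matches the check digit."""
    expected = compute_check_digit(vin)
    return expected is not None and vin[8] == expected


# Usage: OCR output that misread "1" as "I" and "0" as "O".
vin = correct_ocr_errors("IM8GDM9AXKPO42788")
print(vin, is_valid_vin(vin))  # 1M8GDM9AXKP042788 True
```

Mapping every `I` and `O` unconditionally is safe because neither letter may appear in a valid VIN; other misreads (for example `Q`, which is also excluded from VINs) are left for the length, alphabet, and check-digit tests to reject.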
48 lines
1.1 KiB
Python
"""Base extractor class for domain-specific OCR extraction."""

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ExtractionResult:
    """Base result for extraction operations."""

    success: bool
    confidence: float
    raw_text: str
    processing_time_ms: int
    extracted_data: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


class BaseExtractor(ABC):
    """Abstract base class for domain-specific extractors."""

    @abstractmethod
    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult:
        """
        Extract domain-specific data from an image.

        Args:
            image_bytes: Raw image bytes
            content_type: MIME type of the image

        Returns:
            ExtractionResult with extracted data
        """
        pass

    @abstractmethod
    def validate(self, data: Any) -> bool:
        """
        Validate extracted data.

        Args:
            data: Extracted data to validate

        Returns:
            True if data is valid
        """
        pass
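The base class only fixes a contract: `extract` hands back an `ExtractionResult` (with `success` and `error` available for reporting failures), and `validate` judges an extracted value. As a hedged illustration of how a subclass might honor that contract, here is a hypothetical `PatternExtractor` that runs an injected OCR callable and pulls the first regex match out of the text. The `ocr` callable stands in for a real OCR engine (the PR's actual VIN extractor is not shown in this file), and the class name, the `OcrFn` alias, the regex, and the error strings are all assumptions. Both base classes are repeated in condensed form so the snippet runs on its own.

```python
import re
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


# Condensed copies of the two classes above, so this sketch is self-contained.
@dataclass
class ExtractionResult:
    success: bool
    confidence: float
    raw_text: str
    processing_time_ms: int
    extracted_data: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


class BaseExtractor(ABC):
    @abstractmethod
    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult: ...

    @abstractmethod
    def validate(self, data: Any) -> bool: ...


# Hypothetical OCR hook: image bytes in, (recognized text, confidence in [0, 1]) out.
OcrFn = Callable[[bytes], tuple[str, float]]


class PatternExtractor(BaseExtractor):
    """Hypothetical extractor: OCR the image, then take the first regex match."""

    def __init__(self, ocr: OcrFn, pattern: str, field_name: str) -> None:
        self._ocr = ocr
        self._pattern = re.compile(pattern)
        self._field = field_name

    def validate(self, data: Any) -> bool:
        # Valid only if the value is a string that the whole pattern matches.
        return isinstance(data, str) and self._pattern.fullmatch(data) is not None

    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult:
        # content_type is accepted for interface compatibility but unused in this sketch.
        start = time.perf_counter()

        def elapsed_ms() -> int:
            return int((time.perf_counter() - start) * 1000)

        try:
            text, confidence = self._ocr(image_bytes)
        except Exception as exc:  # report OCR failures in the result instead of raising
            return ExtractionResult(False, 0.0, "", elapsed_ms(), error=str(exc))

        match = self._pattern.search(text)
        value = match.group(0) if match else None
        if value is None or not self.validate(value):
            return ExtractionResult(False, confidence, text, elapsed_ms(), error="no valid match")
        return ExtractionResult(
            True, confidence, text, elapsed_ms(), extracted_data={self._field: value}
        )


def fake_ocr(image_bytes: bytes) -> tuple[str, float]:
    # Stand-in for a real OCR engine; always "reads" the same text.
    return "VIN: 1M8GDM9AXKP042788", 0.91


# The character class excludes I, O and Q, which never appear in a VIN.
extractor = PatternExtractor(fake_ocr, r"[A-HJ-NPR-Z0-9]{17}", "vin")
result = extractor.extract(b"<image bytes>")
print(result.success, result.extracted_data)  # True {'vin': '1M8GDM9AXKP042788'}
```

Injecting the OCR step keeps the sketch testable without an OCR engine installed; whether the real extractors are structured this way cannot be told from this file.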