feat: add VIN photo OCR pipeline (refs #67)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 31s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 31s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m19s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Implement VIN-specific OCR extraction with optimized preprocessing:

- Add POST /extract/vin endpoint for VIN extraction
- VIN preprocessor: CLAHE, deskew, denoise, adaptive threshold
- VIN validator: check digit validation, OCR error correction (I->1, O->0)
- VIN extractor: PSM modes 6/7/8, character whitelist, alternatives
- Response includes confidence, bounding box, and alternatives
- Unit tests for validator and preprocessor
- Integration tests for VIN extraction endpoint

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
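The validator itself is not part of the file shown in this diff, so the following is only a minimal sketch of what "check digit validation" and "OCR error correction (I->1, O->0)" could look like. It uses the standard VIN check digit scheme (transliterate each character, multiply by a position weight, take the sum modulo 11, with 10 written as X), which is mandatory for North American VINs but not for every market. All function and constant names here are hypothetical and are not taken from the commit.

```python
# Hypothetical sketch: names below are illustrative, not the commit's validator.

# Numeric value of each VIN character (I, O and Q never appear in a VIN).
VALUES = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}

# Position weights; position 9 (index 8) is the check digit itself, weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)

# The two substitutions named in the commit message.
OCR_FIXES = str.maketrans({"I": "1", "O": "0"})


def correct_ocr_errors(text: str) -> str:
    """Normalize case and whitespace, then map I->1 and O->0."""
    return text.strip().upper().translate(OCR_FIXES)


def compute_check_digit(vin: str) -> str:
    """Return the expected check character for a 17-character VIN."""
    total = sum(VALUES[ch] * weight for ch, weight in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(vin: str) -> bool:
    """Check length, alphabet, and the check digit in position 9."""
    if len(vin) != 17 or any(ch not in VALUES for ch in vin):
        return False
    return vin[8] == compute_check_digit(vin)
```

Because I and O are never legal VIN characters, applying `correct_ocr_errors` before `is_valid_vin` cannot turn a valid VIN into an invalid one; it only repairs misreads.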
47
ocr/app/extractors/base.py
Normal file
@@ -0,0 +1,47 @@
"""Base extractor class for domain-specific OCR extraction."""
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ExtractionResult:
    """Base result for extraction operations."""

    success: bool
    confidence: float
    raw_text: str
    processing_time_ms: int
    extracted_data: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


class BaseExtractor(ABC):
    """Abstract base class for domain-specific extractors."""

    @abstractmethod
    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult:
        """
        Extract domain-specific data from an image.

        Args:
            image_bytes: Raw image bytes
            content_type: MIME type of the image

        Returns:
            ExtractionResult with extracted data
        """
        pass

    @abstractmethod
    def validate(self, data: Any) -> bool:
        """
        Validate extracted data.

        Args:
            data: Extracted data to validate

        Returns:
            True if data is valid
        """
        pass
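To illustrate how the abstract interface above is meant to be filled in, here is a small sketch of a concrete subclass. `FakeVinExtractor` is hypothetical and is not the VIN extractor from this commit: it treats the input bytes as already-recognized ASCII text instead of running OCR, so that the example stays runnable without an OCR engine. The two base classes are repeated in compact form only to keep the block self-contained.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ExtractionResult:
    """Compact copy of the dataclass from base.py."""

    success: bool
    confidence: float
    raw_text: str
    processing_time_ms: int
    extracted_data: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


class BaseExtractor(ABC):
    """Compact copy of the abstract base class from base.py."""

    @abstractmethod
    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult:
        ...

    @abstractmethod
    def validate(self, data: Any) -> bool:
        ...


class FakeVinExtractor(BaseExtractor):
    """Hypothetical stand-in: decodes the bytes as text rather than running OCR."""

    def extract(self, image_bytes: bytes, content_type: Optional[str] = None) -> ExtractionResult:
        start = time.perf_counter()
        raw_text = image_bytes.decode("ascii", errors="replace").strip()
        valid = self.validate(raw_text)
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        return ExtractionResult(
            success=valid,
            confidence=1.0 if valid else 0.0,  # placeholder score, not a real OCR confidence
            raw_text=raw_text,
            processing_time_ms=elapsed_ms,
            extracted_data={"vin": raw_text} if valid else {},
            error=None if valid else "text is not a 17-character VIN candidate",
        )

    def validate(self, data: Any) -> bool:
        # Shape check only: 17 ASCII alphanumeric characters.
        return (
            isinstance(data, str)
            and len(data) == 17
            and data.isascii()
            and data.isalnum()
        )
```

A call such as `FakeVinExtractor().extract(b"1M8GDM9AXKP042788")` returns a successful `ExtractionResult` with the text under `extracted_data["vin"]`. Instantiating `BaseExtractor` directly raises `TypeError`, because both methods are abstract.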