feat: Manual extractor Gemini rewrite (#129) #134
Relates to #129
Milestone 5: Manual Extractor Gemini Rewrite
Rewrite the manual extractor to use Gemini instead of the traditional OCR pipeline.
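As a rough illustration of the intended shape, the sketch below shows an extractor that hands the whole PDF to an engine object instead of running OCR page by page. `ManualExtractor`, `ExtractedSchedule`, and `ManualExtractionResult` are names from this issue. Every field name, the return shape of `extract_maintenance`, the `MaintenanceEngine` protocol, and the `FakeEngine` stand-in are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class ExtractedSchedule:
    # Field names are assumed for illustration only.
    service: str
    interval_miles: Optional[int] = None


@dataclass
class ManualExtractionResult:
    # Field names are assumed for illustration only.
    success: bool
    schedules: List[ExtractedSchedule] = field(default_factory=list)
    error: Optional[str] = None


class MaintenanceEngine(Protocol):
    # Hypothetical interface; the real GeminiEngine may differ.
    def extract_maintenance(self, pdf_bytes: bytes) -> List[Dict]: ...


class FakeEngine:
    """Offline stand-in for GeminiEngine so the sketch runs without an API key."""

    def extract_maintenance(self, pdf_bytes: bytes) -> List[Dict]:
        return [{"serviceName": "Engine oil change", "intervalMiles": 5000}]


class ManualExtractor:
    def __init__(self, engine: MaintenanceEngine) -> None:
        self._engine = engine

    def extract(self, pdf_bytes: bytes) -> ManualExtractionResult:
        try:
            items = self._engine.extract_maintenance(pdf_bytes)
        except Exception as exc:
            # Engine failures are reported as a failed result, not raised.
            return ManualExtractionResult(success=False, error=str(exc))
        schedules = [
            ExtractedSchedule(item["serviceName"], item.get("intervalMiles"))
            for item in items
        ]
        return ManualExtractionResult(success=True, schedules=schedules)


result = ManualExtractor(FakeEngine()).extract(b"%PDF-1.7 ...")
```

Injecting the engine through the constructor is a design choice of this sketch; it keeps the public `extract(pdf_bytes)` call unchanged while letting tests swap in a fake engine.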
Files
- ocr/app/extractors/manual_extractor.py (rewrite)
- ocr/app/routers/extract.py (update manual endpoint)
- ocr/app/models/schemas.py (update if needed)
Requirements
- ManualExtractor.extract() to use GeminiEngine instead of traditional OCR pipeline
- ExtractedSchedule, ManualExtractionResult, VehicleInfo
- serviceName values to existing 27 maintenance subtypes via fuzzy matching
Acceptance Criteria
Tests
- ocr/tests/test_manual_extractor.py (rewrite existing)
Milestone: Manual Extractor Gemini Rewrite
Phase: Execution | Agent: Developer | Status: PASS
Changes Made
- ocr/app/extractors/manual_extractor.py (rewrite)
  - ManualExtractor.extract(pdf_bytes) now calls GeminiEngine.extract_maintenance(pdf_bytes) for semantic PDF understanding
  - serviceName to system maintenance subtypes via ServiceMapper.map_service_fuzzy()
  - ExtractedSchedule, VehicleInfo, ManualExtractionResult
  - _process_text_page, _process_scanned_page, _normalize_schedules, _extract_vehicle_info, _parse_vehicle_from_title, _parse_vehicle_from_text
- ocr/app/routers/extract.py -- no changes needed
  - process_manual_job works unchanged with rewritten extractor (same interface and return type)
- ocr/tests/test_manual_extractor.py (new)
Acceptance Criteria Status
Test Results
All 8 new tests pass. No existing tests broken (8 pre-existing failures unrelated to this change).
Verdict: PASS | Next: QR post-implementation review
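For reviewers unfamiliar with the fuzzy matching step, here is a minimal sketch of how a free-form `serviceName` could be mapped to a fixed set of subtypes. It uses the standard library's `difflib` as a stand-in; how `ServiceMapper.map_service_fuzzy()` is actually implemented is not shown in this issue. The subtype list and the `cutoff` value are placeholders, since the 27 real subtypes are not listed here.

```python
from difflib import get_close_matches
from typing import Optional

# Placeholder subtypes; the real system defines 27, which are not listed here.
KNOWN_SUBTYPES = ["Engine Oil", "Brake Fluid", "Air Filter", "Tire Rotation"]


def map_service_fuzzy(service_name: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known subtype, or None if nothing is similar enough."""
    # Compare case-insensitively, then map back to the canonical spelling.
    lowered = {subtype.lower(): subtype for subtype in KNOWN_SUBTYPES}
    matches = get_close_matches(
        service_name.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


exact = map_service_fuzzy("tire rotation")
typo = map_service_fuzzy("brake fluds")
unknown = map_service_fuzzy("xyzzy")
```

Returning `None` for names below the cutoff lets the caller decide whether to drop the item or keep it unmapped, rather than forcing a poor match.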