egullickson/motovaultpro

Commit Graph


Author

SHA1

Message

Date

Eric Gullickson

3eb54211cb

feat: add owner's manual OCR pipeline (refs #71)

All checks were successful:

Deploy to Staging / Build Images (pull_request) Successful in 3m1s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 31s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m19s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped

Implement async PDF processing for owner's manuals with maintenance
schedule extraction:

- Add PDF preprocessor with PyMuPDF for text/scanned PDF handling
- Add maintenance pattern matching (mileage, time, fluid specs)
- Add service name mapping to maintenance subtypes
- Add table detection and parsing for schedule tables
- Add manual extractor orchestrating the complete pipeline
- Add POST /extract/manual endpoint for async job submission
- Add Redis job queue support for manual extraction jobs
- Add progress tracking during processing
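
As an illustration of the pattern-matching and service-name-mapping items above, here is a minimal sketch using only the standard library. The regular expressions, the `SERVICE_SUBTYPES` table, and the `parse_interval` helper are all hypothetical; the commit's actual patterns and subtype names are not shown on this page.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: a mileage figure such as "5,000 miles" or "7500 mi",
# and a time interval such as "6 months" or "2 years".
MILEAGE_RE = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)\s*(?:miles?|mi)\b", re.IGNORECASE)
TIME_RE = re.compile(r"(\d+)\s*(months?|years?)\b", re.IGNORECASE)

# Hypothetical mapping from phrases found in a manual to maintenance subtypes.
SERVICE_SUBTYPES = {
    "engine oil": "oil_change",
    "tire rotation": "tire_rotation",
    "rotate tires": "tire_rotation",
}


@dataclass
class Interval:
    service: Optional[str]  # normalized subtype, or None if unrecognized
    miles: Optional[int]    # mileage interval, or None if absent
    months: Optional[int]   # time interval in months, or None if absent


def parse_interval(line: str) -> Interval:
    """Extract a service subtype and its mileage/time interval from one line."""
    lowered = line.lower()
    service = next(
        (subtype for phrase, subtype in SERVICE_SUBTYPES.items() if phrase in lowered),
        None,
    )

    miles = None
    mileage_match = MILEAGE_RE.search(line)
    if mileage_match:
        miles = int(mileage_match.group(1).replace(",", ""))

    months = None
    time_match = TIME_RE.search(line)
    if time_match:
        count = int(time_match.group(1))
        # Normalize years to months so every interval uses one unit.
        months = count * 12 if time_match.group(2).lower().startswith("year") else count

    return Interval(service, miles, months)
```

Calling `parse_interval("Replace engine oil every 5,000 miles or 6 months")` yields the `oil_change` subtype with a 5000-mile and 6-month interval. Substring lookup is the simplest possible mapping; a real pipeline would likely need fuzzier matching to cope with OCR noise.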

Processing pipeline:
1. Analyze PDF structure (text layer vs scanned)
2. Find maintenance schedule sections
3. Extract text or OCR scanned pages at 300 DPI
4. Detect and parse maintenance tables
5. Normalize service names and extract intervals
6. Return structured maintenance schedules with confidence scores
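
Steps 1 to 3 of the pipeline above can be sketched as a per-page planning pass. This is a simplified illustration, not the commit's code: it assumes the text layer of each page has already been extracted into a string (for example by PyMuPDF), and `MIN_TEXT_CHARS`, `SCHEDULE_KEYWORDS`, `PagePlan`, and `plan_pages` are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed threshold: pages with fewer text-layer characters are treated as scanned.
MIN_TEXT_CHARS = 50

# Assumed keywords that mark a maintenance schedule section.
SCHEDULE_KEYWORDS = ("maintenance schedule", "scheduled maintenance", "service interval")


@dataclass
class PagePlan:
    page_number: int        # 1-based page number
    needs_ocr: bool         # True when the text layer is missing or too thin
    is_schedule_page: bool  # True when the text layer mentions a schedule keyword


def plan_pages(
    page_texts: List[str],
    on_progress: Callable[[int], None] = lambda percent: None,
) -> List[PagePlan]:
    """Decide, page by page, whether to OCR and whether the page looks relevant."""
    plans: List[PagePlan] = []
    total = len(page_texts)
    for index, text in enumerate(page_texts):
        stripped = text.strip()
        lowered = stripped.lower()
        plans.append(
            PagePlan(
                page_number=index + 1,
                needs_ocr=len(stripped) < MIN_TEXT_CHARS,
                is_schedule_page=any(keyword in lowered for keyword in SCHEDULE_KEYWORDS),
            )
        )
        # Report progress as a whole-number percentage after each page.
        on_progress(round(100 * (index + 1) / total))
    return plans
```

A scanned page has no text layer to search, so its `is_schedule_page` flag is only meaningful after OCR; a fuller version would re-run the keyword check on the OCR output. The `on_progress` callback stands in for whatever progress-tracking mechanism the job queue uses.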

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-01 21:30:20 -06:00

1 commit