feat: Expand OCR with fuel receipt scanning and maintenance extraction (#129) #147

Merged
egullickson merged 26 commits from issue-129-expand-ocr-fuel-receipt-maintenance into main 2026-02-13 02:25:55 +00:00

Summary

  • Add fuel receipt OCR scanning (Pro tier) with station matching via Google Places API
  • Add owner's manual maintenance schedule extraction via Gemini 2.5 Flash on Vertex AI
  • Both features include full async job flow, progress indicators, and review screens
  • 51 files changed across backend, frontend, OCR Python service, and documentation

Linked issues

Fixes #129
Fixes #130
Fixes #131
Fixes #132
Fixes #133
Fixes #134
Fixes #135
Fixes #136
Fixes #137
Fixes #138
Fixes #139
Fixes #140
Fixes #141
Fixes #142
Fixes #143
Fixes #144
Fixes #145
Fixes #146

Type

  • [x] Feature
  • [ ] Bug fix
  • [ ] Chore / refactor
  • [x] Docs

Milestone Breakdown

| Milestone | Sub-Issue | Description |
|-----------|-----------|-------------|
| M0 | #130 | Receipt extraction proxy endpoint (backend) |
| M1 | #131 | Receipt OCR frontend (useReceiptOcr, tier gating, station matching) |
| M2 | #132 | Station matching from receipt merchant name |
| M3 | #133 | Gemini engine module and Vertex AI configuration |
| M4 | #134 | ManualExtractor rewrite to use Gemini engine |
| M5 | #135 | Backend OCR manual proxy endpoint |
| M6 | #136 | Frontend manual extraction flow with review screen |
| M7-docs | #137 | CLAUDE.md indexes and README updates |
| M7a | #138 | Standalone requireTier middleware |
| M7b | #139 | Receipt proxy tier guard, 422 forwarding, tests |
| M7c | #140 | 44px minimum touch targets for receipt OCR components |
| M7d | #141 | 5s timeout and warning log for station name search |
| M7e | #142 | Traceback logging and error messages for GeminiEngine |
| M7f | #143 | ManualExtractor progress spec-aligned 10/50/95/100 pattern |
| M7g | #144 | PDF magic bytes validation, 410 Gone, manual extraction tests |
| M7h | #145 | 410 error handling, progress messages, touch targets, frontend tests |
| M8 | #146 | Documentation verification and fixes |

Test plan

  • [x] Unit tests

Commands / steps:

  1. npm test - Run all unit tests (backend + frontend)
  2. npm run lint - Linting passes
  3. npm run type-check - TypeScript validation passes

Test coverage:

  • backend/src/features/ocr/tests/unit/ocr-receipt.test.ts - Receipt extraction proxy tests
  • backend/src/features/ocr/tests/unit/ocr-manual.test.ts - Manual extraction proxy tests
  • backend/src/features/stations/tests/unit/station-matching.test.ts - Station matching tests
  • backend/src/core/middleware/require-tier.test.ts - Standalone requireTier middleware tests
  • backend/src/core/config/tests/feature-tiers.test.ts - Feature tier configuration tests
  • ocr/tests/test_gemini_engine.py - Gemini engine unit tests
  • ocr/tests/test_manual_extractor.py - Manual extractor unit tests
  • frontend/src/features/maintenance/components/MaintenanceScheduleReviewScreen.test.tsx - Review screen tests

Key Implementation Details

Receipt OCR Flow

Camera/Upload -> POST /ocr/extract/receipt (Pro tier) -> Python HybridEngine -> extractedFields -> POST /stations/match -> Pre-fill FuelLogForm

Manual Extraction Flow

PDF Upload + checkbox -> POST /ocr/extract/manual (Pro tier) -> Async job -> GeminiEngine (Vertex AI) -> Poll /ocr/jobs/:jobId -> MaintenanceScheduleReviewScreen -> Batch create schedules
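
The polling step of this flow can be sketched as a small pure function that decides what the client does after each poll response. This is an illustration, not the code in this PR: the `JobStatus` values, the `JobBody` and `PollAction` types, and the `nextPollAction` name are all hypothetical. Only the 410 Gone response for expired jobs is taken from the description.

```typescript
// Hypothetical job states; the real status names returned by /ocr/jobs/:jobId may differ.
type JobStatus = "pending" | "processing" | "completed" | "failed";

interface JobBody {
  status: JobStatus;
  progress: number; // 0-100
}

type PollAction =
  | { kind: "continue"; progress: number }
  | { kind: "done" }
  | { kind: "expired" }
  | { kind: "failed"; reason: string };

// Decide the next step from a single poll response.
function nextPollAction(httpStatus: number, body?: JobBody): PollAction {
  // 410 Gone: the job has expired, so stop polling and ask for a new upload.
  if (httpStatus === 410) return { kind: "expired" };
  if (httpStatus !== 200 || body === undefined) {
    return { kind: "failed", reason: `unexpected HTTP ${httpStatus}` };
  }
  switch (body.status) {
    case "completed":
      return { kind: "done" };
    case "failed":
      return { kind: "failed", reason: "extraction failed" };
    default:
      // Still pending or processing: keep polling and surface the progress.
      return { kind: "continue", progress: body.progress };
  }
}
```

Keeping the decision separate from the timer and the HTTP call makes the 410 path easy to unit test without mocking the network.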

New Infrastructure

  • requireTier standalone middleware for route-level tier gating
  • fuelLog.receiptScan Pro tier feature key
  • Gemini 2.5 Flash engine (Vertex AI SDK, structured JSON output, 20MB limit)
  • ManualExtractor spec-aligned progress pattern (10% -> 50% -> 95% -> 100%)
  • PDF magic bytes validation, 410 Gone for expired jobs

Checklist

  • [x] Acceptance criteria met (from linked issue)
  • [x] No secrets committed
  • [x] Logging is appropriate (no PII)
  • [x] Docs updated (if needed)
egullickson added 19 commits 2026-02-11 21:28:12 +00:00
Add POST /api/ocr/extract/receipt endpoint that proxies to the Python
OCR service's /extract/receipt for receipt-specific field extraction.

- ReceiptExtractionResponse type with receiptType, extractedFields, rawText
- OcrClient.extractReceipt() with optional receipt_type form field
- OcrService.extractReceipt() with 10MB max, image-only validation
- OcrController.extractReceipt() with file upload and error mapping
- Route with auth middleware
- 9 unit tests covering normal, edge, and error scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Free tier users see locked button with upgrade prompt dialog.
Pro+ users can capture receipts normally. Works on mobile and desktop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Google Places Text Search to match receipt merchant names (e.g.
"Shell", "COSTCO #123") to real gas stations. Backend exposes
POST /api/stations/match endpoint. Frontend calls it after OCR
extraction and pre-fills locationData with matched station's placeId,
name, and address. Users can clear the match in the review modal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add standalone GeminiEngine class for maintenance schedule extraction
from PDF owner's manuals using Vertex AI Gemini 2.5 Flash with structured
JSON output enforcement, 20MB size limit, and lazy initialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace traditional OCR pipeline (table_detector, table_parser,
maintenance_patterns) with GeminiEngine for semantic PDF extraction.
Map Gemini serviceName values to 27 maintenance subtypes via
ServiceMapper fuzzy matching. Add 8 unit tests covering normal
extraction, unusual names, empty response, and error handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
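
The idea of fuzzy-mapping a free-text serviceName onto a fixed subtype list can be illustrated with a token-overlap (Jaccard) score. This is a sketch only: ServiceMapper lives in the Python OCR service and may use a different matching technique, the four subtypes below are a hypothetical stand-in for the 27 real ones, and `toTokens`, `jaccard`, `mapServiceName`, and the 0.3 threshold are made up for the example.

```typescript
// Hypothetical subset of maintenance subtypes; the PR maps to 27, which are not listed here.
const SUBTYPES: string[] = ["Engine Oil", "Air Filter Element", "Brake Fluid", "Tire Rotation"];

// Lowercase, strip punctuation, and split into unique word tokens.
function toTokens(text: string): string[] {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .split(" ")
    .filter((w) => w.length > 0);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a: string[], b: string[]): number {
  const shared = a.filter((t) => b.indexOf(t) !== -1).length;
  const union = a.length + b.length - shared;
  return union === 0 ? 0 : shared / union;
}

// Return the best-scoring subtype, or null when nothing clears the threshold.
function mapServiceName(serviceName: string, threshold: number = 0.3): string | null {
  const nameTokens = toTokens(serviceName);
  let best: string | null = null;
  let bestScore = 0;
  for (const subtype of SUBTYPES) {
    const score = jaccard(nameTokens, toTokens(subtype));
    if (score > bestScore) {
      best = subtype;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best : null;
}
```

Plain token overlap has no stemming, so a name like "Rotate tires" would not match "Tire Rotation" here; a production mapper would likely need aliases or a stronger similarity measure.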
Add POST /api/ocr/extract/manual endpoint that proxies to the Python
OCR service's manual extraction pipeline. Includes Pro tier gating via
document.scanMaintenanceSchedule, PDF-only validation, 200MB file size
limit, and async 202 job response for polling via existing job status
endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create useManualExtraction hook: submit PDF to OCR, poll job status, track progress
- Create useCreateSchedulesFromExtraction hook: batch create maintenance schedules from extraction
- Create MaintenanceScheduleReviewScreen: dialog with checkboxes, inline editing, batch create
- Update DocumentForm: remove "(Coming soon)", trigger extraction after upload, show progress
- Add 12 unit tests for review screen (rendering, selection, empty state, errors)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add/update documentation across backend, Python OCR service, and frontend
for receipt scanning, manual extraction, and Gemini integration. Create
new CLAUDE.md files for engines/, fuel-logs/, documents/, and maintenance/
features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create reusable preHandler middleware for subscription tier gating.
Composable with requireAuth in route preHandler arrays. Returns 403
TIER_REQUIRED with upgrade prompt for insufficient tier, 500 for
unknown feature keys. Includes 9 unit tests covering all acceptance
criteria.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
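
The behaviour described in this commit (403 TIER_REQUIRED for an insufficient tier, 500 for an unknown feature key) can be sketched without any framework dependency. Everything below is an assumption for illustration: the tier names and ranking, the `FEATURE_TIERS` map, the `TierRequest` and `TierReply` shapes, and the payload fields are hypothetical. Only the two feature keys and the status codes come from this PR.

```typescript
// Hypothetical tier names and ordering; the real tier model may differ.
type Tier = "free" | "pro" | "enterprise";
const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, enterprise: 2 };

// Feature keys named in this PR, both described as Pro tier features.
const FEATURE_TIERS: Record<string, Tier> = {
  "fuelLog.receiptScan": "pro",
  "document.scanMaintenanceSchedule": "pro",
};

// Minimal stand-ins for the framework's request and reply objects.
interface TierRequest {
  userTier: Tier;
}
interface TierReply {
  code(status: number): TierReply;
  send(payload: unknown): void;
}

// Build a route-level guard for one feature key.
function requireTier(featureKey: string) {
  return (request: TierRequest, reply: TierReply): void => {
    const required = FEATURE_TIERS[featureKey];
    if (required === undefined) {
      // Unknown key is a server-side configuration error, not a user error.
      reply.code(500).send({ error: "UNKNOWN_FEATURE", featureKey });
      return;
    }
    if (TIER_RANK[request.userTier] < TIER_RANK[required]) {
      reply.code(403).send({
        error: "TIER_REQUIRED",
        requiredTier: required,
        upgradePrompt: `Upgrade to ${required} to use this feature.`,
      });
    }
    // Otherwise do nothing, so the next handler in the chain runs.
  };
}
```

Because the guard is a factory keyed by feature, a route can list it after the auth handler in its preHandler array and stay free of tier logic in the controller.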
Adds minHeight/minWidth: 44 to ReceiptCameraButton, ReceiptOcrReviewModal
action buttons, and UpgradeRequiredDialog buttons and close icon to meet
mobile accessibility requirements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 5000ms timeout to Places Text Search API call in searchStationByName.
Timeout errors log a warning instead of error and return null gracefully.
Add timeout test case to station-matching unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
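
A timeout that degrades to null rather than failing the request can be sketched with a small promise wrapper. This is not the code from the commit: `withTimeout`, `searchWithTimeout`, `StationMatch`, and `PlacesSearch` are hypothetical names, and the real searchStationByName may instead rely on its HTTP client's timeout option. Only the 5000 ms limit, the warning-level log, and the null result come from the commit message.

```typescript
// Resolve with the task's value, or with null if it takes longer than timeoutMs.
function withTimeout<T>(
  task: Promise<T>,
  timeoutMs: number,
  onTimeout: () => void,
): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout();
      resolve(null);
    }, timeoutMs);
    task.then(
      (value) => {
        clearTimeout(timer); // finished in time: cancel the pending timeout
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error); // non-timeout failures still propagate
      },
    );
  });
}

// Hypothetical shapes for the matched station and the injected search call.
interface StationMatch {
  placeId: string;
  name: string;
  address: string;
}
type PlacesSearch = (query: string) => Promise<StationMatch | null>;

// Wrap a station search so a slow upstream call yields "no match" after 5 seconds.
function searchWithTimeout(
  merchantName: string,
  search: PlacesSearch,
): Promise<StationMatch | null> {
  return withTimeout(search(merchantName), 5000, () => {
    console.warn(`Station search timed out for "${merchantName}"`);
  });
}
```

Note that this wrapper only stops waiting; it does not abort the underlying request, which would need cancellation support from the HTTP client.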
Add filename .pdf extension fallback and %PDF magic bytes validation to
extractManual controller. Update getJobStatus to return 410 Gone for
expired jobs. Add 16 unit tests covering all acceptance criteria.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
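
The validation described here can be sketched as two checks: the upload must be declared as a PDF (by MIME type, or by a .pdf filename as a fallback), and its first four bytes must spell %PDF. The function names, the `UploadedFile` shape, and the exact way the two checks are combined are assumptions for illustration, not the controller's actual code.

```typescript
// "%PDF" as ASCII byte values.
const PDF_MAGIC: number[] = [0x25, 0x50, 0x44, 0x46];

// True when the buffer starts with the %PDF signature.
function hasPdfMagicBytes(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((expected, i) => bytes[i] === expected);
}

// Hypothetical upload shape; the real multipart file object will differ.
interface UploadedFile {
  filename: string;
  mimetype: string;
  bytes: Uint8Array;
}

// Accept when the file is declared as a PDF AND the content starts with %PDF.
function looksLikePdf(file: UploadedFile): boolean {
  const declaredPdf =
    file.mimetype === "application/pdf" || /\.pdf$/i.test(file.filename);
  return declaredPdf && hasPdfMagicBytes(file.bytes);
}
```

Checking the signature guards against a renamed or mislabelled file reaching the extraction service, since both the MIME type and the filename are supplied by the client.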
- Handle poll errors including 410 Gone in useManualExtraction hook
- Add specific progress stage messages (Preparing/Processing/Mapping/Complete)
- Enforce 44px minimum touch targets on all interactive elements
- Add tests for inline editing, mobile fullscreen, and desktop modal layouts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
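
One way to derive the stage message from the reported progress is a simple threshold lookup. The pairing below (10% Preparing, 50% Processing, 95% Mapping, 100% Complete) is an assumption that lines up the four messages with the four progress values mentioned in this PR; the hook may map them differently, and `progressMessage` is a hypothetical name.

```typescript
// Map a 0-100 progress value to a user-facing stage label.
// Thresholds are assumed to follow the 10/50/95/100 pattern from the extractor.
function progressMessage(progress: number): string {
  if (progress >= 100) return "Complete";
  if (progress >= 95) return "Mapping";
  if (progress >= 50) return "Processing";
  return "Preparing";
}
```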
docs: fix receipt tier gating and add feature tier refs to core docs (refs #146)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 15m57s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 53s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
48993eb311
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson added 1 commit 2026-02-12 01:42:50 +00:00
fix: Variables
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 34s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
b97d226d44
egullickson added 1 commit 2026-02-12 01:57:40 +00:00
fix: Manual scanning
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
a078962d3f
egullickson added 1 commit 2026-02-12 02:06:11 +00:00
fix: Manual polling typo
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
55a7bcc874
egullickson added 1 commit 2026-02-12 02:29:43 +00:00
fix: Update auto schedule creation
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m29s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 25s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
33b489d526
egullickson added 1 commit 2026-02-12 02:47:55 +00:00
fix: Data validation for scheduled maintenance
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m24s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 25s
Deploy to Staging / Verify Staging (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
59e7f4053a
egullickson added 1 commit 2026-02-13 02:03:40 +00:00
fix: Wire vehicleId into maintenance page to display schedules (refs #148)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 3m28s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 52s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 10s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
6bb2c575b4
Maintenance page called useMaintenanceRecords() without a vehicleId,
causing the schedules query (enabled: !!vehicleId) to never execute.
Added vehicle selector to both desktop and mobile pages, auto-selects
first vehicle, and passes selectedVehicleId to the hook. Also fixed
stale query invalidation keys in delete handlers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
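
The bug and fix can be illustrated without React: a query gated on `enabled: !!vehicleId` never runs while the id is undefined, so the page has to pick an id first. The helper names, the `QueryOptions` shape, and the query key below are hypothetical; only the `enabled: !!vehicleId` condition and the auto-select-first-vehicle behaviour come from the commit message.

```typescript
interface QueryOptions {
  queryKey: Array<string | null>;
  enabled: boolean;
}

// Build options for the schedules query; it stays disabled until an id is known.
function schedulesQueryOptions(vehicleId: string | undefined): QueryOptions {
  return {
    queryKey: ["maintenanceSchedules", vehicleId === undefined ? null : vehicleId],
    enabled: !!vehicleId,
  };
}

// Keep the user's selection if there is one, otherwise fall back to the first vehicle.
function pickSelectedVehicleId(
  current: string | undefined,
  vehicles: Array<{ id: string }>,
): string | undefined {
  if (current !== undefined) return current;
  return vehicles.length > 0 ? vehicles[0].id : undefined;
}
```

With no id passed in, `enabled` is always false, which matches the symptom of schedules never loading on the maintenance page.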
egullickson added 1 commit 2026-02-13 02:14:07 +00:00
fix: Replace circle toggle with MUI Switch pill style (refs #148)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 35s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 52s
Deploy to Staging / Verify Staging (pull_request) Successful in 9s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 7s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
80ee2faed8
EmailNotificationToggle used a custom button-based toggle that rendered
as a circle. Replaced with MUI Switch component to match the pill-style
toggles used on the SettingsPage throughout the app.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egullickson merged commit 0e97128a31 into main 2026-02-13 02:25:55 +00:00
egullickson deleted branch issue-129-expand-ocr-fuel-receipt-maintenance 2026-02-13 02:25:57 +00:00
Reference: egullickson/motovaultpro#147