feat: Backend OCR manual proxy endpoint (#129) #144

Closed
opened 2026-02-11 03:50:17 +00:00 by egullickson · 1 comment
Owner

Relates to #129

Milestone 6: Backend OCR Manual Proxy Endpoint

Files

  • backend/src/features/ocr/domain/ocr.types.ts
  • backend/src/features/ocr/external/ocr-client.ts
  • backend/src/features/ocr/domain/ocr.service.ts
  • backend/src/features/ocr/api/ocr.controller.ts
  • backend/src/features/ocr/api/ocr.routes.ts

Requirements

  • Add ManualExtractionResponse and ManualJobResponse types matching Python API response
  • Add OcrClient.submitManualJob() method that POSTs to Python /extract/manual with PDF file and optional vehicleId
  • Add ocrService.submitManualJob() with file validation (200MB max, PDF only)
  • OcrController.extractManual() validates uploaded file:
    1. Content type application/pdf OR filename ends .pdf
    2. File size <= 200MB
    3. First 4 bytes match PDF magic bytes %PDF
    4. Reject invalid files with 400, 413, or 415 (depending on which check failed) before forwarding to Python service
  • Add POST /api/ocr/extract/manual route with preHandler: [requireAuth, requireTier('document.scanMaintenanceSchedule')]
  • GET /api/ocr/jobs/:jobId handles manual job polling and returns ManualJobResponse
  • OcrService.getJobStatus() returns 410 Gone if job not found (Redis TTL expired). Message: "Job expired (max 2 hours). Please resubmit."
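
The three upload checks listed above can be sketched as a single pure function. This is a minimal illustration, not the actual controller code: `UploadedFile`, `ValidationResult`, `validateManualUpload`, and the error messages are hypothetical names, and the case-insensitive `.pdf` match is an assumption. The status codes follow the acceptance criteria (400 for non-PDF, 413 for oversized, 415 for a missing `%PDF` header).

```typescript
// Hypothetical shape of an uploaded file; the real controller's types are not shown in this issue.
interface UploadedFile {
  filename: string;
  mimetype: string;
  data: Uint8Array;
}

type ValidationResult =
  | { ok: true }
  | { ok: false; status: 400 | 413 | 415; message: string };

const MAX_MANUAL_SIZE_BYTES = 200 * 1024 * 1024; // 200MB limit from the requirements
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46]; // the ASCII bytes of "%PDF"

// sizeBytes defaults to the buffer length; it is a separate parameter only so the
// size check can be exercised without allocating a 200MB buffer.
function validateManualUpload(
  file: UploadedFile,
  sizeBytes: number = file.data.length,
): ValidationResult {
  // Check 1: declared content type OR filename extension (browser fallback).
  const looksLikePdf =
    file.mimetype === 'application/pdf' ||
    file.filename.toLowerCase().endsWith('.pdf'); // case-insensitive match is an assumption
  if (!looksLikePdf) {
    return { ok: false, status: 400, message: 'Only PDF files are accepted.' };
  }

  // Check 2: size limit.
  if (sizeBytes > MAX_MANUAL_SIZE_BYTES) {
    return { ok: false, status: 413, message: 'File exceeds the 200MB limit.' };
  }

  // Check 3: the first 4 bytes must be "%PDF".
  const hasMagic =
    file.data.length >= 4 && PDF_MAGIC.every((byte, i) => file.data[i] === byte);
  if (!hasMagic) {
    return { ok: false, status: 415, message: 'File does not start with a valid PDF header.' };
  }

  return { ok: true };
}
```

Running the checks in this order means a renamed non-PDF (for example a `.pdf` file containing plain text) passes check 1 and is only caught by the magic-byte check, which is why that case maps to 415 rather than 400.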

Acceptance Criteria

  • POST /api/ocr/extract/manual with valid PDF returns 202 with jobId
  • Non-Pro users get 403 TIER_REQUIRED response
  • POST with non-PDF file returns 400
  • POST with file > 200MB returns 413
  • POST with file lacking %PDF magic bytes returns 415
  • GET /api/ocr/jobs/:jobId returns manual job progress and result when completed
  • Completed result includes maintenanceSchedules array with service names, intervals, subtypes
  • Expired/missing job returns 410 Gone with resubmit message
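
The polling criteria above could map onto a reply roughly as follows. This is a sketch under stated assumptions: the field names in `MaintenanceSchedule` and `ManualJobResponse`, the status values, and the `toJobStatusReply` helper are all hypothetical, since the issue only says the result contains a `maintenanceSchedules` array with service names, intervals, and subtypes. The 410 message is the one quoted in the requirements.

```typescript
// Hypothetical field names; the Python API's actual response shape is not shown in this issue.
interface MaintenanceSchedule {
  serviceName: string;
  intervalMiles: number | null;
  intervalMonths: number | null;
  subtypes: string[];
}

interface ManualJobResponse {
  jobId: string;
  status: 'pending' | 'processing' | 'completed' | 'failed'; // assumed status values
  progress: number; // assumed to be a 0-100 percentage
  result: { maintenanceSchedules: MaintenanceSchedule[] } | null;
}

type JobStatusReply =
  | { statusCode: 200; body: ManualJobResponse }
  | { statusCode: 410; body: { error: string } };

// Message text taken from the requirements.
const JOB_EXPIRED_MESSAGE = 'Job expired (max 2 hours). Please resubmit.';

// `job` is null when the lookup found nothing, e.g. because the Redis TTL expired.
function toJobStatusReply(job: ManualJobResponse | null): JobStatusReply {
  if (job === null) {
    return { statusCode: 410, body: { error: JOB_EXPIRED_MESSAGE } };
  }
  return { statusCode: 200, body: job };
}
```

Returning 410 rather than 404 tells a polling client that the job is permanently gone, so it can stop retrying and prompt for a resubmission.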

Tests

  • Test files: backend/src/features/ocr/tests/unit/ocr-manual.test.ts (NEW)
  • Test type: unit (mock OcrClient)
  • Scenarios:
    • Normal: PDF submission returns 202 with jobId
    • Normal: Job poll returns completed result with schedules
    • Edge: Tier gating blocks free users with 403
    • Error: Non-PDF file returns 400
    • Error: Oversized file returns 413
    • Error: File with wrong magic bytes returns 415
    • Error: Expired job returns 410 Gone
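
For the tier-gating scenario, the decision that the route's `requireTier('document.scanMaintenanceSchedule')` preHandler makes can be illustrated in isolation. This is not the project's `requireTier` implementation: the tier names, the `FEATURE_MIN_TIER` map, the `checkTier` helper, and the response body shape are assumptions for illustration. Only the feature key and the 403 `TIER_REQUIRED` outcome come from the issue.

```typescript
// Hypothetical tier names; the issue only distinguishes Pro from non-Pro (free) users.
type Tier = 'free' | 'pro';

// Hypothetical feature map; only this feature key appears in the issue.
const FEATURE_MIN_TIER: Record<string, Tier> = {
  'document.scanMaintenanceSchedule': 'pro',
};

const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1 };

interface TierDenied {
  statusCode: 403;
  body: { error: 'TIER_REQUIRED'; feature: string }; // body shape is an assumption
}

// Returns null when access is allowed, or a 403 description when it is not.
function checkTier(userTier: Tier, feature: string): TierDenied | null {
  const required = FEATURE_MIN_TIER[feature];
  if (required === undefined) {
    return null; // features missing from the map are treated as ungated in this sketch
  }
  if (TIER_RANK[userTier] >= TIER_RANK[required]) {
    return null;
  }
  return { statusCode: 403, body: { error: 'TIER_REQUIRED', feature } };
}
```

A unit test for the gating scenario can then assert on this return value directly, without standing up the HTTP layer.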
egullickson added the status/backlog and type/feature labels 2026-02-11 03:51:16 +00:00
egullickson added the status/in-progress label and removed the status/backlog label 2026-02-11 20:49:13 +00:00
Author
Owner

Milestone: Execution Complete

Phase: Execution | Agent: Developer | Status: PASS

Changes Made

1. ocr.controller.ts - extractManual (PDF validation)

  • Accept content type application/pdf OR filename ending .pdf (browser fallback)
  • Added %PDF magic bytes validation on first 4 bytes after reading file buffer
  • Returns 415 Unsupported Media Type for files without valid PDF header

2. ocr.service.ts - getJobStatus (410 Gone)

  • Changed expired/missing job response from 404 to 410 Gone
  • Message: "Job expired (max 2 hours). Please resubmit."

3. ocr.controller.ts - getJobStatus (410 handling)

  • Updated error handler to forward 410 status code to client

4. ocr-manual.test.ts - 16 unit tests

  • PDF submission returns 202 with jobId
  • Job poll returns completed result with schedules
  • Tier gating configured for document.scanMaintenanceSchedule
  • Non-PDF returns 400
  • Oversized file returns 413
  • File with wrong magic bytes fails validation (controller-level)
  • Expired job returns 410 Gone with resubmit message

Test Results

Test Suites: 2 passed, 2 total (ocr-receipt + ocr-manual)
Tests:       27 passed, 27 total
Type-check:  PASS (0 errors)
Lint:        No new warnings

Acceptance Criteria Verification

  • POST /api/ocr/extract/manual with valid PDF returns 202 with jobId
  • Non-Pro users get 403 TIER_REQUIRED response (route preHandler)
  • POST with non-PDF file returns 400
  • POST with file > 200MB returns 413
  • POST with file lacking %PDF magic bytes returns 415
  • GET /api/ocr/jobs/:jobId returns manual job progress and result
  • Completed result includes maintenanceSchedules array
  • Expired/missing job returns 410 Gone with resubmit message

Verdict: PASS | Next: QR post-implementation review

Reference: egullickson/motovaultpro#144