feat: Expand OCR with fuel receipt scanning and owners manual maintenance extraction #129

egullickson · 2026-02-11T02:24:30Z

egullickson commented

2026-02-11 02:24:30 +00:00

Summary

Expand the OCR functionality with two new scanning capabilities that leverage the existing OCR pipeline and Google Cloud services:

Fuel Receipt OCR Scanning - Take a photo of a fuel receipt during fuel log creation to auto-extract station, gallons/liters, and cost per unit
Owners Manual Maintenance Schedule Extraction - Scan an uploaded owners manual to automatically extract routine maintenance schedules

The Google Vision API limit of 1,000 calls/month was established in #127.

Feature 1: Fuel Receipt OCR Scanning

Description

When adding a fuel log, the user can take a photo of their fuel receipt (mirroring the existing VIN OCR decode UX pattern). The OCR extracts fields from the receipt image and pre-fills the fuel log form with editable values.

Requirements

UX Pattern: Mirror the existing VIN OCR decode screen (take photo -> OCR extract -> pre-fill editable fields -> user confirms/edits)
Extracted Fields:
- Gas station name (matched via Google Places API lookup to link a real station object)
- Gallons or liters (fuel quantity)
- Cost per gallon/liter (unit price)
- Total cost (if visible)
Fuel type: If the fuel type/grade cannot be detected from the receipt, that is acceptable - leave it for manual selection
All pre-filled fields must be editable before saving, just like the VIN decode screen
Google Vision API (TEXT_DETECTION) is the correct engine for single-image receipt scanning (scene text, not structured documents)
Tier Gating: Pro+ only (add fuelLog.receiptScan to FEATURE_TIERS)
Monthly Limit: Counts against the global 1,000 Google API calls/month cap

Station Matching Flow

OCR extracts gas station name/brand from receipt text
Backend calls Google Places API with extracted name to find matching station
If match found, pre-select the station in the fuel log form
User can change/clear the station selection

Technical Notes

Uses the synchronous OCR endpoint (POST /api/ocr/extract) since receipts are single images (1-3 seconds)
The OCR response already supports documentType: 'receipt' and extractedFields
Frontend needs a camera/photo capture component in the fuel log creation flow
Mobile-first: phone camera capture is the primary use case

Feature 2: Owners Manual Maintenance Schedule Extraction

Description

When uploading an owners manual document, a checkbox option "Scan for Maintenance Schedule" triggers a Gemini AI scan of the entire manual to extract routine maintenance items and their intervals. Extracted items are presented for user review before creating maintenance schedules.

Engine: Gemini 2.5 Flash on Vertex AI

Gemini is the right choice over Document AI for this use case because:

Semantic understanding: Gemini comprehends what a maintenance schedule means, not just layout/text extraction
Native PDF processing: Sends the PDF directly to Gemini -- no OCR preprocessing pipeline needed
Structured JSON output: Native responseMimeType: 'application/json' with responseSchema enforcement
1M token context window: Handles entire owners manuals (up to ~1,500 pages of text)
Cost effective: ~$0.001-0.002 per page ($0.30 per 1M input tokens, $2.50 per 1M output tokens)

Gemini Prompt

Extract all routine scheduled maintenance items from this vehicle owners manual.

For each maintenance item, extract:
- serviceName: The maintenance task name (e.g., "Engine Oil Change", "Tire Rotation", "Cabin Air Filter Replacement")
- intervalMiles: The mileage interval as a number, or null if not specified (e.g., 5000, 30000)
- intervalMonths: The time interval in months as a number, or null if not specified (e.g., 6, 12, 24)
- details: Any additional details such as fluid specifications, part numbers, or special instructions (e.g., "Use 0W-20 full synthetic oil")

Only include routine scheduled maintenance items with clear intervals. Do not include one-time procedures, troubleshooting steps, or warranty information.

Return the results as a JSON object with a single "maintenanceSchedule" array.

Gemini Response Schema (enforced via `responseSchema`)

{
  "type": "object",
  "properties": {
    "maintenanceSchedule": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "serviceName": { "type": "string" },
          "intervalMiles": { "type": "number", "nullable": true },
          "intervalMonths": { "type": "number", "nullable": true },
          "details": { "type": "string", "nullable": true }
        },
        "required": ["serviceName"]
      }
    }
  },
  "required": ["maintenanceSchedule"]
}

Example Gemini Response

{
  "maintenanceSchedule": [
    {
      "serviceName": "Engine Oil Change",
      "intervalMiles": 5000,
      "intervalMonths": 6,
      "details": "Use 0W-20 full synthetic oil. Replace oil filter at every oil change."
    },
    {
      "serviceName": "Tire Rotation",
      "intervalMiles": 5000,
      "intervalMonths": 6,
      "details": "Rotate front to rear on same side."
    },
    {
      "serviceName": "Cabin Air Filter Replacement",
      "intervalMiles": 15000,
      "intervalMonths": 12,
      "details": null
    },
    {
      "serviceName": "Brake Fluid Replacement",
      "intervalMiles": null,
      "intervalMonths": 36,
      "details": "Use DOT 3 brake fluid."
    },
    {
      "serviceName": "Spark Plug Replacement",
      "intervalMiles": 60000,
      "intervalMonths": null,
      "details": "Iridium spark plugs. Torque to 18 ft-lbs."
    }
  ]
}
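Even with `responseSchema` enforcement, the consuming service may want to re-check the parsed payload before it reaches the review screen. The following is a minimal sketch of such a check, not the project's actual code: `validate_schedule` is a hypothetical helper name, and the rules simply mirror the schema above (`serviceName` required, the other fields nullable).

```python
from typing import Any


def validate_schedule(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the schedule items, raising ValueError on the first violation.

    Hypothetical helper: mirrors the response schema in this issue.
    """
    items = payload.get("maintenanceSchedule")
    if not isinstance(items, list):
        raise ValueError("maintenanceSchedule must be an array")

    for i, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {i}: must be an object")

        name = item.get("serviceName")
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"item {i}: serviceName is required")

        for key in ("intervalMiles", "intervalMonths"):
            value = item.get(key)
            if value is None:
                continue
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"item {i}: {key} must be a number or null")

        details = item.get("details")
        if details is not None and not isinstance(details, str):
            raise ValueError(f"item {i}: details must be a string or null")

    return items
```

Raising on the first violation keeps the sketch short; a real implementation might instead drop invalid items and let the user review the rest.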

GCP Setup Instructions

1. Enable the Vertex AI API

# Via gcloud CLI
gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID

# Or via GCP Console:
# APIs & Services > Enable APIs and Services > Search "Vertex AI API" > Enable

2. Service Account Permissions

The existing service account (used for Google Vision in #127) needs one additional IAM role:

Role	Role ID	Purpose
Vertex AI User	`roles/aiplatform.user`	Required for `aiplatform.endpoints.predict` permission

# Grant the Vertex AI User role to the existing service account
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

If using Workload Identity Federation (WIF) from #127, the same federated identity gets the additional role -- no new service account needed.

3. SDK Dependency

# Add to ocr/requirements.txt (Python OCR service)
google-cloud-aiplatform>=1.40.0

# OR for the Node.js backend proxy
npm install @google-cloud/vertexai

4. Environment Variables

Variable	Default	Description
`VERTEX_AI_PROJECT`	(required)	GCP project ID
`VERTEX_AI_LOCATION`	`us-central1`	GCP region for Vertex AI
`GEMINI_MODEL`	`gemini-2.5-flash`	Gemini model ID
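
One way the Python OCR service could read these variables is sketched below. The variable names and defaults come from the table above; the `VertexConfig` dataclass and `load_vertex_config` function are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VertexConfig:
    project: str
    location: str
    model: str


def load_vertex_config(env: Mapping[str, str] = os.environ) -> VertexConfig:
    """Read Vertex AI settings, applying the defaults from the table above."""
    project = env.get("VERTEX_AI_PROJECT", "").strip()
    if not project:
        # VERTEX_AI_PROJECT has no default, so fail fast with a clear message.
        raise RuntimeError("VERTEX_AI_PROJECT is required")
    return VertexConfig(
        project=project,
        location=env.get("VERTEX_AI_LOCATION", "us-central1"),
        model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
    )
```

Passing the environment as a mapping keeps the loader easy to exercise in tests without touching the real process environment.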

5. Authentication

Uses the same credential path as Google Vision (#127):

Development: GOOGLE_APPLICATION_CREDENTIALS env var pointing to service account key JSON
Production: Workload Identity Federation (WIF) via Auth0 -- already configured in #127

Requirements

Checkbox: "Scan for Maintenance Schedule" on the document upload form (when document type is owners manual)
Long-running task: Owners manuals are 10-200MB, 100-300 pages. Gemini processes faster than traditional OCR but still takes 10-60+ seconds for large manuals. Must use the async OCR job flow (POST /api/ocr/jobs -> poll GET /api/ocr/jobs/:jobId)
PDF delivery to Gemini: For manuals under 20MB, use inline base64. For manuals over 20MB, upload to GCS first and pass the gs:// URI
Extracted Data Per Item:
- Service/maintenance item name (e.g., "Oil Change", "Tire Rotation")
- Interval in miles/km (e.g., every 5,000 miles)
- Interval in months (e.g., every 6 months)
- Additional details/notes (e.g., "Use 0W-20 synthetic")
User Review Flow: After extraction completes, present all extracted maintenance items in a review screen. User selects which items to create as maintenance_schedules. User can edit any field before confirming.
Creates maintenance_schedules (recurring schedules with interval_months / interval_miles) -- NOT one-time records
Tier Gating: Already defined as Pro+ (document.scanMaintenanceSchedule in FEATURE_TIERS)
Vehicle association: The owners manual document must be associated with a vehicle so the created schedules link to the correct vehicle

Technical Notes

Uses the async OCR job endpoint since manuals are large files
Gemini replaces the entire OCR preprocessing + pattern matching pipeline for manuals -- no PaddleOCR, no spaCy NER, no layout analysis needed
Frontend needs a progress indicator while the async job runs, and a notification when complete
Must map extracted serviceName values to existing maintenance categories/subtypes (routine_maintenance with appropriate subtypes from the 27 available)
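
The mapping from free-text `serviceName` values to existing subtypes could be approached with simple fuzzy string matching. The sketch below uses the standard library's `difflib`. The subtype names are placeholders, not the real list of 27, and the `match_subtype` name and the 0.5 cutoff are assumptions that would need tuning against real manuals.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Placeholder subset: the real 27 subtypes are defined elsewhere in the codebase.
EXAMPLE_SUBTYPES = [
    "Engine Oil",
    "Tire Rotation",
    "Cabin Air Filter",
    "Brake Fluid",
    "Spark Plugs",
]


def match_subtype(
    service_name: str,
    subtypes: Sequence[str] = EXAMPLE_SUBTYPES,
    cutoff: float = 0.5,
) -> Optional[str]:
    """Return the closest subtype for a Gemini serviceName, or None."""
    # Compare case-insensitively but return the original spelling.
    by_lower = {s.lower(): s for s in subtypes}
    hits = get_close_matches(service_name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None
```

Returning `None` for weak matches leaves the subtype empty in the review screen, so the user picks it manually rather than receiving a wrong guess.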

Shared Concerns

Mobile + Desktop

Both features MUST work on mobile and desktop per project requirements
Fuel receipt scanning: Mobile is the primary use case (phone camera capture)
Manual scanning: Desktop may be more common (uploading PDF files), but mobile must work too

Acceptance Criteria

Fuel receipt OCR scanning works on mobile (camera capture) and desktop (file upload)
Extracted receipt fields pre-fill fuel log form with editable values
Station name from receipt is matched via Google Places API and linked
Fuel receipt scan is gated to Pro+ tier
Owners manual scan checkbox appears on document upload for owners manuals
Gemini 2.5 Flash processes manual PDF and returns structured JSON maintenance schedule
Async job flow handles long-running Gemini extraction with progress feedback
Extracted maintenance items are presented for user review before creation
User can select/deselect and edit items before creating schedules
Created items are maintenance_schedules with correct intervals
Manual scan respects existing Pro+ tier gating
Both features respect the global 1,000 calls/month Google API limit
All fields are editable before saving (both features)
Works on both mobile and desktop viewports
Vertex AI API enabled and service account has roles/aiplatform.user role
GCP authentication works via WIF (production) and service account key (development)


egullickson commented

2026-02-11 03:04:24 +00:00

Plan: Expand OCR with Fuel Receipt Scanning and Owners Manual Maintenance Extraction

Phase: Planning | Agent: Planner | Status: APPROVED (revised per review cycle)

Overview

Expand OCR functionality with two new capabilities: (1) fuel receipt scanning that auto-extracts fields and pre-fills the fuel log form, and (2) owners manual maintenance schedule extraction via Gemini 2.5 Flash that creates recurring maintenance schedules. Manual extraction sends the PDF to Gemini directly and adds dedicated frontend components for schedule review and creation.

Planning Context

Decision Log

Decision	Reasoning Chain
Gemini as standalone Python module, not extending OcrEngine ABC	OcrEngine.recognize() accepts image bytes and returns text+confidence -> GeminiEngine.extract_maintenance() accepts PDF bytes and returns structured JSON -> different interface -> standalone module because interface signatures differ
Gemini in Python OCR service (not Node.js backend)	Python service has async job queue with progress callbacks -> has PDF handling infrastructure -> has WIF authentication -> keeps all AI/OCR processing in one service -> avoids duplicating auth and job patterns
Google Places API for station matching (separate budget)	Vision API 1000/month limit does NOT apply to Places API -> separate budget allows station matching without resource competition -> issue requires station matching -> use google-maps.client.ts in stations feature
Station matching via separate frontend call	Frontend calls OCR, receives extractedFields, then calls POST /api/stations/match with merchantName -> two sequential calls -> better separation of concerns -> OCR service stays focused on extraction, backend handles station enrichment
Station matching in backend, not frontend	Google Places API key stays server-side -> OCR service stays focused on text extraction -> backend has google-maps.client.ts -> keep API keys server-side
No monthly limit on Gemini calls	Gemini is pay-per-use on Vertex AI -> no artificial cap needed -> Vision API limit (1000/month) applies only to VIN + receipt OCR via Google Vision -> counter stays as ocr:vision_requests
Receipt OCR falls back to PaddleOCR when Vision limit reached	HybridEngine implements this fallback pattern -> receipts degrade gracefully to local OCR (lower accuracy but functional)
20MB raw bytes PDF limit at launch for Gemini	Gemini inline base64 supports up to 20MB -> Vertex AI SDK handles base64 encoding internally -> validate raw bytes only -> GCS upload path for larger files adds significant complexity -> most manuals under 20MB -> clear error message for oversized files -> GCS as documented future enhancement
Backend proxy creates dedicated /api/ocr/extract/receipt endpoint	Dedicated proxy allows receipt-specific middleware (tier gating, rate limiting, request logging) -> Python /extract/receipt has specialized receipt preprocessing, fuel pattern matching, and cross-validation -> generic /extract only auto-detects document type
useReceiptOcr calls /ocr/extract/receipt	/ocr/extract/receipt provides receipt-specific preprocessing and fuel field extraction -> the generic /ocr/extract endpoint lacks receipt-specialized patterns
30s timeout for receipt OCR API call	Receipt images are single photos (1-3s typical processing) -> 30s accommodates slow first-call model loading and cloud fallback -> matches useReceiptOcr timeout
New requireTier() middleware (M0)	Backend tier enforcement prevents direct API bypass -> frontend-only tier check can be circumvented via curl -> both receipt and manual endpoints use the same middleware -> clean separation as reusable preHandler
PDF magic byte validation for manual uploads	Content-type header can be spoofed -> first 4 bytes %PDF check prevents processing renamed non-PDF files -> minimal overhead, defense in depth
3-step progress for Gemini extraction	Gemini makes single blocking API call (10-60s) -> no sub-progress possible -> honest updates at four checkpoints (10%, 50%, 95%, 100%), i.e. three steps, rather than a simulated progress bar
410 Gone for expired job polling	HTTP 410 semantically correct for TTL-expired Redis jobs -> distinguishes from 404 "never existed" -> frontend shows clear "Job expired, please resubmit" message
Gemini response schema uses camelCase	Matches backend API convention (camelCase in TypeScript) -> Python manual_extractor.py preserves camelCase from Gemini for API response -> avoids extra case conversion layer
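
The two upload checks in the Decision Log (the `%PDF` magic bytes and the 20MB raw-bytes limit) can be sketched together. This is an illustration under assumptions, not the service's code: `validate_manual_pdf` and `ManualValidationError` are hypothetical names, and "20MB" is interpreted here as 20 MiB.

```python
# Assumption: the plan's "20MB" is treated as 20 MiB of raw (pre-base64) bytes.
MAX_PDF_BYTES = 20 * 1024 * 1024


class ManualValidationError(ValueError):
    """Raised when an uploaded manual fails pre-flight checks."""


def validate_manual_pdf(data: bytes) -> None:
    """Reject non-PDF or oversized uploads before any Gemini call is made."""
    # The Content-Type header can be spoofed, so inspect the first 4 bytes.
    if data[:4] != b"%PDF":
        raise ManualValidationError("File is not a PDF (missing %PDF header)")
    if len(data) > MAX_PDF_BYTES:
        size_mb = len(data) / (1024 * 1024)
        raise ManualValidationError(
            f"PDF is {size_mb:.1f} MB; the current limit is 20 MB"
        )
```

Checking the header first means a renamed non-PDF gets the more specific error even when it is also too large.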

Rejected Alternatives

Alternative	Why Rejected
Extend OcrEngine ABC for Gemini	OcrEngine takes image_bytes, returns text+confidence. Gemini takes PDF, returns structured JSON. Forcing Gemini into this interface would require awkward adaptation layer with no benefit.
Skip Google Places for station matching	Issue requires station matching. Places API has separate budget from OCR. Skipping would miss a key requirement.
Gemini in Node.js backend	Would duplicate the async job queue, PDF handling, and WIF authentication that already exist in the Python service. Backend is a proxy layer, not a processing layer.
Unified counter for Vision + Gemini	Gemini is pay-per-use with no artificial cap. Only Vision API has 1000/month limit. Unifying would unnecessarily restrict manual scanning.
GCS upload for all PDFs	Adds bucket provisioning, IAM, upload flow for a minority of cases. Most manuals under 20MB. Defer to future enhancement.
Frontend calls Google Places API directly for matching	Would expose Places API key to frontend. Backend has google-maps.client.ts. Keep API keys server-side.
Backend merges OCR + Places into one response	Couples OCR and station logic in backend proxy. Separate frontend call is cleaner separation of concerns.
Frontend-only tier gating	Bypassable via direct API call. Backend enforcement required for security.
Simulated progress bar during Gemini call	Artificial progress is dishonest. 3-step updates are simple and accurate.
15MB safety margin on PDF size	Vertex AI SDK handles encoding internally. 20MB raw bytes matches API docs. Conservative limit rejects valid files unnecessarily.

Constraints and Assumptions

Technical: 1000/month Google Vision API limit (VIN + receipts only), 20MB Gemini raw bytes limit, WIF authentication via Auth0 M2M
Technical: Vertex AI API must be enabled in GCP project, service account needs roles/aiplatform.user
Technical: Python OCR service at mvp-ocr:8000, backend proxy at /api/ocr/*
Architecture: Feature capsule pattern for backend, React Hook Form + Zod for frontend forms
Frontend: Mobile + desktop required (320px, 768px, 1920px viewports), touch targets >= 44px
Dependencies: google-cloud-aiplatform Python SDK, Gemini 2.5 Flash on Vertex AI (us-central1)
Workflow: Issue #129 requires sub-issue decomposition (9 milestones, multiple 3+ file milestones). Each milestone maps 1:1 to a sub-issue per workflow-contract.json. ONE branch issue-129-expand-ocr, ONE PR closing parent #129 and all sub-issues.
Architectural note: If OcrClient/OcrService/OcrController exceed 10 methods, consider splitting into specialized classes (VinOcr, ReceiptOcr, ManualOcr).

Known Risks

Risk	Mitigation	Anchor
Receipt OCR accuracy varies by receipt format	receipt_extractor has cross-validation (total = qty * price within 10% tolerance) and confidence scoring. User can edit all fields before saving.	ocr/app/extractors/fuel_receipt.py:L108-L123
Gemini structured output may not perfectly match maintenance categories	Map serviceName to 27 subtypes via fuzzy matching. User reviews and edits all items before creating schedules.	Issue #129 specifies user review flow
PDFs over 20MB rejected at launch	Clear error message with file size limit. GCS upload path documented as future enhancement.	N/A
WIF authentication may not work with Vertex AI SDK	google-cloud-aiplatform uses ADC which supports external_account (WIF) type. Same credential path as Vision API. GeminiEngine._get_client() wraps initialization in try/except with diagnostic error.	ocr/app/engines/cloud_engine.py (WIF setup)
Redis job data TTL (2h) may be insufficient for very large manuals	MANUAL_JOB_TTL is 7200s (2 hours). Gemini processes manuals in 10-60s, well within limits. Expired jobs return 410 Gone.	ocr/app/services/job_queue.py:L22
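
The receipt cross-validation named in the first risk (total = qty * price within 10% tolerance) amounts to a short arithmetic check. The sketch below is illustrative only. The actual logic lives in `ocr/app/extractors/fuel_receipt.py` and may differ, and measuring the tolerance relative to the printed total is an assumption.

```python
def totals_agree(
    quantity: float,
    unit_price: float,
    total: float,
    tolerance: float = 0.10,
) -> bool:
    """Check that quantity * unit_price matches the printed total.

    Assumption: tolerance is a fraction of the printed total.
    """
    if total <= 0:
        return False
    expected = quantity * unit_price
    return abs(expected - total) <= tolerance * total
```

For example, 10 gallons at 3.50 against a printed total of 35.00 agrees, while the same quantity and price against a total of 50.00 is off by 15.00, beyond the 5.00 allowance, and would lower confidence in the extracted fields.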

Invisible Knowledge

Architecture

FUEL RECEIPT OCR FLOW:
  Mobile Camera / File Upload
      |
      v
  Frontend (useReceiptOcr) --POST /api/ocr/extract/receipt--> Backend Proxy
      |                                                            |
      v                                                            v
  ReceiptOcrReviewModal                              OcrClient.extractReceipt()
      |                                                            |
      v                                                            v
  Frontend calls POST /api/stations/match        Python /extract/receipt
  with extractedFields.merchantName                    |
      |                                                 v
      v                                          ReceiptExtractor.extract()
  Pre-fill locationData                                |
  with matched station                                 v
      |                                          HybridEngine (Vision/PaddleOCR)
      v                                                |
  Accept -> FuelLogForm.setValue()                     v
                                           Pattern matching (fuel, date, currency)
                                                      |
                                                      v
                                           ReceiptExtractionResponse

MANUAL EXTRACTION FLOW:
  DocumentForm (upload PDF + check "Scan for Maintenance Schedule")
      |
      v
  Frontend (useManualExtraction) --POST /api/ocr/extract/manual--> Backend Proxy
      |                                                                |
      v                                                                v
  Poll GET /api/ocr/jobs/:jobId                          OcrClient.submitManualJob()
  (progress: 10% -> 50% -> 95% -> 100%)                             |
      |                                                               v
      v                                                    Python /extract/manual
  Job completed (or 410 Gone if expired)                       |
      |                                                         v
      v                                                  GeminiEngine.extract_maintenance()
  MaintenanceScheduleReviewScreen                              |
  (select/edit/deselect items)                                  v
      |                                                  Vertex AI Gemini 2.5 Flash
      v                                                  (native PDF, structured JSON)
  POST /api/maintenance/schedules                              |
  (batch create selected)                                      v
                                                     ManualExtractionResponse
                                                     (maintenanceSchedules[])

Data Flow

RECEIPT: Photo -> Backend /extract/receipt -> Python receipt_extractor -> Vision/PaddleOCR
         -> pattern matching -> extractedFields -> Frontend review modal
         -> Frontend calls /stations/match -> Places API station match
         -> User edits -> Form population -> Create fuel log

MANUAL:  PDF -> Backend /extract/manual -> Python job queue -> Gemini 2.5 Flash
         -> structured JSON -> maintenanceSchedules[] -> Frontend poll -> Review screen
         -> User select/edit -> Batch create maintenance_schedules

Why This Structure

GeminiEngine is a standalone module: OcrEngine handles image-to-text extraction; GeminiEngine handles PDF-to-structured-data extraction. Different input types, output formats, and error modes warrant separate abstractions.
Station matching in backend: Google Places API key stays server-side. OCR service stays focused on text extraction. Backend has google-maps.client.ts.
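Async job pattern for manuals: Manuals range from 10-200MB (only files up to 20MB are accepted at launch), and Gemini takes 10-60+ seconds per call. Submitting a job and polling for progress keeps the HTTP request from blocking and lets the UI show status while the user waits.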
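
The polling side of the async job pattern can be sketched as follows. This is a minimal illustration, not the project's client code: `poll_job`, `JobExpired`, the injected `fetch` callable, and the `status`/`progress` field names are assumptions made for the example. Only the 410 Gone behavior and the 10/50/95/100 progress points come from the plan.

```python
import time
from typing import Callable, Dict, Tuple


class JobExpired(Exception):
    """The job endpoint answered 410 Gone (job TTL elapsed)."""


def poll_job(
    fetch: Callable[[str], Tuple[int, Dict]],
    job_id: str,
    interval_s: float = 2.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job reports completion; raise JobExpired on 410."""
    for _ in range(max_polls):
        status_code, body = fetch(job_id)
        if status_code == 410:
            # Expired jobs are distinct from jobs that never existed (404).
            raise JobExpired("Job expired, please resubmit")
        if status_code != 200:
            raise RuntimeError(f"Unexpected status {status_code} for job {job_id}")
        if body.get("status") == "completed":
            return body
        sleep(interval_s)
    raise TimeoutError(f"Job {job_id} did not complete after {max_polls} polls")


def make_fake_fetch(responses):
    """Hypothetical stand-in for GET /api/ocr/jobs/:jobId."""
    it = iter(responses)
    return lambda job_id: next(it)


fake = make_fake_fetch([
    (200, {"status": "processing", "progress": 10}),
    (200, {"status": "processing", "progress": 50}),
    (200, {"status": "processing", "progress": 95}),
    (200, {"status": "completed", "progress": 100, "maintenanceSchedules": []}),
])
result = poll_job(fake, "job-123", sleep=lambda s: None)
print(result["progress"])
```

Injecting `fetch` and `sleep` keeps the loop testable without a network or real delays.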

Invariants

All receipt OCR extracted fields MUST be editable before saving to fuel log
All manual extraction items MUST be reviewed by user before creating schedules
Monthly Vision API counter only counts google_vision engine calls, never Gemini
Gemini module uses same WIF credential path as Vision API
Frontend works on both mobile (camera capture) and desktop (file upload)
Both receipt and manual endpoints enforce tier gating at backend level (requireTier middleware)
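
The counter invariant above can be illustrated with a small sketch. The function names, the in-memory `counters` dict, and the engine labels `paddleocr` and `gemini` are assumptions for illustration. The `google_vision` label, the `ocr:vision_requests` key, the 1,000/month limit, and the PaddleOCR fallback are taken from the plan.

```python
VISION_COUNTER_KEY = "ocr:vision_requests"  # counter key named in the plan
VISION_MONTHLY_LIMIT = 1000                 # Google Vision calls per month


def record_engine_call(counters: dict, engine: str) -> None:
    """Increment the monthly counter only for google_vision calls."""
    if engine == "google_vision":
        counters[VISION_COUNTER_KEY] = counters.get(VISION_COUNTER_KEY, 0) + 1
    # Gemini and local OCR calls intentionally leave the counter untouched.


def choose_receipt_engine(counters: dict) -> str:
    """Use Vision while under the cap, otherwise fall back to local OCR."""
    used = counters.get(VISION_COUNTER_KEY, 0)
    return "google_vision" if used < VISION_MONTHLY_LIMIT else "paddleocr"


counters = {}
record_engine_call(counters, "gemini")         # not counted
record_engine_call(counters, "google_vision")  # counted
print(counters.get(VISION_COUNTER_KEY, 0))
```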

Tradeoffs

20MB PDF limit: Sacrifices support for very large manuals (>20MB) to avoid GCS bucket complexity. Most manuals are under 20MB.
No Gemini monthly cap: Vertex AI is pay-per-use. Cost is ~$0.001-0.002 per page. A 300-page manual costs ~$0.30-0.60. Acceptable for Pro+ tier.
Station matching adds latency: Google Places lookup adds ~200-500ms to receipt processing. Acceptable for the value of auto-linking a real station.
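3-step progress: The job reports three steps (10% preparing, 50% processing with Gemini, 95% mapping results) before reaching 100% complete, with no sub-progress during the Gemini API call. This is honest about the blocking wait, unlike a simulated progress bar.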
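
The per-manual cost figure in the tradeoffs above is simple arithmetic, reproduced here as a sketch. The per-page rates are the approximate values quoted in the plan, not an official price list, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Approximate per-page cost range quoted in the plan (USD).
COST_PER_PAGE_LOW = Decimal("0.001")
COST_PER_PAGE_HIGH = Decimal("0.002")


def estimate_cost(pages: int):
    """Return the (low, high) estimated cost in USD for a manual."""
    return pages * COST_PER_PAGE_LOW, pages * COST_PER_PAGE_HIGH


low, high = estimate_cost(300)
print(f"${low:.2f}-${high:.2f}")  # $0.30-$0.60
```

`Decimal` is used so the result is exact rather than subject to binary floating-point rounding.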

Sub-Issues

Sub-Issue	Milestone	Title
#138	M0	feat: Tier guard middleware (#129)
#139	M1	feat: Backend OCR receipt proxy endpoint (#129)
#140	M2	feat: Tier gating and frontend receipt OCR flow (#129)
#141	M3	feat: Station matching from receipt (#129)
#142	M4	feat: Gemini engine module and configuration (#129)
#143	M5	feat: Manual extractor Gemini rewrite (#129)
#144	M6	feat: Backend OCR manual proxy endpoint (#129)
#145	M7	feat: Frontend manual extraction flow (#129)
#146	M8	docs: Documentation for OCR expansion (#129)

Milestone Dependencies

M0 (Tier Guard) ---> M1 (Backend Receipt) ---> M2 (Tier Gating + Frontend Receipt)
                                           \
                                            --> M3 (Station Matching)

M4 (Gemini Engine) ---> M5 (Manual Extractor)
                                |
                                v
              M0 (Tier Guard) ---> M6 (Backend Manual Proxy) ---> M7 (Frontend Manual Flow)

M0..M7 ---> M8 (Documentation)

Feature 1 (#139-#141) and Feature 2 (#142-#145) can proceed in parallel after M0 completes.
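
As a cross-check on the ordering, the arrows in the diagram can be encoded as a dependency graph and grouped into waves of milestones that could run concurrently. This is an illustrative sketch using Python's standard `graphlib`; the `waves` helper is not part of the plan, and the edges are transcribed from the diagram as drawn.

```python
from graphlib import TopologicalSorter

# Each milestone maps to the set of milestones it depends on.
DEPS = {
    "M0": set(),
    "M1": {"M0"},
    "M2": {"M1"},
    "M3": {"M1"},
    "M4": set(),
    "M5": {"M4"},
    "M6": {"M0", "M5"},
    "M7": {"M6"},
    "M8": {"M0", "M1", "M2", "M3", "M4", "M5", "M6", "M7"},
}


def waves(deps):
    """Group milestones into batches whose dependencies are all complete."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for i, batch in enumerate(waves(DEPS), start=1):
    print(f"wave {i}: {', '.join(batch)}")
```

With these edges the receipt track (M1-M3) and the manual track (M4-M7) interleave across waves, and M8 lands alone in the final wave.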

Verdict: APPROVED | Next: Create branch issue-129-expand-ocr, begin execution at M0


Review	Verdict	Findings
QR plan-completeness	PASS_WITH_CONCERNS	1 CRITICAL (missing sub-issue constraint), 7 SHOULD_FIX
TW plan-scrub	NEEDS_CHANGES	14 temporal contamination instances
QR plan-code	NEEDS_CHANGES	2 CRITICAL (file size validation, PDF content validation), 8 HIGH, 2 SHOULD_FIX
QR plan-docs	NEEDS_CHANGES	2 CRITICAL (non-existent files, missing backend entries), 5 HIGH, 3 SHOULD_FIX

#	Decision	Rationale	Review Source
1	Station matching via separate frontend call (POST /api/stations/match)	Frontend calls OCR, receives extractedFields, then calls station matching endpoint. Two sequential calls. Better separation of concerns, simpler backend.	QR completeness
2	Create 3 NEW CLAUDE.md files (ocr/app/engines/, frontend maintenance/, frontend documents/)	More granular LLM navigation, follows pattern of other feature directories.	QR docs
3	New milestone M0: Tier Guard Middleware (before M1)	requireTier() preHandler doesn't exist yet. Clean separation. M1 and M6 reuse it. Both receipt and manual endpoints get backend enforcement.	QR code
4	20MB raw bytes PDF size limit for Gemini	Vertex AI SDK handles base64 encoding internally. Validate raw bytes only. Matches Gemini API docs.	QR code
5	PDF magic byte validation (%PDF header check) in M6	Validate first 4 bytes before forwarding to Python. Prevents malformed file processing.	QR code
6	3-step progress for Gemini extraction	10% Preparing -> 50% Processing with Gemini -> 95% Mapping results -> 100% Complete. Honest about the blocking wait.	QR code
7	410 Gone for expired job polling	HTTP 410 semantically correct for TTL-expired Redis jobs. Frontend shows "Job expired, please resubmit."	QR code
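
Decisions 4 and 5 can be sketched together as a single upload check. This is a hedged illustration rather than the M6 implementation: `validate_manual_upload`, the exception type, and the error messages are hypothetical, and 20MB is assumed to mean 20 * 1024 * 1024 bytes.

```python
MAX_PDF_BYTES = 20 * 1024 * 1024  # assumption: "20MB" means 20 MiB of raw bytes
PDF_MAGIC = b"%PDF"               # first 4 bytes of a PDF file


class InvalidManualUpload(ValueError):
    """Raised when an uploaded manual fails validation."""


def validate_manual_upload(data: bytes) -> None:
    """Reject oversized uploads and files that do not start with %PDF."""
    if len(data) > MAX_PDF_BYTES:
        raise InvalidManualUpload(
            f"File is {len(data)} bytes; the limit is {MAX_PDF_BYTES} bytes"
        )
    if data[:4] != PDF_MAGIC:
        # The content-type header can be spoofed, so inspect the bytes.
        raise InvalidManualUpload("File does not start with the %PDF header")


validate_manual_upload(b"%PDF-1.7\n...")  # passes silently
```

The check runs on raw bytes only, matching the decision that base64 encoding overhead is the SDK's concern.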


QR Review: Plan Completeness (#129)

VERDICT: PASS_WITH_CONCERNS

Findings

[DECISION_LOG] [SHOULD_FIX]: Missing decision about useReceiptOcr endpoint call location

[DECISION_LOG] [SHOULD_FIX]: "Backend receipt endpoint" decision is misleading

[CONSTRAINTS] [CRITICAL]: Missing sub-issue creation requirement

[MILESTONES] [SHOULD_FIX]: M1 test backing should be "integration" not "default-derived"

[MILESTONES] [SHOULD_FIX]: M2 and M7 missing viewport test specifications

[MILESTONES] [NEEDS_CLARIFICATION]: M4 Gemini WIF authentication config incomplete

[MILESTONES] [SHOULD_FIX]: M5 "Remove unused imports" is implementation detail

[MILESTONES] [NEEDS_CLARIFICATION]: M6 tier guard implementation unclear

[INVISIBLE_KNOWLEDGE] [SHOULD_FIX]: Station matching flow ambiguous

Considered But Not Flagged

TW Review: Plan Scrub (#129)

VERDICT: NEEDS_CHANGES

Findings

[TEMPORAL] Overview: Progress estimate and missing-component language

[TEMPORAL] Overview: "Rewrite" language for manual extraction

[FORBIDDEN] Decision Log (Gemini standalone): Editorial language

[TEMPORAL] Decision Log (useReceiptOcr endpoint): Change-relative description

[TEMPORAL] Decision Log (HybridEngine fallback): Baseline reference

[FORBIDDEN] Decision Log (Gemini interface): Intensifier language

[TEMPORAL] Milestone 2 Requirements: Change-relative endpoint instruction

[TEMPORAL] Milestone 5 Requirements: "Rewrite" directive

[TEMPORAL] Milestone 5 Requirements: "Remove" directive with temporal anchor

[TEMPORAL] Milestone 5 Acceptance Criteria: "No longer called" baseline reference

[TEMPORAL] Milestone 7 Requirements: "Remove" change action

[TEMPORAL] Milestone 5 Requirements: Vague conditional directive

[TEMPORAL] Milestone 5 Acceptance Criteria: "Existing" and "new" comparison

[TEMPORAL] Milestone 6 Requirements: "Reuse existing" and "already returns"

Considered But Not Flagged

QR Review: Plan Code (#129)

VERDICT: NEEDS_CHANGES

Findings

[RULE 0] [CRITICAL]: Missing file size validation server-side in M4 Gemini engine

[RULE 0] [CRITICAL]: M6 backend manual proxy endpoint lacks PDF content validation

[RULE 0] [HIGH]: Missing error handling for Gemini WIF authentication failures in M4

[RULE 0] [HIGH]: M3 station matching has no timeout specified

[RULE 0] [HIGH]: M1 receipt proxy endpoint missing error code translation

[RULE 1] [HIGH]: M2 tier gating is frontend-only -- missing backend enforcement

[RULE 1] [HIGH]: M6 tier guard pattern underspecified
