feat: Gemini engine module and configuration (#129) #133

Closed
opened 2026-02-11 03:04:50 +00:00 by egullickson · 1 comment
Owner

Relates to #129

Milestone 4: Gemini Engine Module and Configuration

Create a standalone Gemini 2.5 Flash module in the Python OCR service for maintenance schedule extraction.

Files

  • ocr/app/engines/gemini_engine.py (NEW)
  • ocr/app/config.py
  • ocr/requirements.txt
  • docker-compose.yml
  • docker-compose.staging.yml
  • docker-compose.prod.yml

Requirements

  • Create GeminiEngine class (standalone, NOT extending OcrEngine) with extract_maintenance() method
  • Use the Vertex AI SDK: GenerativeModel("gemini-2.5-flash") with generate_content()
  • Accept PDF bytes (base64 inline, up to 20MB); reject anything >20MB with a clear error
  • Use response_mime_type="application/json" and response_schema to constrain the response to the expected JSON structure
  • Use the prompt from issue #129 for maintenance schedule extraction
  • Authenticate via the same Workload Identity Federation (WIF) credential path as Google Vision (GOOGLE_APPLICATION_CREDENTIALS)
  • Add config settings: VERTEX_AI_PROJECT, VERTEX_AI_LOCATION, GEMINI_MODEL
  • Add google-cloud-aiplatform>=1.40.0 to requirements.txt
  • Add environment variables to docker-compose files

Acceptance Criteria

  • GeminiEngine.extract_maintenance(pdf_bytes) returns structured JSON with maintenanceSchedule array
  • Each schedule item has serviceName (required), intervalMiles (nullable), intervalMonths (nullable), details (nullable)
  • PDFs > 20MB rejected with clear error message
  • Authentication works via WIF in container environment
  • Configuration reads from environment variables with sensible defaults
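
The item shape in the criteria above could be expressed as a response_schema roughly as follows. This is a sketch, not the schema from the implementation: the OpenAPI-style dict form, the integer type chosen for the two interval fields, and the name MAINTENANCE_SCHEMA are all assumptions.

```python
# Hypothetical schema constant; field names come from the acceptance criteria,
# but the types for the interval fields are assumed.
MAINTENANCE_SCHEMA = {
    "type": "object",
    "properties": {
        "maintenanceSchedule": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "serviceName": {"type": "string"},
                    "intervalMiles": {"type": "integer", "nullable": True},
                    "intervalMonths": {"type": "integer", "nullable": True},
                    "details": {"type": "string", "nullable": True},
                },
                # Only serviceName is required; the other fields may be null.
                "required": ["serviceName"],
            },
        }
    },
    "required": ["maintenanceSchedule"],
}
```

A document with no maintenance content would then be expected to yield {"maintenanceSchedule": []}, which matches the "empty array" edge case listed under Tests.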

Tests

  • Test files: ocr/tests/test_gemini_engine.py (NEW)
  • Test type: unit (mock Vertex AI SDK)
  • Scenarios:
    • Normal: Valid PDF returns structured maintenance schedules
    • Edge: PDF with no maintenance content returns empty array
    • Error: PDF > 20MB rejected with size error
    • Error: Vertex AI authentication failure handled gracefully
egullickson added the status/backlog and type/feature labels 2026-02-11 03:12:39 +00:00
egullickson added the status/in-progress label and removed the status/backlog label 2026-02-11 15:53:28 +00:00
Author
Owner

Milestone: Gemini Engine Module and Configuration

Phase: Execution | Agent: Feature Agent | Status: PASS

Changes Made

New Files:

  • ocr/app/engines/gemini_engine.py - Standalone GeminiEngine class with extract_maintenance(pdf_bytes) method
  • ocr/tests/test_gemini_engine.py - 18 unit tests (all passing)

Modified Files:

  • ocr/app/config.py - Added vertex_ai_project, vertex_ai_location, gemini_model settings
  • ocr/requirements.txt - Added google-cloud-aiplatform>=1.40.0
  • docker-compose.yml - Added VERTEX_AI_PROJECT, VERTEX_AI_LOCATION, GEMINI_MODEL env vars to mvp-ocr
  • docker-compose.staging.yml - Same env vars added
  • docker-compose.prod.yml - Same env vars added
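
For illustration, the three settings could be read from the environment along these lines. The actual ocr/app/config.py is not shown in this issue, so the GeminiSettings class, the load_gemini_settings helper, and the default values for project and location are assumptions; only the variable names and the gemini-2.5-flash model name come from the issue.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GeminiSettings:
    """Hypothetical container for the three settings named in this issue."""
    vertex_ai_project: str
    vertex_ai_location: str
    gemini_model: str


def load_gemini_settings(env: Optional[Mapping[str, str]] = None) -> GeminiSettings:
    """Read settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return GeminiSettings(
        # Assumed default: empty string when no project is configured.
        vertex_ai_project=env.get("VERTEX_AI_PROJECT", ""),
        # Assumed default region; the issue does not state one.
        vertex_ai_location=env.get("VERTEX_AI_LOCATION", "us-central1"),
        # Model name taken from the issue text.
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
    )
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise in unit tests without touching the process environment.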

Implementation Details

  • GeminiEngine is standalone (NOT extending OcrEngine ABC) since Gemini does semantic document understanding, not traditional OCR
  • Uses the Vertex AI SDK: GenerativeModel.generate_content() is called with response_mime_type="application/json" and a response_schema to constrain the response to the expected JSON structure
  • PDFs >20MB rejected with clear error message
  • Lazy initialization: model not created until first extract_maintenance() call
  • Authentication via same WIF credential path as Google Vision (GOOGLE_APPLICATION_CREDENTIALS)
  • Returns MaintenanceExtractionResult with list of MaintenanceItem dataclasses (camelCase from API mapped to snake_case Python)
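
The points above can be sketched as a single module. This is a minimal illustration under stated assumptions, not the code in gemini_engine.py: the GeminiEngineError base class, the parse_response helper, the constructor parameters, the choice of GeminiProcessingError for oversized input, and reading "20MB" as 20 MiB are all hypothetical. The response_schema argument and the credential-file check are left out to keep the sketch short.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional

MAX_PDF_BYTES = 20 * 1024 * 1024  # assumption: "20MB" is read as 20 MiB


class GeminiEngineError(Exception):
    """Hypothetical common base for the two errors named in the test list."""


class GeminiUnavailableError(GeminiEngineError):
    """The SDK or credentials are missing, so no model can be created."""


class GeminiProcessingError(GeminiEngineError):
    """The input was rejected, the API call failed, or the output was unusable."""


@dataclass
class MaintenanceItem:
    service_name: str
    interval_miles: Optional[int] = None
    interval_months: Optional[int] = None
    details: Optional[str] = None


@dataclass
class MaintenanceExtractionResult:
    items: list = field(default_factory=list)  # list of MaintenanceItem


def parse_response(text: str) -> MaintenanceExtractionResult:
    """Map the camelCase JSON payload onto snake_case dataclasses."""
    try:
        rows = json.loads(text)["maintenanceSchedule"]
        items = [
            MaintenanceItem(
                service_name=row["serviceName"],
                interval_miles=row.get("intervalMiles"),
                interval_months=row.get("intervalMonths"),
                details=row.get("details"),
            )
            for row in rows
        ]
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        raise GeminiProcessingError(f"Unusable Gemini response: {exc}") from exc
    return MaintenanceExtractionResult(items=items)


class GeminiEngine:
    def __init__(self, project: str, location: str, prompt: str,
                 model_name: str = "gemini-2.5-flash") -> None:
        self._project = project
        self._location = location
        self._prompt = prompt  # the real prompt lives in issue #129
        self._model_name = model_name
        self._model: Any = None  # lazy: nothing is created at construction

    def _get_model(self) -> Any:
        """Create the model on first use and reuse it afterwards."""
        if self._model is None:
            try:
                import vertexai
                from vertexai.generative_models import GenerativeModel
            except ImportError as exc:
                raise GeminiUnavailableError("Vertex AI SDK is not installed") from exc
            vertexai.init(project=self._project, location=self._location)
            self._model = GenerativeModel(self._model_name)
        return self._model

    def extract_maintenance(self, pdf_bytes: bytes) -> MaintenanceExtractionResult:
        if len(pdf_bytes) > MAX_PDF_BYTES:  # a PDF of exactly the limit passes
            raise GeminiProcessingError(
                f"PDF is {len(pdf_bytes)} bytes; the limit is {MAX_PDF_BYTES}"
            )
        model = self._get_model()
        from vertexai.generative_models import GenerationConfig, Part

        config = GenerationConfig(response_mime_type="application/json")
        pdf_part = Part.from_data(data=pdf_bytes, mime_type="application/pdf")
        try:
            response = model.generate_content(
                [pdf_part, self._prompt], generation_config=config
            )
            text = response.text
        except Exception as exc:  # wrap SDK failures in the engine's own error
            raise GeminiProcessingError(f"Gemini request failed: {exc}") from exc
        return parse_response(text)
```

Keeping the SDK imports inside the methods means the module can be imported, and the size check exercised, on a machine where google-cloud-aiplatform is not installed.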

Test Results

18 passed in 4.81s

Test scenarios covered:

  • Exception hierarchy validation
  • Data type construction (required-only and all fields)
  • PDF >20MB rejection
  • PDF exactly at 20MB limit passes size check
  • Valid PDF returns structured maintenance schedules
  • PDF with no maintenance content returns empty array
  • Nullable fields handled correctly
  • Missing credential file raises GeminiUnavailableError
  • Missing SDK raises GeminiUnavailableError
  • API runtime error wrapped as GeminiProcessingError
  • Invalid JSON response wrapped as GeminiProcessingError
  • Lazy initialization verified (model is None after construction)
  • Model reused on subsequent calls
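
One way to mock the Vertex AI SDK for scenarios like these is to place fake modules in sys.modules so that a lazy import picks them up. The sketch below is an assumption about technique, not a copy of test_gemini_engine.py: extract_schedule is a deliberately tiny stand-in for the engine, and fake_sdk is a hypothetical helper.

```python
import json
import sys
import types
from unittest.mock import MagicMock, patch


def extract_schedule(pdf_bytes: bytes, prompt: str) -> list:
    """Tiny stand-in for the engine; imports the SDK lazily so tests can fake it."""
    from vertexai.generative_models import GenerativeModel, Part

    model = GenerativeModel("gemini-2.5-flash")
    pdf_part = Part.from_data(data=pdf_bytes, mime_type="application/pdf")
    response = model.generate_content([pdf_part, prompt])
    return json.loads(response.text)["maintenanceSchedule"]


def fake_sdk(response_text: str) -> dict:
    """Build fake vertexai modules whose model returns response_text."""
    generative_models = types.ModuleType("vertexai.generative_models")
    generative_models.Part = MagicMock()
    generative_models.GenerativeModel = MagicMock()
    model = generative_models.GenerativeModel.return_value
    model.generate_content.return_value.text = response_text
    vertexai = types.ModuleType("vertexai")
    vertexai.generative_models = generative_models
    return {"vertexai": vertexai, "vertexai.generative_models": generative_models}


def test_valid_pdf_returns_schedule():
    body = json.dumps(
        {"maintenanceSchedule": [{"serviceName": "Oil change", "intervalMiles": 5000}]}
    )
    # patch.dict restores sys.modules when the block exits.
    with patch.dict(sys.modules, fake_sdk(body)):
        items = extract_schedule(b"%PDF-1.4", "prompt")
    assert items[0]["serviceName"] == "Oil change"


def test_no_maintenance_content_returns_empty_array():
    with patch.dict(sys.modules, fake_sdk('{"maintenanceSchedule": []}')):
        assert extract_schedule(b"%PDF-1.4", "prompt") == []
```

Because the fakes are installed only inside the with block, the tests behave the same whether or not the real SDK is installed.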

Acceptance Criteria Status

  • [x] GeminiEngine.extract_maintenance(pdf_bytes) returns structured JSON with maintenanceSchedule array
  • [x] Each schedule item has serviceName (required), intervalMiles (nullable), intervalMonths (nullable), details (nullable)
  • [x] PDFs >20MB rejected with clear error message
  • [x] Authentication works via WIF in container environment (same path as Google Vision)
  • [x] Configuration reads from environment variables with sensible defaults

Verdict: PASS | Next: Ready for PR / Quality Review


Reference: egullickson/motovaultpro#133