feat: Migrate MaintenanceReceiptExtractor to google-genai (#231) #234

Closed
opened 2026-02-20 15:11:08 +00:00 by egullickson · 2 comments
Owner

Relates to #231

Migrate ocr/app/extractors/maintenance_receipt_extractor.py from vertexai.generative_models to google.genai:

  • Replace _get_model() with _get_client() using genai.Client(vertexai=True, project=..., location=...)
  • Store self._client and self._model_name instead of self._model and self._generation_config
  • Migrate _extract_with_gemini(): use client.models.generate_content(model=..., contents=..., config=GenerateContentConfig(...))
  • No Google Search grounding (text-only receipts)

Acceptance Criteria

  • No imports from vertexai or google.cloud.aiplatform
  • Uses genai.Client(vertexai=True, ...) for initialization
  • Receipt extraction works with new SDK
  • Error handling preserved
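
The first criterion can be checked mechanically. The following is a minimal sketch, not part of the original issue: the helper name `find_forbidden_imports` and the sample sources are hypothetical. It walks the whole syntax tree, so lazy imports inside methods are also caught.

```python
import ast

# Module prefixes the acceptance criteria rule out.
FORBIDDEN_PREFIXES = ("vertexai", "google.cloud.aiplatform")


def _is_forbidden(name: str) -> bool:
    """True if `name` is a forbidden module or a submodule of one."""
    return any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES)


def find_forbidden_imports(source: str) -> list[str]:
    """Return the sorted forbidden module names imported anywhere in `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if _is_forbidden(module):
                candidates = [module]
            else:
                # "from google.cloud import aiplatform" -> "google.cloud.aiplatform"
                candidates = [f"{module}.{alias.name}" for alias in node.names]
        else:
            continue
        found.update(name for name in candidates if _is_forbidden(name))
    return sorted(found)


# Hypothetical before/after snippets of the extractor's import site.
OLD_SOURCE = (
    "def _get_model(self):\n"
    "    from google.cloud import aiplatform\n"
    "    from vertexai.generative_models import GenerativeModel\n"
)
NEW_SOURCE = "from google import genai\nfrom google.genai import types\n"
```

In practice the function would be fed the contents of ocr/app/extractors/maintenance_receipt_extractor.py and expected to return an empty list.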

File

ocr/app/extractors/maintenance_receipt_extractor.py

egullickson added the status/in-progress and type/feature labels 2026-02-20 15:11:20 +00:00
egullickson added this to the Sprint 2026-02-02 milestone 2026-02-20 15:11:23 +00:00
Author
Owner

Plan: M3 -- Migrate MaintenanceReceiptExtractor (#234)

Phase: Planning | Agent: Planner | Status: APPROVED
Parent: #231 | Revision: v4


Context

The OCR service uses the deprecated vertexai.generative_models SDK in maintenance_receipt_extractor.py. This file follows the same SDK pattern as gemini_engine.py but processes text-only receipts (no image parts, no Google Search grounding).

Codebase Analysis

| File | SDK References | Action |
|------|----------------|--------|
| ocr/app/extractors/maintenance_receipt_extractor.py | 1 import site: aiplatform + GenerationConfig + GenerativeModel (L187-191) | Full migration (no search grounding) |

API Migration Map

| Old (vertexai.generative_models) | New (google.genai) |
|----------------------------------|--------------------|
| from google.cloud import aiplatform | from google import genai |
| from vertexai.generative_models import GenerativeModel, GenerationConfig, Part | from google.genai import types |
| aiplatform.init(project=..., location=...) | genai.Client(vertexai=True, project=..., location=...) |
| GenerativeModel(model_name) | Client handles model per-call via model= kwarg |
| model.generate_content([...], generation_config=config) | client.models.generate_content(model=name, contents=[...], config=config) |
| GenerationConfig(response_mime_type=..., response_schema=...) | types.GenerateContentConfig(response_mime_type=..., response_schema=...) |
| Schema type "string", "object", etc. | Schema type "STRING", "OBJECT", etc. (uppercase per Vertex AI Schema spec) |
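
The last row of the map (lowercase to uppercase schema types) can be illustrated with a small recursive helper. This is only a sketch: `uppercase_schema_types` is a hypothetical name, and the sample schema fields are invented for illustration rather than taken from _RECEIPT_RESPONSE_SCHEMA. The plan does not say whether the actual change used a helper or edited the literals by hand.

```python
from typing import Any


def uppercase_schema_types(schema: Any) -> Any:
    """Return a copy of `schema` with every string-valued "type" key uppercased."""
    if isinstance(schema, dict):
        return {
            key: value.upper()
            if key == "type" and isinstance(value, str)
            else uppercase_schema_types(value)
            for key, value in schema.items()
        }
    if isinstance(schema, list):
        return [uppercase_schema_types(item) for item in schema]
    return schema


# Hypothetical schema in the old lowercase style.
old_schema = {
    "type": "object",
    "properties": {
        "serviceName": {"type": "string"},
        "cost": {"type": "number"},
        "lineItems": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["serviceName"],
}

new_schema = uppercase_schema_types(old_schema)
```

Only values under a "type" key are changed, so property names and the entries of "required" are left as written.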

Internal State Changes

MaintenanceReceiptExtractor changes from:

self._model: Any | None = None          # GenerativeModel instance (set in _get_model, L90)
self._generation_config: Any | None = None  # GenerationConfig instance (set in _get_model, L91)

To:

self._client: Any | None = None         # genai.Client instance (set in _get_client)
self._model_name: str = ""              # Model name string for per-call use

Note: MaintenanceExtractionResult.model (the model name string field, e.g., "gemini-2.5-flash") is unaffected by this migration -- it is populated from settings.gemini_model and has no relation to the self._model instance attribute.
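
To make the state change concrete, here is a rough sketch of the new shape using a stub in place of the real client. Everything in it is hypothetical (`ExtractorStateSketch`, `_StubClient`, `_StubModels`, and the `configured_model` argument standing in for settings.gemini_model); the stub only mirrors the keyword arguments of client.models.generate_content so the flow can run without the SDK.

```python
from types import SimpleNamespace
from typing import Any


class _StubModels:
    """Records calls made with the same keyword arguments as client.models.generate_content."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def generate_content(self, *, model: str, contents: list[Any], config: Any = None) -> Any:
        self.calls.append({"model": model, "contents": contents, "config": config})
        return SimpleNamespace(text='{"serviceName": "Oil change"}')


class _StubClient:
    """Stand-in for genai.Client; the real client would be built in _get_client()."""

    def __init__(self) -> None:
        self.models = _StubModels()


class ExtractorStateSketch:
    def __init__(self, configured_model: str) -> None:
        self._configured_model = configured_model
        self._client: Any | None = None  # client instance, created lazily
        self._model_name: str = ""       # model name string, passed on every call

    def _get_client(self) -> Any:
        # Lazy initialization: build the client once and remember the model name.
        if self._client is None:
            self._client = _StubClient()
            self._model_name = self._configured_model
        return self._client

    def extract_text(self, receipt_text: str, config: Any = None) -> str:
        client = self._get_client()
        response = client.models.generate_content(
            model=self._model_name, contents=[receipt_text], config=config
        )
        return response.text
```

The point of the sketch is that no model object is cached any more: the client is cached, and the model is selected by name on each call.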

Authentication

Same as GeminiEngine: the GOOGLE_APPLICATION_CREDENTIALS env var points to the Workload Identity Federation (WIF) credential config. CRITICAL: both os.environ["GOOGLE_APPLICATION_CREDENTIALS"] and os.environ["GOOGLE_EXTERNAL_ACCOUNT_ALLOW_EXECUTABLES"] MUST be set BEFORE genai.Client() is constructed.
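
The ordering requirement can be sketched as follows. The function names and the `client_factory` parameter are hypothetical, and the value "1" for GOOGLE_EXTERNAL_ACCOUNT_ALLOW_EXECUTABLES is an assumption based on how google-auth gates executable-sourced credentials; the plan itself does not state the value.

```python
import os
from typing import Any, Callable


def configure_wif_environment(credential_config_path: str) -> None:
    """Export the env vars that the auth library reads when a client is created."""
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credential_config_path
    # Assumption: "1" is the value that enables executable-sourced credentials.
    os.environ["GOOGLE_EXTERNAL_ACCOUNT_ALLOW_EXECUTABLES"] = "1"


def build_client(credential_config_path: str, client_factory: Callable[[], Any]) -> Any:
    """Set the environment first, then construct the client."""
    configure_wif_environment(credential_config_path)
    return client_factory()


def recording_factory() -> dict[str, Any]:
    """Test double that captures what a real client would see at construction time."""
    return {
        "credentials": os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"),
        "allow_executables": os.environ.get("GOOGLE_EXTERNAL_ACCOUNT_ALLOW_EXECUTABLES"),
    }


seen = build_client("/run/secrets/wif-config.json", recording_factory)
```

With the real SDK, `client_factory` would be something like `lambda: genai.Client(vertexai=True, project=..., location=...)`; the path shown is a placeholder.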

Implementation

  • File: ocr/app/extractors/maintenance_receipt_extractor.py
  • Same _get_model() -> _get_client() pattern as M2 (ADC env vars set first)
  • Remove both self._model and self._generation_config from __init__; replace with self._client and self._model_name
  • Convert _RECEIPT_RESPONSE_SCHEMA type values to uppercase (same as M2)
  • _extract_with_gemini(): call self._client.models.generate_content(model=self._model_name, contents=[...], config=types.GenerateContentConfig(...))
  • Fix pre-existing bug: the new _get_client() must raise GeminiUnavailableError for missing credentials (the current _get_model() raises a bare RuntimeError); add try/except ImportError and try/except Exception blocks matching the GeminiEngine._get_client() pattern
  • Update _get_model() docstring (L173): replace "Lazy-initialize Vertex AI Gemini model" with "Lazy-initialize google-genai Gemini client"
  • No Google Search grounding (text-only receipts)
  • No Part usage (text input only)
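
The error-handling part of the list above might look roughly like this. It is a sketch under stated assumptions, not the code that was merged: `GeminiUnavailableError` is redefined locally as a stand-in for the project's exception, "missing credentials" is assumed to mean that the credential config file does not exist, and the free function `get_client` replaces the method so the block is self-contained.

```python
import os
from typing import Any


class GeminiUnavailableError(Exception):
    """Local stand-in for the project's exception of the same name."""


def get_client(credential_config_path: str, project: str, location: str) -> Any:
    """Create a google-genai client, mapping every failure to GeminiUnavailableError."""
    # Assumption: "missing credentials" means the config file is absent.
    if not os.path.isfile(credential_config_path):
        raise GeminiUnavailableError(
            f"Credential config not found: {credential_config_path}"
        )

    # Env vars are exported before the client is constructed.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credential_config_path
    os.environ["GOOGLE_EXTERNAL_ACCOUNT_ALLOW_EXECUTABLES"] = "1"

    try:
        # Lazy import so the module still loads when the SDK is not installed.
        from google import genai
    except ImportError as exc:
        raise GeminiUnavailableError("google-genai is not installed") from exc

    try:
        return genai.Client(vertexai=True, project=project, location=location)
    except Exception as exc:
        raise GeminiUnavailableError(
            f"Failed to initialize Gemini client: {exc}"
        ) from exc
```

Callers then only need to handle one exception type, whether the cause is a missing file, a missing package, or a client construction failure.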

Review Findings

QR plan-code:

  • [RULE 1] HIGH: MaintenanceReceiptExtractor raises bare RuntimeError instead of GeminiUnavailableError -- fix during migration
  • [RULE 0] CRITICAL: Schema type values must be uppercase -- added to implementation

QR plan-docs:

  • [RULE 2] SHOULD_FIX: _get_model() docstring (L173) -- included in implementation

TW plan-scrub:

  • CONSISTENCY: MaintenanceExtractionResult.model field disambiguated from self._model

Verdict: APPROVED | Next: Execute (depends on M1 #232)

Author
Owner

Milestone: M3 Complete -- Migrate MaintenanceReceiptExtractor

Phase: Execution | Agent: Developer | Status: PASS


Changes

  • ocr/app/extractors/maintenance_receipt_extractor.py: Full SDK migration
    • Same _get_model() -> _get_client() pattern as GeminiEngine
    • self._model + self._generation_config -> self._client + self._model_name
    • _extract_with_gemini() uses client.models.generate_content(model=..., ...)
    • Schema type values converted to uppercase
    • Bug fix: Changed bare RuntimeError to GeminiUnavailableError for missing credentials
    • Added proper try/except ImportError and try/except Exception blocks matching GeminiEngine pattern
    • Updated _get_model() docstring

Acceptance Criteria

  • No imports from vertexai or google.cloud.aiplatform
  • Uses genai.Client(vertexai=True, ...) for initialization
  • Receipt extraction works with new SDK
  • Error handling preserved (and improved with GeminiUnavailableError)

Verdict: PASS | Next: M4 -- Update test mocks (#235)

Reference: egullickson/motovaultpro#234