Eric Gullickson ab0d8463be docs: update CLAUDE.md indexes and README for OCR expansion (refs #137)

Add/update documentation across backend, Python OCR service, and frontend
for receipt scanning, manual extraction, and Gemini integration. Create
new CLAUDE.md files for engines/, fuel-logs/, documents/, and maintenance/
features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-11 11:04:19 -06:00

ocr/app/engines/

OCR engine abstraction layer. Two categories of engines:

- OcrEngine subclasses (image-to-text): PaddleOCR, Google Vision, Hybrid. Accept image bytes, return text + confidence + word boxes.
- GeminiEngine (PDF-to-structured-data): Standalone module for maintenance schedule extraction via Vertex AI. Accepts PDF bytes, returns structured JSON. Not an OcrEngine subclass because the interface signatures differ.
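
A minimal sketch of the image-to-text contract described above. The class names `OcrEngine`, `OcrEngineResult`, and `WordBox` appear in `base_engine.py`; the field names, the `recognize` method name, and the `EchoEngine` subclass are assumptions made for illustration and may not match the real signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordBox:
    # Hypothetical fields: the real WordBox may carry different attributes.
    text: str
    x: int
    y: int
    width: int
    height: int
    confidence: float


@dataclass
class OcrEngineResult:
    # Hypothetical fields mirroring "text + confidence + word boxes".
    text: str
    confidence: float
    word_boxes: List[WordBox] = field(default_factory=list)


class OcrEngine(ABC):
    """Image-to-text contract: image bytes in, OcrEngineResult out."""

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> OcrEngineResult:
        """Run OCR on raw image bytes (method name is an assumption)."""


class EchoEngine(OcrEngine):
    """Toy engine that treats the input bytes as ASCII text.

    It exists only to show what a subclass has to provide.
    """

    def recognize(self, image_bytes: bytes) -> OcrEngineResult:
        text = image_bytes.decode("ascii", errors="replace")
        # One box covering the whole string, 8 px per character.
        box = WordBox(text, 0, 0, 8 * len(text), 16, 1.0)
        return OcrEngineResult(text=text, confidence=1.0, word_boxes=[box])
```

A PDF-to-JSON engine such as GeminiEngine would take PDF bytes and return a dictionary rather than an `OcrEngineResult`, which is why it does not fit this base class.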

Files

| File | What | When to read |
| --- | --- | --- |
| `__init__.py` | Public engine API exports (OcrEngine, create_engine, exceptions) | Importing engine interfaces |
| `base_engine.py` | OcrEngine ABC, OcrConfig, OcrEngineResult, WordBox, exception hierarchy | Engine interface contract, adding new engines |
| `paddle_engine.py` | PaddleOCR PP-OCRv4 primary engine | Local OCR debugging, accuracy tuning |
| `cloud_engine.py` | Google Vision TEXT_DETECTION fallback engine (WIF authentication) | Cloud OCR configuration, API quota |
| `hybrid_engine.py` | Combines primary + fallback engine with confidence threshold switching | Engine selection logic, fallback behavior |
| `engine_factory.py` | Factory function and engine registry for instantiation | Adding new engine types |
| `gemini_engine.py` | Gemini 2.5 Flash integration for maintenance schedule extraction (Vertex AI SDK, 20MB PDF limit, structured JSON output) | Manual extraction debugging, Gemini configuration |
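
The "factory function and engine registry" pattern listed for `engine_factory.py` might look roughly like the sketch below. Only the name `create_engine` comes from this index; `ENGINE_REGISTRY`, `register_engine`, `SketchConfig`, the registry keys, and the stub classes are hypothetical stand-ins, and the real factory takes an `OcrConfig` whose fields are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical registry: maps an engine name to a zero-argument constructor.
ENGINE_REGISTRY: Dict[str, Callable[[], object]] = {}


def register_engine(name: str):
    """Class decorator that records an engine under a lookup name."""
    def decorator(cls):
        ENGINE_REGISTRY[name] = cls
        return cls
    return decorator


@register_engine("paddle")
class PaddleStub:
    # Stand-in for the real PaddleOCR engine.
    name = "paddle"


@register_engine("google_vision")
class VisionStub:
    # Stand-in for the real Google Vision engine.
    name = "google_vision"


@dataclass
class SketchConfig:
    # Hypothetical config with a single field; not the real OcrConfig.
    engine: str = "paddle"


def create_engine(config: SketchConfig):
    """Look up the configured engine name and instantiate it."""
    try:
        engine_cls = ENGINE_REGISTRY[config.engine]
    except KeyError:
        known = ", ".join(sorted(ENGINE_REGISTRY))
        raise ValueError(
            f"Unknown engine {config.engine!r}; known engines: {known}"
        ) from None
    return engine_cls()
```

Under this layout, adding a new engine type means writing the class and registering it under a new name; the factory itself does not change.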

Engine Selection

create_engine(config)
    |
    +-- Primary: PaddleOCR (local, fast, no API limits)
    |
    +-- Fallback: Google Vision (cloud, 1000/month limit)
    |
    v
HybridEngine (tries primary, falls back if confidence < threshold)
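
The flow in the diagram above can be sketched as follows. This is an illustration under assumptions, not the code in `hybrid_engine.py`: the `recognize` method name, the `Result` type, the default threshold of 0.6, and the choice to return the fallback result unconditionally once it is used are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical result type carrying only what the switch needs.
    text: str
    confidence: float


class FixedEngine:
    """Stub engine that always returns the same result and counts calls."""

    def __init__(self, text: str, confidence: float):
        self._result = Result(text, confidence)
        self.calls = 0

    def recognize(self, image_bytes: bytes) -> Result:
        self.calls += 1
        return self._result


class HybridSketch:
    """Tries the primary engine, falls back if confidence < threshold."""

    def __init__(self, primary, fallback, threshold: float = 0.6):
        self.primary = primary
        self.fallback = fallback
        self.threshold = threshold  # assumed value, not from the source

    def recognize(self, image_bytes: bytes) -> Result:
        primary_result = self.primary.recognize(image_bytes)
        if primary_result.confidence >= self.threshold:
            # Confident enough: the cloud engine is never called.
            return primary_result
        # Low confidence: spend one cloud request on the fallback.
        return self.fallback.recognize(image_bytes)
```

Because the fallback runs only on low-confidence results, most requests stay local and the cloud quota is consumed only when the primary engine struggles.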

GeminiEngine is created independently by ManualExtractor, not through the engine factory.
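
A rough sketch of that separation, with the 20MB PDF limit from the file table enforced before any model call. Everything here is hypothetical: the class names carry a `Sketch` suffix to mark them as stand-ins, the `extract` method name and the returned dictionary shape are placeholders, the limit is assumed to mean 20 * 1024 * 1024 bytes, and the Vertex AI call is deliberately omitted.

```python
# Assumed interpretation of "20MB"; the real limit may use decimal megabytes.
MAX_PDF_BYTES = 20 * 1024 * 1024


class PdfTooLargeError(ValueError):
    """Hypothetical exception for PDFs over the size limit."""


class GeminiEngineSketch:
    """Stand-in for GeminiEngine: PDF bytes in, structured data out."""

    def extract(self, pdf_bytes: bytes) -> dict:
        if len(pdf_bytes) > MAX_PDF_BYTES:
            raise PdfTooLargeError(
                f"PDF is {len(pdf_bytes)} bytes; limit is {MAX_PDF_BYTES}"
            )
        # The real engine would call Gemini through the Vertex AI SDK here.
        # This placeholder only reports the input size.
        return {"schedules": [], "source_bytes": len(pdf_bytes)}


class ManualExtractorSketch:
    """Builds its own engine instead of going through create_engine."""

    def __init__(self, engine=None):
        # Constructed directly: no engine factory or registry involved.
        self._engine = engine if engine is not None else GeminiEngineSketch()

    def extract(self, pdf_bytes: bytes) -> dict:
        return self._engine.extract(pdf_bytes)
```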
