fix: invert min-channel so Tesseract gets dark-on-light text (refs #113)

The min-channel correctly extracts contrast (white text=255 vs green
sticker bg=130), but Tesseract expects dark text on light background.
Without inversion, the grayscale-only path returned empty text for
every PSM mode because Tesseract couldn't see bright-on-dark text.
Invert via bitwise_not: text becomes 0 (black), sticker bg becomes
125 (gray). Fixes all three OCR paths (adaptive, grayscale, Otsu).
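
The arithmetic above can be checked with a minimal sketch. It is a pure-Python stand-in for the NumPy/OpenCV calls in the diff, not the project's code: the helper name inverted_min is hypothetical, and it relies on the fact that for 8-bit values cv2.bitwise_not(x) is equivalent to 255 - x.

```python
# Illustration only: inverted_min is a hypothetical helper, not part of
# VinPreprocessor. It mirrors np.minimum(...) followed by cv2.bitwise_not
# for a single 8-bit BGR pixel (bitwise NOT of a uint8 equals 255 - x).

def inverted_min(bgr):
    """Return 255 - min(B, G, R) for one 8-bit BGR pixel."""
    b, g, r = bgr
    return 255 - min(b, g, r)


white_text = (255, 255, 255)     # white VIN characters
green_sticker = (130, 230, 150)  # example sticker color from the message above

text_value = inverted_min(white_text)            # dark after inversion
background_value = inverted_min(green_sticker)   # lighter than the text

print(text_value, background_value)
```

With these example values the text ends up darker than the background, which is the polarity the message says Tesseract expects.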

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eric Gullickson
2026-02-06 21:39:48 -06:00
parent 63c027a454
commit ae5221c759


@@ -153,34 +153,32 @@ class VinPreprocessor:
     def _best_contrast_channel(self, bgr_image: np.ndarray) -> np.ndarray:
         """
-        Compute a grayscale image that maximizes text-to-background contrast.
-        Uses per-pixel minimum across B, G, R channels. White text has
-        min(255,255,255) = 255 regardless of channel, while any colored
-        background has a low value in at least one channel (e.g. green
-        sticker: min(130,230,150) = 130). This gives ~125 units of
-        contrast vs ~60 from standard grayscale.
-        Falls back to standard grayscale when the min-channel doesn't
-        improve contrast (i.e. for already-neutral/gray images).
+        Compute a grayscale image with dark text on light background.
+        Uses inverted per-pixel minimum across B, G, R channels.
+        White text has min(255,255,255) = 255 → inverted to 0 (black).
+        Colored backgrounds have a low min value (e.g. green sticker:
+        min(130,230,150) = 130) → inverted to 125 (medium gray).
+        The inversion ensures Tesseract always receives dark-text-on-
+        light-background, which is the polarity it expects.
         """
         b_channel, g_channel, r_channel = cv2.split(bgr_image)
         min_channel = np.minimum(np.minimum(b_channel, g_channel), r_channel)
-        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
-        min_std = float(np.std(min_channel))
-        gray_std = float(np.std(gray))
+        # Invert so white text (min=255) becomes black (0) and colored
+        # backgrounds (min~130) become lighter gray (~125). Tesseract
+        # expects dark text on light background.
+        inverted = cv2.bitwise_not(min_channel)
+        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
         logger.debug(
-            "Channel contrast: min-channel std=%.1f, grayscale std=%.1f",
-            min_std, gray_std,
+            "Channel contrast: inverted-min std=%.1f, grayscale std=%.1f",
+            float(np.std(inverted)), float(np.std(gray)),
         )
-        # Always use min-channel for VIN images. White text keeps
-        # min(B,G,R)=255 while any colored background drops to its
-        # weakest channel. For neutral images the result is equivalent
-        # to grayscale, so there is no downside.
-        return min_channel
+        return inverted
     def _apply_clahe(self, image: np.ndarray) -> np.ndarray:
         """