[Chore]: Upgrade container image versions and migrate Promtail to Grafana Alloy #95
Problem / User Need
After implementing the unified logging system (#80-#87), the Promtail container fails with a Docker API version incompatibility error:
Root cause: Docker Engine v29 raised the minimum supported API version to 1.44. Promtail 2.9.0 (deployed in #86) embeds Docker client API v1.42, which is below this minimum. Additionally, several container images deployed in the logging stack are significantly outdated.
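The incompatibility reduces to a version comparison: the API version embedded in the client must be at least the minimum the daemon accepts. A minimal shell sketch of that check follows. The function name `api_version_ok` is hypothetical, and the sketch assumes a `sort` that supports `-V` (GNU coreutils).

```shell
# Hypothetical helper: succeeds when the client API version is at least
# the minimum API version the Docker daemon accepts.
api_version_ok() {
  local client="$1" minimum="$2"
  # sort -V orders version strings; if the minimum sorts first (or ties),
  # the client version is new enough.
  [ "$(printf '%s\n%s\n' "$minimum" "$client" | sort -V | head -n1)" = "$minimum" ]
}

# Values taken from the root-cause analysis above.
if api_version_ok "1.42" "1.44"; then
  echo "client API 1.42: accepted"
else
  echo "client API 1.42: rejected, daemon minimum is 1.44"
fi

# On a live host the daemon's minimum should be readable with (requires Docker):
#   docker version --format '{{.Server.MinAPIVersion}}'
```

Any image whose embedded client reports 1.44 or newer passes the same check, which is the property the Alloy migration relies on.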
Beyond the logging stack, the Python OCR container uses Python 3.11, which should be upgraded to 3.12 or later.
Current State vs Target State
| Current | Target | Notes |
|---|---|---|
| `grafana/promtail:2.9.0` | `grafana/alloy:v1.12.2` | |
| `grafana/loki:2.9.0` | `grafana/loki:3.6.1` | |
| `grafana/grafana:10.0.0` | `grafana/grafana:12.4.0` | `grafana/grafana-oss` repo deprecated as of 12.4.0 - use `grafana/grafana`. |
| `python:3.11-slim` | `python:3.13-slim` | |

Proposed Solution
1. Replace Promtail with Grafana Alloy
Promtail is officially deprecated by Grafana Labs. LTS ends February 28, 2026. EOL is March 2, 2026. The official successor is Grafana Alloy (OpenTelemetry Collector distribution with programmable pipelines).
Migration approach:
- Replace `mvp-promtail` container with `mvp-alloy` using `grafana/alloy:v1.12.2`
- Convert `config/promtail/config.yml` to Alloy config format using `alloy convert --source-format=promtail`
- Use `discovery.docker` + `loki.source.docker` for container log collection (replaces `docker_sd_configs`)

Current Promtail config (to be converted):
Alloy equivalent (using native Docker components):
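A minimal sketch of an Alloy pipeline built from `discovery.docker` and `loki.source.docker`, written out by a small shell script. The component labels (`containers`, `default`), the `container` target label, and the `OUT_DIR` variable are illustrative assumptions; the push URL assumes Loki is reachable at `mvp-loki:3100` and that the Docker socket is mounted into the Alloy container. It is not the project's actual configuration.

```shell
# Write a sketch Alloy config; OUT_DIR is a hypothetical override for testing.
OUT_DIR="${OUT_DIR:-config/alloy}"
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/config.alloy" <<'EOF'
// Discover running containers through the mounted Docker socket.
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

// Turn the Docker container name (e.g. "/mvp-loki") into a "container" label.
discovery.relabel "containers" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }
}

// Tail container logs and hand them to the Loki writer.
loki.source.docker "default" {
  host          = "unix:///var/run/docker.sock"
  targets       = discovery.docker.containers.targets
  relabel_rules = discovery.relabel.containers.rules
  forward_to    = [loki.write.default.receiver]
}

// Push to Loki; the URL assumes the mvp-loki container name and port 3100.
loki.write "default" {
  endpoint {
    url = "http://mvp-loki:3100/loki/api/v1/push"
  }
}
EOF

echo "wrote $OUT_DIR/config.alloy"
```

The output of `alloy convert` may differ in component names and ordering; the sketch only shows the shape of a pipeline using the native Docker components.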
2. Upgrade Loki 2.9.0 to 3.6.1
Breaking changes to address:
- Structured metadata (enabled by default in Loki 3.x) requires `tsdb` index + `v13` schema, OR set `allow_structured_metadata: false` initially
- `service_name` label auto-assigned
- `cortex_*` metrics renamed to `loki_*`

Recommended Loki config migration (current uses `boltdb-shipper` + `schema: v11`):

Since this is a fresh logging stack (deployed Feb 2026), there is no existing data to migrate. A clean cutover to tsdb + v13 schema is the simplest approach.
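A sketch of the `schema_config` and `storage_config` sections for a clean tsdb + v13 cutover, emitted as a fragment for review rather than as a complete Loki configuration. The `from` date, the `filesystem` object store, the `index_` prefix, the `/loki/...` directories, and the `OUT_FILE` path are all assumptions for illustration.

```shell
# Emit a Loki 3.x schema/storage fragment; OUT_FILE is a hypothetical path.
OUT_FILE="${OUT_FILE:-loki-tsdb-fragment.yml}"

cat > "$OUT_FILE" <<'EOF'
schema_config:
  configs:
    - from: 2026-02-01          # assumed start date for the fresh stack
      store: tsdb               # replaces boltdb-shipper
      object_store: filesystem  # assumed; keep whatever the current config uses
      schema: v13               # replaces v11
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache
EOF

# Quick sanity check that the fragment carries the intended store and schema.
grep -E 'store: tsdb|schema: v13' "$OUT_FILE"
```

Because no data predates the cutover, a single schema period is enough; a stack with existing data would instead append a new period with a future `from` date.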
3. Upgrade Grafana 10.0.0 to 12.4.0
- Datasource provisioning (`config/grafana/datasources/loki.yml`) should remain compatible
- `grafana/grafana-oss` Docker Hub repo is deprecated. Use `grafana/grafana` instead.
- `GF_SECURITY_ADMIN_PASSWORD`

4. Upgrade Python 3.11 to 3.13
- `python:3.11-slim` to `python:3.13-slim`

Non-goals / Out of Scope
Acceptance Criteria
Promtail to Alloy Migration
- `mvp-promtail` container replaced with `mvp-alloy` using `grafana/alloy:v1.12.2`
- Alloy configuration in `config/alloy/config.alloy` (replaces `config/promtail/config.yml`)
- Container discovery (`discovery.docker`) works without API version errors

Loki Upgrade
- Loki upgraded to `grafana/loki:3.6.1`

Grafana Upgrade
- Grafana upgraded to `grafana/grafana:12.4.0`

Python Upgrade
- OCR container base image upgraded to `python:3.13-slim`

General
- `docker compose up -d` succeeds cleanly

Files to Modify
- `docker-compose.yml` - Update image versions, rename `mvp-promtail` to `mvp-alloy`
- `config/promtail/config.yml` - Remove (replaced by Alloy config)
- `config/alloy/config.alloy` - New Alloy configuration
- `config/loki/config.yml` - Update for Loki 3.6 compatibility
- `backend/ocr/Dockerfile` (or equivalent) - Update Python base image
- `docs/LOGGING.md` - Update version references and Alloy details
- `README.md` - Update if container names referenced
- `CLAUDE.md` - Update if version/container details referenced
- `.gitea/workflows/staging.yaml` - Update if image versions referenced
- `.gitea/workflows/production.yaml` - Update if image versions referenced

Research Sources
Test Plan
Smoke tests:
- `docker compose logs`
- `mvp-alloy`
- `/loki/api/v1/push`

Functional tests:
Regression tests:
Plan: Container Image Upgrades and Promtail-to-Alloy Migration
Phase: Planning | Agent: Orchestrator | Status: AWAITING_REVIEW
Sub-Issues Created
Decision Critic Results
Four key decisions were evaluated through a 7-step structured critique:
Decision 1: Loki Schema Migration - VERIFIED with nuance
- `allow_structured_metadata: false` with v11.

Decision 2: Container Naming - STAND
- `mvp-promtail` to `mvp-alloy`

Decision 3: Execution Ordering - REVISED

Decision 4: Loki Volume Handling - STAND
- `mvp_loki_data` volume

Codebase Analysis Summary
10 files require changes across 4 categories:
- `docker-compose.yml`
- `scripts/ci/mirror-base-images.sh`
- `.gitea/workflows/production.yaml`
- `config/promtail/config.yml` (DELETE), `config/alloy/config.alloy` (CREATE), `config/loki/config.yml`
- `ocr/Dockerfile`
- `docs/LOGGING.md`, `CLAUDE.md`, `.ai/context.json`

Hardcoded references found:
- `mvp-loki:3100` in 3 locations (Alloy config, Grafana datasource, Loki healthcheck) - no change needed, container name preserved.

Milestone Plan
Milestone 1: Update Base Image Mirror Script (#96)
Agent: Platform Agent | Files: 1 | Risk: Low
`scripts/ci/mirror-base-images.sh`:
- `python:3.11-slim` -> `python:3.13-slim`
- `grafana/loki:2.9.0` -> `grafana/loki:3.6.1`
- `grafana/promtail:2.9.0` -> `grafana/alloy:v1.12.2`
- `grafana/grafana:10.0.0` -> `grafana/grafana:12.4.0`

Verification: Mirror script syntax is valid bash.
Milestone 2: Replace Promtail with Grafana Alloy (#97)
Agent: Platform Agent | Files: 4 | Risk: Medium (critical fix)
Edit `docker-compose.yml` (lines 289-307):
- `mvp-promtail` -> `mvp-alloy`
- `grafana/promtail:2.9.0` -> `grafana/alloy:v1.12.2`
- `mvp-promtail` -> `mvp-alloy`
- `./config/promtail/config.yml:/etc/promtail/config.yml:ro` -> `./config/alloy/config.alloy:/etc/alloy/config.alloy`
- `-config.file=/etc/promtail/config.yml` -> `run --server.http.listen-addr=0.0.0.0:12345 --storage.path=/var/lib/alloy/data /etc/alloy/config.alloy`

Create `config/alloy/config.alloy` with Docker discovery, relabeling, and Loki push configuration.

Delete `config/promtail/` directory.

Edit `.gitea/workflows/production.yaml` (line 174): `mvp-promtail` -> `mvp-alloy` in shared services start command.

Verification: Alloy container starts, no Docker API version errors, logs appear in Loki.
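The "no Docker API version errors" check can be scripted by scanning container logs for the daemon's rejection message. The sketch below assumes that message contains the phrase `Minimum supported API version`, which is the wording Docker Engine typically uses when it rejects an old client; the function name is hypothetical.

```shell
# Hypothetical helper: reads log text on stdin and fails if it contains
# the Docker API version rejection message (assumed wording).
no_api_version_errors() {
  if grep -q "Minimum supported API version"; then
    echo "FAIL: Docker API version error found in logs" >&2
    return 1
  fi
  echo "OK: no Docker API version errors"
}

# Usage against the running container (requires Docker):
#   docker logs mvp-alloy 2>&1 | no_api_version_errors
```

Reading from stdin keeps the helper independent of Docker, so the same check can run against saved log files in CI.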
Milestone 3: Upgrade Loki to 3.6.1 (#98)
Agent: Platform Agent | Files: 2 | Risk: Medium (schema migration)
Edit `docker-compose.yml` (line 269): `grafana/loki:2.9.0` -> `grafana/loki:3.6.1`

Rewrite `config/loki/config.yml`:
- `v11` -> `v13`
- `boltdb-shipper` -> `tsdb`
- `boltdb-shipper-active` -> `tsdb-index`, `boltdb-shipper-cache` -> `tsdb-cache`
- `allow_structured_metadata: false`

Note: The `mvp_loki_data` volume should be cleared on deployment because the storage format changes. Add a step to the production deployment or document a manual volume clear.

Verification: Loki starts, `/ready` healthcheck passes, accepts pushes from Alloy.
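The readiness and push checks could be exercised by hand with `curl` against Loki's `/ready` and `/loki/api/v1/push` endpoints. The sketch below is an assumption-laden example: the `LOKI_URL` default presumes port 3100 is reachable from where the script runs, and the function names and the `job="smoke-test"` label are hypothetical.

```shell
LOKI_URL="${LOKI_URL:-http://localhost:3100}"   # assumed address of Loki

# Build a minimal push payload: one stream, one line, nanosecond timestamp.
# The line must not contain characters that need JSON escaping.
loki_payload() {
  local ts="$1" line="$2"
  printf '{"streams":[{"stream":{"job":"smoke-test"},"values":[["%s","%s"]]}]}' "$ts" "$line"
}

# Check readiness, then push one test line (requires curl and a running Loki).
loki_smoke_test() {
  curl -fsS "$LOKI_URL/ready" || return 1
  # Seconds since epoch padded to nanoseconds, as the push API expects.
  loki_payload "$(date +%s)000000000" "smoke test for issue 95" |
    curl -fsS -H "Content-Type: application/json" \
      -X POST --data-binary @- "$LOKI_URL/loki/api/v1/push"
}

# Usage:
#   loki_smoke_test && echo "Loki ready and accepting pushes"
```

A successful push only proves that Loki accepts writes; confirming that Alloy itself delivers logs still needs a query for a real container's stream.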
Milestone 4: Upgrade Grafana to 12.4.0 (#99)
Agent: Platform Agent | Files: 1 | Risk: Low
Edit `docker-compose.yml` (line 311): `grafana/grafana:10.0.0` -> `grafana/grafana:12.4.0`

Verify `config/grafana/datasources/loki.yml` works unchanged (A3 VERIFIED - apiVersion 1 stable since Grafana 5.x).

Verification: Grafana starts, healthcheck passes, Loki datasource provisioned, Explore view returns log queries.
Milestone 5: Upgrade Python OCR to 3.13 (#100)
Agent: Platform Agent | Files: 1-2 | Risk: Medium (C7 UNCERTAIN)
Edit `ocr/Dockerfile` (line 7): `python:3.11-slim` -> `python:3.13-slim`

Build and verify all pip dependencies install (especially opencv-python-headless, pillow-heif, PyMuPDF).

If dependencies fail to install on 3.13, use `python:3.12-slim` as safe fallback. Update mirror script accordingly.

Verification: Container builds, health endpoint responds, VIN image processing works.
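The fallback decision can be sketched as a loop that tries each candidate base image in order and reports the first one on which the named packages install. The function name is hypothetical, and installing unpinned packages is a simplification; a real check would more likely install from the OCR service's own requirements file.

```shell
# Hypothetical helper: prints the first Python base image on which all
# listed packages install, trying 3.13 first and 3.12 as the fallback.
pick_python_image() {
  local image
  for image in python:3.13-slim python:3.12-slim; do
    if docker run --rm "$image" \
        pip install --quiet opencv-python-headless pillow-heif PyMuPDF \
        >/dev/null 2>&1; then
      echo "$image"
      return 0
    fi
  done
  return 1
}

# Usage (requires Docker and network access):
#   pick_python_image || echo "neither image can install the dependencies"
```

A successful install does not guarantee runtime compatibility, so the container health check and VIN image processing test remain necessary.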
Milestone 6: Documentation Updates (#101)
Agent: Technical Writer | Files: 3-4 | Risk: Low
- `docs/LOGGING.md`: `docker logs mvp-promtail` -> `docker logs mvp-alloy`
- `CLAUDE.md` (line 176):
- `.ai/context.json`: `container_count` from 5 to 9
- `README.md`: Update architecture description if Promtail is mentioned.

Verification: All documentation references are consistent with deployed state.
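Consistency of the references can be checked mechanically by searching for leftover Promtail names. The sketch below is one possible check; the function name and the two search patterns are assumptions, and the paths in the usage line should be adjusted to the repository layout.

```shell
# Hypothetical helper: lists remaining references to the old container name
# or image under the given paths; succeeds only when none are found.
check_stale_refs() {
  if grep -rnE 'mvp-promtail|grafana/promtail' "$@"; then
    echo "FAIL: stale Promtail references remain" >&2
    return 1
  fi
  echo "OK: no stale Promtail references"
}

# Usage from the repository root:
#   check_stale_refs docs README.md CLAUDE.md docker-compose.yml .gitea
```

Matching lines are printed with file name and line number, which makes the remaining edits easy to locate.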
Risk Register
Execution Notes
- Branch: `issue-95-upgrade-container-images`
- Commit message format: `chore: {description} (refs #95)`

Verdict: PLAN READY FOR REVIEW | Next: QR plan-completeness
QR Review: Plan Review Cycle Results
Phase: Plan-Review | Agent: Quality Reviewer + Technical Writer | Status: PASS
Gate 1: QR plan-completeness - PASS_WITH_CONCERNS
Valid finding: Add Alloy config validation command as a fallback step in Milestone 2.
Dismissed: Most concerns (missing verification steps, dependency ordering) are already present in the plan. Each milestone has explicit verification criteria and the execution notes state "Milestone ordering is strict: 1 -> 2 -> 3 -> 4 -> 5 -> 6."
Gate 2: TW plan-scrub - PASS_WITH_CONCERNS
Valid finding: Add rollback note to execution section.
Dismissed: Temporal contamination and terminology concerns are not applicable - the plan clearly distinguishes "Create" (new files) from "Edit" (existing files), and uses consistent container name terminology throughout.
Gate 3: QR plan-code (RULE 0/1/2) - PASS_WITH_CONCERNS
RULE 0 findings - addressed by existing plan:
- `/var/run/docker.sock:/var/run/docker.sock:ro`.

RULE 1 findings - addressed by codebase analysis:
- `mvp-promtail` appears only in docker-compose.yml, production.yaml line 174, config/promtail/config.yml, and docs/LOGGING.md. All are covered in milestones 2 and 6.

RULE 2 finding - already mitigated:
Gate 4: QR plan-docs - PASS_WITH_CONCERNS
Valid findings to incorporate into Milestone 6:
- `docs/DATABASE-SCHEMA.md` line 3
- `config/CLAUDE.md` line 3
- `.ai/context.json` needs two changes (line 3 architecture string + line 293 container_count)

Plan Revisions
Incorporating valid findings from all four gates:
Milestone 2 addition: Add Alloy config syntax validation step:
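One way to implement such a step is to run `alloy fmt` from the same pinned image, since the formatter exits with an error when the file does not parse. This is a sketch under that assumption, not necessarily the command used in the plan; the function name and the mount path are illustrative.

```shell
# Hypothetical helper: syntax-check an Alloy config by running the formatter
# from the pinned image; a parse error yields a non-zero exit status.
# This checks syntax only, not whether components are wired correctly.
validate_alloy_config() {
  local config_dir="${1:-$PWD/config/alloy}"
  docker run --rm -v "$config_dir:/etc/alloy:ro" \
    grafana/alloy:v1.12.2 fmt /etc/alloy/config.alloy >/dev/null
}

# Usage (requires Docker):
#   validate_alloy_config && echo "config.alloy parses cleanly"
```

Running the check before `docker compose up -d` catches syntax errors without restarting the logging stack.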
Milestone 6 additions:
- Add `docs/DATABASE-SCHEMA.md` and `config/CLAUDE.md` to documentation updates (fix "5-container" references)
- `.ai/context.json` has two update locations

Execution notes addition:
"If any milestone fails during execution, the branch preserves all prior milestone commits. The urgent Alloy fix (Milestone 2) is positioned early so it can be cherry-picked if later milestones encounter issues."
Verdict: PASS | Next: Create branch and begin execution