feat: Infrastructure Grafana dashboard (#105) #110

Closed
opened 2026-02-06 14:02:01 +00:00 by egullickson · 1 comment
Owner

Parent Issue

Relates to #105

Summary

Create the Infrastructure dashboard, which shows per-container log details for PostgreSQL, Redis, Traefik, and OCR, along with logging stack health.

Scope

Create config/grafana/dashboards/infrastructure.json with these panels:

  1. Per-Container Log Throughput - Timeseries
    • LogQL: sum by (container) (rate({container=~"mvp-.*"}[1m]))
  2. PostgreSQL Error/Warning Logs - Logs panel
    • LogQL: {container="mvp-postgres"} |~ "ERROR|WARNING|FATAL"
  3. Redis Connection and Command Logs - Logs panel
    • LogQL: {container="mvp-redis"}
  4. Traefik Access Logs - Logs panel
    • LogQL: {container="mvp-traefik"}
  5. Traefik Error Logs - Logs panel
    • LogQL: {container="mvp-traefik"} |~ "level=error|err="
  6. OCR Service Logs - Logs panel
    • LogQL: {container="mvp-ocr"}
  7. OCR Processing Errors - Logs panel
    • LogQL: {container="mvp-ocr"} |~ "ERROR|error|Exception|Traceback"
  8. Loki Ingestion Rate - Timeseries
    • LogQL: sum(rate({container="mvp-loki"}[1m]))
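
The panel list above could be turned into dashboard JSON with a small generator. This is only a sketch, not the committed file: the panel titles, types, and LogQL expressions come from the list, and `schemaVersion` 39 is the value reported in the completion comment, while the `${datasource}` variable name, the `uid`, the dashboard title, and the two-column grid layout are assumptions made for illustration.

```python
import json

# (title, panel type, LogQL expression), copied from the scope list.
PANELS = [
    ("Per-Container Log Throughput", "timeseries",
     'sum by (container) (rate({container=~"mvp-.*"}[1m]))'),
    ("PostgreSQL Error/Warning Logs", "logs",
     '{container="mvp-postgres"} |~ "ERROR|WARNING|FATAL"'),
    ("Redis Connection and Command Logs", "logs",
     '{container="mvp-redis"}'),
    ("Traefik Access Logs", "logs",
     '{container="mvp-traefik"}'),
    ("Traefik Error Logs", "logs",
     '{container="mvp-traefik"} |~ "level=error|err="'),
    ("OCR Service Logs", "logs",
     '{container="mvp-ocr"}'),
    ("OCR Processing Errors", "logs",
     '{container="mvp-ocr"} |~ "ERROR|error|Exception|Traceback"'),
    ("Loki Ingestion Rate", "timeseries",
     'sum(rate({container="mvp-loki"}[1m]))'),
]

# Assumption: a dashboard variable named "datasource" selects the Loki datasource.
LOKI = {"type": "loki", "uid": "${datasource}"}


def build_dashboard():
    """Return a dashboard dict with one panel per entry in PANELS."""
    panels = []
    for i, (title, panel_type, expr) in enumerate(PANELS):
        panels.append({
            "id": i + 1,
            "title": title,
            "type": panel_type,
            "datasource": LOKI,
            # Assumed layout: two panels per row on a 24-column grid.
            "gridPos": {"h": 8, "w": 12, "x": 12 * (i % 2), "y": 8 * (i // 2)},
            "targets": [{"refId": "A", "datasource": LOKI, "expr": expr}],
        })
    return {
        "title": "Infrastructure",   # hypothetical dashboard title
        "uid": "infrastructure",     # hypothetical uid
        "schemaVersion": 39,
        "panels": panels,
    }


if __name__ == "__main__":
    print(json.dumps(build_dashboard(), indent=2))
```

Keeping the queries in one table makes it easy to compare the generated panels against the list in this issue before writing config/grafana/dashboards/infrastructure.json.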

Files Changed

  • config/grafana/dashboards/infrastructure.json (NEW)

Acceptance Criteria

  • [ ] All infrastructure containers have dedicated panels
  • [ ] PostgreSQL errors/warnings highlighted
  • [ ] Traefik access and error logs separated
  • [ ] OCR processing errors visible
  • [ ] Loki ingestion rate tracked
egullickson added the status/backlog and type/feature labels 2026-02-06 14:02:19 +00:00
egullickson added this to the Sprint 2026-02-02 milestone 2026-02-06 14:02:23 +00:00
egullickson added the status/in-progress label and removed the status/backlog label 2026-02-06 16:06:41 +00:00
Author
Owner

Milestone: Infrastructure Dashboard

Phase: Execution | Agent: Platform | Status: PASS

Completed

  • Created config/grafana/dashboards/infrastructure.json with 8 panels
  • Follows established dashboard patterns (Grafana 12.4.0, Loki datasource variable, schemaVersion 39)

Panels

  1. Per-Container Log Throughput (timeseries) - sum by (container) (rate({container=~"mvp-.*"}[1m]))
  2. PostgreSQL Error/Warning Logs (logs) - Filters ERROR, WARNING, FATAL
  3. Redis Connection and Command Logs (logs) - All mvp-redis logs
  4. Traefik Access Logs (logs) - All mvp-traefik logs
  5. Traefik Error Logs (logs) - Filters level=error and err=
  6. OCR Service Logs (logs) - All mvp-ocr logs
  7. OCR Processing Errors (logs) - Filters ERROR, error, Exception, Traceback
  8. Loki Ingestion Rate (timeseries) - sum(rate({container="mvp-loki"}[1m]))

Acceptance Criteria

  • [x] All infrastructure containers have dedicated panels
  • [x] PostgreSQL errors/warnings highlighted
  • [x] Traefik access and error logs separated
  • [x] OCR processing errors visible
  • [x] Loki ingestion rate tracked
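
Two of these criteria could be checked mechanically against the dashboard JSON. The sketch below is illustrative and is not how the PASS verdict was produced: it assumes the usual `panels[].targets[].expr` layout of a Grafana dashboard, and the function names are hypothetical. The container names are the ones used in the panel queries.

```python
import re

# Containers named in the panel queries of this issue.
REQUIRED_CONTAINERS = ["mvp-postgres", "mvp-redis", "mvp-traefik", "mvp-ocr", "mvp-loki"]


def panel_exprs(dashboard):
    """Yield the LogQL expression of every query target in every panel."""
    for panel in dashboard.get("panels", []):
        for target in panel.get("targets", []):
            yield target.get("expr", "")


def missing_containers(dashboard):
    """Return required containers that no panel selects with an exact container="..." matcher."""
    seen = set()
    for expr in panel_exprs(dashboard):
        # Matches container="name" only; the regex matcher container=~"..." is ignored.
        seen.update(re.findall(r'container="([^"]+)"', expr))
    return [c for c in REQUIRED_CONTAINERS if c not in seen]


def traefik_logs_separated(dashboard):
    """True if mvp-traefik has both an unfiltered query and one with a |~ line filter."""
    exprs = [e for e in panel_exprs(dashboard) if 'container="mvp-traefik"' in e]
    return any("|~" in e for e in exprs) and any("|~" not in e for e in exprs)
```

Loading the file with `json.load` and passing the result to both functions would give an empty list and `True` for a dashboard that meets the first and third criteria.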

Verdict: PASS | Commit: c891250 (refs #110)

Reference: egullickson/motovaultpro#110