feat: Error Investigation Grafana dashboard (#105) #109

Closed
opened 2026-02-06 14:01:54 +00:00 by egullickson · 1 comment

Parent Issue

Relates to #105

Summary

Create the Error Investigation dashboard, which supports debugging through a live error log stream, error filtering, stack trace viewing, and requestId correlation lookup.

Scope

Create config/grafana/dashboards/error-investigation.json with these panels:

  1. Error Log Stream - Logs panel (live tail of error-level logs)
    • LogQL: {container=~"mvp-.*"} | json | level="error"
  2. Error Rate Over Time - Timeseries
    • LogQL: sum(count_over_time({container=~"mvp-.*"} | json | level="error" [1m]))
  3. Errors by Container - Bar chart
    • LogQL: sum by (container) (count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))
  4. Errors by Endpoint - Table
    • LogQL: sum by (path) (count_over_time({container="mvp-backend"} | json | level="error" [5m]))
  5. Stack Trace Viewer - Logs panel showing error + stack fields
    • LogQL: {container="mvp-backend"} | json | level="error" | line_format "{{.error}}\n{{.stack}}"
  6. Correlation ID Lookup - Logs panel with template variable for requestId
    • Template variable: requestId (text input)
    • LogQL: {container="mvp-backend"} |= "$requestId"
  7. Recent 5xx Responses - Table
    • LogQL: {container="mvp-backend"} | json | msg="Request processed" | status >= 500
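The panel list above could be turned into dashboard JSON along the following lines. This is a minimal sketch, not the committed file: the panel `type` values follow the names used in this issue, while the `build_panel` helper, the `gridPos` layout, and the `${datasource}` variable reference are illustrative assumptions.

```python
import json

# (title, panel type, LogQL) copied from the Scope list.
# Raw strings keep the "\n" in line_format as a literal backslash-n.
PANELS = [
    ("Error Log Stream", "logs",
     r'{container=~"mvp-.*"} | json | level="error"'),
    ("Error Rate Over Time", "timeseries",
     r'sum(count_over_time({container=~"mvp-.*"} | json | level="error" [1m]))'),
    ("Errors by Container", "barchart",
     r'sum by (container) (count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))'),
    ("Errors by Endpoint", "table",
     r'sum by (path) (count_over_time({container="mvp-backend"} | json | level="error" [5m]))'),
    ("Stack Trace Viewer", "logs",
     r'{container="mvp-backend"} | json | level="error" | line_format "{{.error}}\n{{.stack}}"'),
    ("Correlation ID Lookup", "logs",
     r'{container="mvp-backend"} |= "$requestId"'),
    ("Recent 5xx Responses", "table",
     r'{container="mvp-backend"} | json | msg="Request processed" | status >= 500'),
]


def build_panel(panel_id, title, panel_type, expr):
    """Return one panel dict in the general shape of Grafana's dashboard JSON.

    Hypothetical helper: the full-width, stacked layout is an assumption.
    """
    return {
        "id": panel_id,
        "title": title,
        "type": panel_type,
        # Assumes a dashboard-level datasource variable named "datasource".
        "datasource": {"type": "loki", "uid": "${datasource}"},
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": (panel_id - 1) * 8},
        "targets": [{"refId": "A", "expr": expr}],
    }


panels = [build_panel(i, *spec) for i, spec in enumerate(PANELS, start=1)]
print(json.dumps(panels[5], indent=2))  # the Correlation ID Lookup panel
```

Keeping the queries in one table-like structure makes it easy to compare the generated file against this spec during review.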

Dashboard Variables

  • requestId - Text input for correlation ID lookup
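As a rough illustration of how the `requestId` variable feeds the Correlation ID Lookup query, the sketch below models a text box variable and a simplified substitution step. The `label` value, the empty default, and the `interpolate` helper are assumptions for illustration; Grafana performs the real interpolation itself.

```python
# Hypothetical text box variable; "label" and the empty default are assumptions.
request_id_variable = {
    "type": "textbox",
    "name": "requestId",
    "label": "Request ID",
    "query": "",  # default value shown in the text box
    "current": {"text": "", "value": ""},
}


def interpolate(expr: str, request_id: str) -> str:
    """Very rough stand-in for Grafana's $variable substitution."""
    return expr.replace("$requestId", request_id)


query = '{container="mvp-backend"} |= "$requestId"'

# With a value entered, the line filter narrows to that request.
print(interpolate(query, "abc-123"))

# With the box left empty, the filter becomes |= "", which would be
# expected to match every backend line rather than none.
print(interpolate(query, ""))
```

Because an empty value leaves an effectively unfiltered query, the panel may show all backend logs until an ID is entered.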

Files Changed

  • config/grafana/dashboards/error-investigation.json (NEW)

Acceptance Criteria

  • Error log stream shows live errors
  • Error rate tracks over time
  • Errors filterable by container and endpoint
  • Stack traces visible for backend errors
  • RequestId lookup returns correlated logs
  • 5xx responses listed with details
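To make the criteria above concrete, the following sketch applies the same filtering logic as the error, 5xx, and requestId queries to a few sample lines in plain Python. The log lines are hypothetical; the real mvp-backend field names and values may differ.

```python
import json

# Hypothetical log lines for illustration only.
SAMPLE_LINES = [
    '{"level":"info","msg":"Request processed","status":200,"path":"/api/health","requestId":"r-1"}',
    '{"level":"error","msg":"Request processed","status":502,"path":"/api/vehicles","requestId":"r-2"}',
    '{"level":"error","msg":"Unhandled exception","error":"boom","stack":"Error: boom\\n    at handler","requestId":"r-2"}',
    "not json at all",
]


def parse(line):
    """Return the parsed JSON object, or None when the line is not a JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


records = [r for r in map(parse, SAMPLE_LINES) if r is not None]

# Same idea as: | json | level="error"
errors = [r for r in records if r.get("level") == "error"]

# Same idea as: | json | msg="Request processed" | status >= 500
five_xx = [
    r for r in records
    if r.get("msg") == "Request processed"
    and isinstance(r.get("status"), int)
    and r["status"] >= 500
]

# Same idea as: |= "$requestId" (a substring match on the raw line)
correlated = [line for line in SAMPLE_LINES if "r-2" in line]

print(len(errors), len(five_xx), len(correlated))
```

Note that the requestId lookup is a raw substring match, so it would also return lines where the ID appears in any other field.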
egullickson added the status/backlog and type/feature labels 2026-02-06 14:02:18 +00:00
egullickson added this to the Sprint 2026-02-02 milestone 2026-02-06 14:02:22 +00:00
egullickson added the status/in-progress label and removed the status/backlog label 2026-02-06 15:52:57 +00:00

Milestone: Error Investigation Dashboard

Phase: Execution | Agent: Platform Agent | Status: PASS

Summary

Created config/grafana/dashboards/error-investigation.json with all 7 panels per spec:

| # | Panel | Type | LogQL |
|---|-------|------|-------|
| 1 | Error Log Stream | logs | `{container=~"mvp-.*"} \| json \| level="error"` |
| 2 | Error Rate Over Time | timeseries | `sum(count_over_time({container=~"mvp-.*"} \| json \| level="error" [1m]))` |
| 3 | Errors by Container | barchart | `sum by (container) (count_over_time({container=~"mvp-.*"} \| json \| level="error" [5m]))` |
| 4 | Errors by Endpoint | table | `sum by (path) (count_over_time({container="mvp-backend"} \| json \| level="error" [5m]))` |
| 5 | Stack Trace Viewer | logs | `{container="mvp-backend"} \| json \| level="error" \| line_format "{{.error}}\n{{.stack}}"` |
| 6 | Correlation ID Lookup | logs | `{container="mvp-backend"} \|= "$requestId"` |
| 7 | Recent 5xx Responses | table | `{container="mvp-backend"} \| json \| msg="Request processed" \| status >= 500` |

Template Variables

  • datasource - Loki datasource selector (standard)
  • requestId - Text input for correlation ID lookup

Pattern Conformance

  • Follows existing dashboard structure (api-performance.json, application-overview.json)
  • Grafana 12.4.0, schemaVersion 39, editable: false
  • File-based provisioning compatible
  • Tags: errors, debugging, backend
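The conformance points and template variables listed above can be checked mechanically. The sketch below builds a skeleton with those values and runs a small check over it; the `title` and `uid` values, the abbreviated variable bodies, and the `conformance_problems` helper are assumptions, not the contents of the committed file.

```python
# Skeleton only: schemaVersion, editable, tags, and variable names come from
# this comment; "title", "uid", and the variable bodies are assumptions.
dashboard = {
    "title": "Error Investigation",
    "uid": "error-investigation",
    "schemaVersion": 39,
    "editable": False,
    "tags": ["errors", "debugging", "backend"],
    "templating": {
        "list": [
            {"type": "datasource", "name": "datasource", "query": "loki"},
            {"type": "textbox", "name": "requestId", "query": ""},
        ]
    },
    "panels": [],
}


def conformance_problems(d: dict) -> list[str]:
    """Hypothetical check: list deviations from the conventions named above."""
    problems = []
    if d.get("schemaVersion") != 39:
        problems.append("schemaVersion should be 39")
    if d.get("editable") is not False:
        problems.append("editable should be false")
    names = {v.get("name") for v in d.get("templating", {}).get("list", [])}
    for required in ("datasource", "requestId"):
        if required not in names:
            problems.append(f"missing template variable: {required}")
    for tag in ("errors", "debugging", "backend"):
        if tag not in d.get("tags", []):
            problems.append(f"missing tag: {tag}")
    return problems


print(conformance_problems(dashboard))  # prints []
```

In practice the same function could be pointed at the parsed contents of `config/grafana/dashboards/error-investigation.json` as part of a review step.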

Commit

0345e39 - feat: add Error Investigation Grafana dashboard (refs #109)

Verdict: PASS | Next: QR post-implementation review

Reference: egullickson/motovaultpro#109