feat: Add Grafana dashboards and alerting (#105) #112

Merged
egullickson merged 8 commits from issue-105-add-grafana-dashboards into main 2026-02-06 17:44:05 +00:00

8 Commits

Author SHA1 Message Date
Eric Gullickson
462d306783 fix: resolve staging deployment issues with Traefik, Loki, and Alloy (refs #105)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 1m21s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 48s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m37s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
- Exclude blue-green.yml from staging Traefik by mounting dynamic-staging/
  directory (only grafana.yml + middleware.yml) instead of dynamic/ which
  contains production-only blue-green routing config
- Disable Loki healthcheck: the distroless image has no /bin/sh, so
  CMD-SHELL healthchecks cannot execute; Alloy and Grafana verify Loki
  connectivity instead
- Fix Alloy healthcheck: replace wget (not in the image) with a bash
  /dev/tcp probe
- Add Grafana staging domain override (logs.staging.motovaultpro.com)
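
For readers unfamiliar with the /dev/tcp technique mentioned above: it passes if a plain TCP connection to the port can be opened, without needing wget or curl in the image. The sketch below shows the same idea in Python, for illustration only. The function name `tcp_port_open` and the timeout are assumptions, and the actual healthcheck command from this commit is not reproduced here.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it mirrors what a bash /dev/tcp probe checks,
    namely that something is accepting connections on the port. It does
    not confirm that the service behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False
```

A connect-only probe is weaker than an HTTP readiness check, which is the trade-off of dropping wget from the healthcheck.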

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:51:00 -06:00
Eric Gullickson
842b0eb945 docs: update config/CLAUDE.md with Grafana subdirectories (refs #111)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 34s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m36s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:32:58 -06:00
Eric Gullickson
4b2b318aff feat: add Grafana alerting rules and documentation (refs #111)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m36s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Configure Grafana Unified Alerting with file-based provisioned alert
rules, contact points, and notification policies. Add stable UID to
Loki datasource for alert rule references. Update LOGGING.md with
dashboard descriptions, alerting rules table, and LogQL query reference.

Alert rules: Error Rate Spike (critical), Container Silence for
backend/postgres/redis (warning), 5xx Response Spike (critical).
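
As a rough illustration of what a Container Silence rule checks, the sketch below flags watched containers that have produced no log line within a time window. This is not the provisioned Grafana rule. The function name, the five-minute window, and the `last_seen` mapping are assumptions made for the example; only the container names come from the commit message.

```python
from datetime import datetime, timedelta, timezone

# Container names taken from the commit message; the real label values may differ.
WATCHED = ("backend", "postgres", "redis")


def silent_containers(last_seen, now, window=timedelta(minutes=5), watched=WATCHED):
    """Return watched containers with no log line inside the window.

    last_seen maps container name -> datetime of its most recent log line.
    A container missing from last_seen is treated as silent.
    The 5-minute window is a placeholder, not the value used in the alert rule.
    """
    return [
        name
        for name in watched
        if name not in last_seen or now - last_seen[name] > window
    ]


now = datetime(2026, 2, 6, 17, 0, tzinfo=timezone.utc)
last_seen = {
    "backend": now - timedelta(seconds=30),
    "postgres": now - timedelta(minutes=12),
    # redis has never logged in this sample
}
print(silent_containers(last_seen, now))
```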

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:19:00 -06:00
Eric Gullickson
c891250946 feat: add Infrastructure Grafana dashboard (refs #110)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:11:38 -06:00
Eric Gullickson
0345e3976f feat: add Error Investigation Grafana dashboard (refs #109)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 09:54:52 -06:00
Eric Gullickson
9e6f130fa6 feat: add API Performance Grafana dashboard (refs #108)
Log-based dashboard with 6 panels: request rate, response time
distribution (p50/p95/p99), HTTP status code distribution, request
volume by endpoint, slowest endpoints, and status code breakdown.
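
To make the p50/p95/p99 panel concrete, here is a minimal sketch of how those percentiles can be computed from a list of response times. The dashboard computes its values in its own queries, which are not shown in this commit. The function name and the choice of the inclusive interpolation method are assumptions for the example.

```python
import statistics


def latency_percentiles(durations_ms):
    """Return p50, p95 and p99 for a list of response times in milliseconds.

    Uses statistics.quantiles with 100 cut points, so index k-1 holds the
    k-th percentile. Needs at least two samples on Python versions before 3.13.
    """
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Synthetic sample: 1 ms, 2 ms, ..., 100 ms.
sample = list(range(1, 101))
print(latency_percentiles(sample))
```

The gap between p50 and p99 is usually the interesting signal: a stable median with a growing p99 points at a slow subset of requests rather than a general slowdown.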

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 09:48:11 -06:00
Eric Gullickson
33e561e537 feat: add Application Overview Grafana dashboard (refs #107)
Adds file-provisioned dashboard with 5 panels:
- Container Log Volume Over Time (all 9 containers)
- Error Rate Across All Containers (percentage stat)
- Log Level Distribution Per Container (stacked bar chart)
- Container Health Status (green/red per container)
- Total Request Count Over Time (backend requests/min)
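
The error rate stat is described as a percentage. A minimal sketch of that arithmetic follows, assuming each log line carries a level field and that errors are labeled `error`. Both are assumptions, since the dashboard's actual query is not part of this message.

```python
from collections import Counter


def error_rate_percent(levels):
    """Return the share of log lines at level 'error', as a percentage.

    levels is an iterable of level strings, compared case-insensitively.
    An empty input yields 0.0 rather than dividing by zero.
    """
    counts = Counter(level.lower() for level in levels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts["error"] / total


print(error_rate_percent(["info", "ERROR", "info", "warn"]))
```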

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:24:08 -06:00
Eric Gullickson
6f1195d907 feat: add Grafana dashboard provisioning infrastructure (refs #106)
Add file-based dashboard provisioning config and mount dashboards
directory into Grafana container for auto-loading dashboard JSON files.
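
With file-based provisioning, a malformed or duplicate dashboard file only surfaces when Grafana loads the directory. A small pre-merge check along the following lines could catch the obvious cases earlier. This is a sketch rather than part of the PR. The function name is hypothetical, and the check assumes each file is a JSON object with the top-level `uid` and `title` fields that Grafana dashboard JSON uses.

```python
import json
from pathlib import Path


def check_dashboards(directory):
    """Return a list of problems found in *.json dashboard files in directory.

    Checks only that each file parses as a JSON object, has a non-empty
    'uid' and 'title', and that no two files share a uid.
    """
    problems = []
    seen_uids = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(dashboard, dict):
            problems.append(f"{path.name}: top level is not a JSON object")
            continue
        for key in ("uid", "title"):
            if not dashboard.get(key):
                problems.append(f"{path.name}: missing {key}")
        uid = dashboard.get("uid")
        if isinstance(uid, str) and uid:
            if uid in seen_uids:
                problems.append(
                    f"{path.name}: duplicate uid {uid!r} (also in {seen_uids[uid]})"
                )
            else:
                seen_uids[uid] = path.name
    return problems
```

Calling `check_dashboards` on the mounted dashboards directory and failing the build when the returned list is non-empty would be one way to use it.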

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:19:28 -06:00