feat: Grafana alerting rules and documentation (#105) #111

Closed
opened 2026-02-06 14:02:11 +00:00 by egullickson · 1 comment
Owner

Parent Issue

Relates to #105

Summary

Configure Grafana Unified Alerting with file-based provisioned alert rules and update documentation.

Scope

Alerting Rules (3 rules)

Create config/grafana/provisioning/alerting/ with alert rule YAML files:

  1. Error Rate Spike - Alert when the error rate exceeds 5% over a 5-minute window

    • LogQL: sum(count_over_time({container=~"mvp-.*"} | json | level="error" [5m])) / sum(count_over_time({container=~"mvp-.*"}[5m])) * 100 > 5
    • Severity: critical
    • For: 5m
  2. Container Silence - Alert when any critical container stops producing logs for 5 minutes

    • LogQL: count_over_time({container="mvp-backend"}[5m]) == 0 (one per critical container)
    • Severity: warning
    • For: 5m
  3. 5xx Spike - Alert when more than 10 5xx responses occur within a 5-minute window

    • LogQL: sum(count_over_time({container="mvp-backend"} | json | msg="Request processed" | status >= 500 [5m])) > 10
    • Severity: critical
    • For: 5m
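
The rules above could be provisioned roughly as follows. This is a sketch under stated assumptions, not the project's actual file: the group name, folder, and rule `uid` values are hypothetical, the datasource UID `loki` is assumed, and the exact `model` fields vary between Grafana versions, so exporting a rule from the Grafana UI is the safer way to confirm the schema. The sketch moves the threshold out of the LogQL expression into a Grafana threshold expression, which is a design choice rather than something the issue specifies.

```yaml
# Hypothetical alert-rules.yml sketch; names, uids, and folder are illustrative.
apiVersion: 1
groups:
  - orgId: 1
    name: mvp-log-alerts            # assumed group name
    folder: MotoVaultPro            # assumed folder
    interval: 1m                    # assumed evaluation interval
    rules:
      - uid: mvp-error-rate-spike   # assumed uid
        title: Error Rate Spike
        condition: B
        for: 5m
        noDataState: OK             # a window with no error lines typically yields an empty result
        execErrState: Error
        labels:
          severity: critical
        data:
          - refId: A
            datasourceUid: loki     # assumed Loki datasource uid
            relativeTimeRange:
              from: 300             # seconds, matches the 5m window
              to: 0
            model:
              refId: A
              queryType: instant
              expr: 'sum(count_over_time({container=~"mvp-.*"} | json | level="error" [5m])) / sum(count_over_time({container=~"mvp-.*"}[5m])) * 100'
          - refId: B
            datasourceUid: __expr__ # Grafana server-side expression
            model:
              refId: B
              type: threshold
              expression: A
              conditions:
                - evaluator:
                    type: gt
                    params: [5]     # fire above 5 percent
      - uid: mvp-backend-silence    # assumed uid
        title: "Container Silence: mvp-backend"
        condition: B
        for: 5m
        noDataState: Alerting       # a silent stream usually returns no series rather than 0
        execErrState: Error
        labels:
          severity: warning
        data:
          - refId: A
            datasourceUid: loki
            relativeTimeRange:
              from: 300
              to: 0
            model:
              refId: A
              queryType: instant
              expr: 'sum(count_over_time({container="mvp-backend"}[5m]))'
          - refId: B
            datasourceUid: __expr__
            model:
              refId: B
              type: threshold
              expression: A
              conditions:
                - evaluator:
                    type: lt
                    params: [1]     # fewer than one log line in the window
```

One caveat worth checking during testing: for a container that has logged nothing in the window, `count_over_time(...)` generally returns an empty result rather than `0`, so a literal `== 0` comparison may never match. Setting `noDataState: Alerting` as sketched here, or using LogQL's `absent_over_time`, are two possible ways to handle that case.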

Contact Point

Configure a default contact point (Grafana UI notification + webhook placeholder for future email/Slack integration).
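
A webhook placeholder contact point might look like the sketch below. The contact point name, receiver `uid`, and URL are hypothetical placeholders, not values taken from the repository.

```yaml
# Hypothetical contact-points.yml sketch.
apiVersion: 1
contactPoints:
  - orgId: 1
    name: mvp-default                 # assumed contact point name
    receivers:
      - uid: mvp-default-webhook      # assumed uid
        type: webhook
        settings:
          # Placeholder endpoint; replace once email/Slack integration exists.
          url: http://localhost:9999/alert-placeholder
          httpMethod: POST
```

The notification policy would need to reference the same `name` for alerts to route here.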

Documentation

Update docs/LOGGING.md with:

  • Dashboard section describing all 4 dashboards
  • How to access: https://logs.motovaultpro.com
  • Alerting rules description
  • LogQL query reference for common debugging tasks

Files Changed

  • config/grafana/provisioning/alerting/alert-rules.yml (NEW)
  • config/grafana/provisioning/alerting/contact-points.yml (NEW)
  • config/grafana/provisioning/alerting/notification-policies.yml (NEW)
  • docker-compose.yml (MODIFY - mount alerting provisioning directory)
  • docs/LOGGING.md (MODIFY - add dashboards section)

Acceptance Criteria

  • Alert rules load on Grafana startup
  • Error rate spike alert configured and testable
  • Container silence alert configured for critical containers
  • 5xx spike alert configured
  • docs/LOGGING.md updated with dashboard and alerting documentation
egullickson added the status/backlog and type/feature labels 2026-02-06 14:02:19 +00:00
egullickson added this to the Sprint 2026-02-02 milestone 2026-02-06 14:02:24 +00:00
egullickson added the status/in-progress label and removed the status/backlog label 2026-02-06 16:13:01 +00:00
Author
Owner

Milestone: Grafana Alerting Rules and Documentation

Phase: Execution | Agent: Platform Agent | Status: PASS

Completed

  • Alert Rules (config/grafana/alerting/alert-rules.yml):
    • Error Rate Spike (critical) - fires when error rate > 5% over 5 minutes
    • Container Silence: mvp-backend (warning) - fires when no logs for 5 minutes
    • Container Silence: mvp-postgres (warning) - fires when no logs for 5 minutes
    • Container Silence: mvp-redis (warning) - fires when no logs for 5 minutes
    • 5xx Response Spike (critical) - fires when more than 10 5xx responses occur in 5 minutes
  • Contact Point (config/grafana/alerting/contact-points.yml): Webhook placeholder for future email/Slack integration
  • Notification Policy (config/grafana/alerting/notification-policies.yml): Groups alerts by alertname and severity
  • Docker Compose: Added alerting provisioning volume mount to mvp-grafana service
  • Loki Datasource: Added stable uid: loki for alert rule references
  • Documentation: Updated docs/LOGGING.md with dashboard descriptions, alerting rules table, and expanded LogQL query reference
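
Grouping by alertname and severity, as described for the notification policy, could be expressed along these lines. The receiver name and the three timing values are assumptions for illustration (the timings shown match Grafana's usual defaults), not values confirmed by the repository.

```yaml
# Hypothetical notification-policies.yml sketch.
apiVersion: 1
policies:
  - orgId: 1
    receiver: mvp-default             # assumed; must match a provisioned contact point name
    group_by: ['alertname', 'severity']
    group_wait: 30s                   # assumed timing
    group_interval: 5m                # assumed timing
    repeat_interval: 4h               # assumed timing
```

With this grouping, the three Container Silence rules would notify separately from each other (different alertname) but together with other instances of the same rule at the same severity.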

Files Changed

| File | Action |
|------|--------|
| config/grafana/alerting/alert-rules.yml | NEW |
| config/grafana/alerting/contact-points.yml | NEW |
| config/grafana/alerting/notification-policies.yml | NEW |
| config/grafana/datasources/loki.yml | MODIFIED - added uid |
| docker-compose.yml | MODIFIED - alerting volume mount |
| docs/LOGGING.md | MODIFIED - dashboards, alerting, LogQL reference |
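
For context on the `loki.yml` change, a datasource provisioned with a fixed `uid` might look like the sketch below. Only `uid: loki` comes from the comment above; the datasource name and URL are assumptions.

```yaml
# Hypothetical config/grafana/datasources/loki.yml sketch.
apiVersion: 1
datasources:
  - name: Loki                        # assumed display name
    type: loki
    uid: loki                         # stable uid that alert rules reference via datasourceUid
    access: proxy
    url: http://mvp-loki:3100         # assumed service name and port
```

Pinning the `uid` matters because provisioned alert rules refer to the datasource by UID, and an auto-generated one can differ between environments.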

Note

Alert files are placed in config/grafana/alerting/ rather than config/grafana/provisioning/alerting/ (the path proposed in the issue) to avoid a conflict with the existing dashboard provisioning mount, which maps config/grafana/provisioning/ to /etc/grafana/provisioning/dashboards/.
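
The mount layout described in this note could be sketched as follows. The `./` host-path prefix and the `:ro` flags are assumptions; the service name and container paths follow the description above.

```yaml
# Hypothetical docker-compose.yml excerpt for the mvp-grafana service.
services:
  mvp-grafana:
    volumes:
      # Existing mount: host provisioning directory maps to the dashboards path.
      - ./config/grafana/provisioning:/etc/grafana/provisioning/dashboards:ro
      # New mount: alerting files live in a sibling directory so they are not
      # picked up by the dashboards mount above.
      - ./config/grafana/alerting:/etc/grafana/provisioning/alerting:ro
```

Had the alert files been placed under config/grafana/provisioning/alerting/, they would have appeared inside the container at /etc/grafana/provisioning/dashboards/alerting/, which is not where Grafana looks for alerting resources.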

Verdict: PASS | Next: Quality review

Reference: egullickson/motovaultpro#111