
Unified Logging System

MotoVaultPro uses a unified logging system with centralized log aggregation.

Overview

  • Single Control: One LOG_LEVEL environment variable controls all containers
  • Correlation IDs: requestId field traces requests across services
  • Centralized Aggregation: Grafana + Loki for log querying and visualization

LOG_LEVEL Values

| Level | Frontend | Backend | PostgreSQL | Redis | Traefik |
|-------|----------|---------|------------|-------|---------|
| DEBUG | debug | debug | all queries, 0ms | debug | DEBUG |
| INFO | info | info | DDL only, 500ms | verbose | INFO |
| WARN | warn | warn | errors, 1000ms | notice | WARN |
| ERROR | error | error | errors only | warning | ERROR |
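
The fan-out above can be sketched as a small shell mapping. This is illustrative only, not the actual generate-log-config.sh; the variable names are hypothetical, and PG_MIN_DURATION stands in for PostgreSQL's log_min_duration_statement (-1 disables statement logging):

```shell
#!/bin/sh
# Map one LOG_LEVEL to per-service settings, mirroring the table above.
# Variable names are illustrative, not the real script's output keys.
LOG_LEVEL="${1:-INFO}"
case "$LOG_LEVEL" in
  DEBUG) BACKEND=debug; REDIS=debug;   TRAEFIK=DEBUG; PG_MIN_DURATION=0 ;;
  INFO)  BACKEND=info;  REDIS=verbose; TRAEFIK=INFO;  PG_MIN_DURATION=500 ;;
  WARN)  BACKEND=warn;  REDIS=notice;  TRAEFIK=WARN;  PG_MIN_DURATION=1000 ;;
  ERROR) BACKEND=error; REDIS=warning; TRAEFIK=ERROR; PG_MIN_DURATION=-1 ;;
  *)     BACKEND=info;  REDIS=verbose; TRAEFIK=INFO;  PG_MIN_DURATION=500 ;;
esac
echo "BACKEND_LOG_LEVEL=$BACKEND"
echo "REDIS_LOG_LEVEL=$REDIS"
echo "TRAEFIK_LOG_LEVEL=$TRAEFIK"
echo "PG_MIN_DURATION=$PG_MIN_DURATION"
```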

Environment Defaults

| Environment | LOG_LEVEL | Purpose |
|-------------|-----------|---------|
| Development | DEBUG | Full debugging locally |
| Staging | DEBUG | Full debugging in staging |
| Production | INFO | Standard production logging |

Correlation IDs

All logs include a requestId field (UUID v4) for tracing requests:

  • Traefik: Forwards X-Request-Id if present
  • Backend: Generates UUID if X-Request-Id missing, includes in all logs
  • Frontend: Includes requestId in API call logs
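
The propagation rule can be sketched in a few lines of shell (hypothetical helper, not project code): reuse an incoming X-Request-Id if present, otherwise mint a fresh UUID v4.

```shell
# Sketch of the documented rule: forward an existing X-Request-Id,
# otherwise generate a new UUID (Linux kernel source, or uuidgen).
request_id() {
  if [ -n "$1" ]; then
    echo "$1"                                # Traefik/backend: forward as-is
  else
    cat /proc/sys/kernel/random/uuid 2>/dev/null || uuidgen
  fi
}
request_id "550e8400-e29b-41d4-a716-446655440000"
```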

Example Log Entry

{
  "level": "info",
  "time": "2024-01-15T10:30:00.000Z",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "msg": "Request processed",
  "method": "GET",
  "path": "/api/vehicles",
  "status": 200,
  "duration": 45
}
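
Because every entry is one JSON object per line, fields can be pulled out directly with jq (assuming jq is installed), for example while grepping `docker logs mvp-backend`:

```shell
# Extract requestId, status, and duration from a JSON log line like the
# example above using jq's string interpolation.
line='{"level":"info","time":"2024-01-15T10:30:00.000Z","requestId":"550e8400-e29b-41d4-a716-446655440000","msg":"Request processed","method":"GET","path":"/api/vehicles","status":200,"duration":45}'
echo "$line" | jq -r '"\(.requestId) \(.status) \(.duration)ms"'
```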

Grafana Access

Dashboards

Four provisioned dashboards are available in the MotoVaultPro folder:

| Dashboard | Purpose | Key Panels |
|-----------|---------|------------|
| Application Overview | System-wide health at a glance | Container log volume, error rate gauge, log level distribution, container health status, request count |
| API Performance | Backend latency and throughput analysis | Request rate, response time percentiles (p50/p95/p99), status code distribution, slowest endpoints |
| Error Investigation | Debugging and root cause analysis | Error log stream, errors by container/endpoint, stack trace viewer, correlation ID lookup, recent 5xx responses |
| Infrastructure | Container-level logs and platform monitoring | Per-container throughput, PostgreSQL/Redis/Traefik/OCR logs, Loki ingestion rate |

All dashboards refresh every 30 seconds and default to a 1-hour time window. Dashboard JSON files are in config/grafana/dashboards/ and provisioned via config/grafana/provisioning/dashboards.yml.

Alerting Rules

Grafana Unified Alerting is configured with file-based provisioned rules. Rules are evaluated every minute and must remain in breach for a 5-minute pending period before the alert fires.

| Alert | Severity | Condition | Description |
|-------|----------|-----------|-------------|
| Error Rate Spike | critical | Error rate > 5% over 5m | Fires when the percentage of error-level logs across all mvp-* containers exceeds 5% |
| Container Silence: mvp-backend | warning | No logs for 5m | Fires when the backend container stops producing logs |
| Container Silence: mvp-postgres | warning | No logs for 5m | Fires when the database container stops producing logs |
| Container Silence: mvp-redis | warning | No logs for 5m | Fires when the cache container stops producing logs |
| 5xx Response Spike | critical | > 10 5xx responses in 5m | Fires when the backend produces more than 10 HTTP 5xx responses in 5 minutes |

Alert configuration files are in config/grafana/alerting/:

  • alert-rules.yml - Alert rule definitions with LogQL queries
  • contact-points.yml - Notification endpoints (webhook placeholder for future email/Slack)
  • notification-policies.yml - Routing rules that group alerts by name and severity

LogQL Query Reference

Common Debugging Queries

Query by requestId:

{container="mvp-backend"} |= "550e8400-e29b-41d4"

Query all errors:

{container=~"mvp-.*"} | json | level="error"

Query slow requests (>500ms):

{container="mvp-backend"} | json | msg="Request processed" | duration > 500

Error Analysis

Count errors per container over time:

sum by (container) (count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))

Error rate as percentage:

sum(count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))
  / sum(count_over_time({container=~"mvp-.*"} [5m])) * 100

HTTP Status Analysis

All 5xx responses:

{container="mvp-backend"} | json | msg="Request processed" | status >= 500

Request count by status code:

sum by (status) (count_over_time({container="mvp-backend"} | json | msg="Request processed" [5m]))

Container-Specific Queries

PostgreSQL errors:

{container="mvp-postgres"} |~ "ERROR|FATAL|PANIC"

Traefik access logs:

{container="mvp-traefik"} | json

OCR processing errors:

{container="mvp-ocr"} |~ "ERROR|Exception|Traceback"

Configuration

Logging configuration is generated by scripts/ci/generate-log-config.sh:

# Generate DEBUG level config
./scripts/ci/generate-log-config.sh DEBUG

# Generate INFO level config
./scripts/ci/generate-log-config.sh INFO

This creates .env.logging, which docker-compose loads as an env file.
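
Since .env.logging is a flat KEY=value file, it can also be sourced directly in a shell; the keys below are hypothetical stand-ins for whatever the script actually emits:

```shell
# Illustrative only: a flat env file can be consumed by docker-compose
# (--env-file / env_file) or sourced in a POSIX shell.
cat > /tmp/env.logging <<'EOF'
LOG_LEVEL=INFO
BACKEND_LOG_LEVEL=info
TRAEFIK_LOG_LEVEL=INFO
EOF
. /tmp/env.logging    # POSIX equivalent of `source`
echo "$BACKEND_LOG_LEVEL"
```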

Architecture

+-----------------------------------------------------------------------+
|                         CI/CD PIPELINE                                |
|  LOG_LEVEL --> generate-log-config.sh --> .env.logging                |
+-----------------------------------------------------------------------+
                                |
                                v
+-----------------------------------------------------------------------+
|                       APPLICATION LAYER                               |
|  Frontend   Backend    OCR      Postgres   Redis    Traefik           |
|      |         |         |         |         |         |              |
|      +---------+---------+---------+---------+---------+              |
|                          |                                            |
|                Docker Log Driver (json-file, 10m x 3)                 |
+-----------------------------------------------------------------------+
                           |
                           v
              Alloy --> Loki (30-day retention) --> Grafana
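
The "json-file, 10m x 3" in the diagram corresponds to a per-service logging stanza in the compose file; a sketch of what that likely looks like (the exact service configuration may differ):

```yaml
# Hedged sketch: Docker json-file driver, 10 MB per file, 3 files kept.
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"
```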

Troubleshooting

Logs not appearing in Grafana

  1. Check Alloy is running: docker logs mvp-alloy
  2. Check Loki is healthy: curl http://localhost:3100/ready
  3. Verify Docker's json-file rotation (10m x 3) is not discarding log lines before Alloy ships them

Invalid LOG_LEVEL

Both frontend and backend will warn and fall back to 'info' if an invalid LOG_LEVEL is provided.
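
The documented fallback can be sketched as follows (hypothetical helper, not project code): a case-insensitive match against the known levels, with a warning on stderr and a default of info otherwise.

```shell
# Sketch of the fallback: normalize LOG_LEVEL, warn and default to info
# when the value is not one of debug/info/warn/error.
normalize_level() {
  lvl="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$lvl" in
    debug|info|warn|error) echo "$lvl" ;;
    *) echo "invalid LOG_LEVEL '$1', falling back to info" >&2
       echo "info" ;;
  esac
}
normalize_level TRACE     # warns on stderr, prints "info"
```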