# Unified Logging System

MotoVaultPro uses a unified logging system with centralized log aggregation.

## Overview

- **Single Control**: One `LOG_LEVEL` environment variable controls all containers
- **Correlation IDs**: A `requestId` field traces requests across services
- **Centralized Aggregation**: Grafana + Loki for log querying and visualization

## LOG_LEVEL Values

| Level | Frontend | Backend | PostgreSQL | Redis | Traefik |
|-------|----------|---------|------------|-------|---------|
| DEBUG | debug | debug | all queries, 0ms | debug | DEBUG |
| INFO | info | info | DDL only, 500ms | verbose | INFO |
| WARN | warn | warn | errors, 1000ms | notice | WARN |
| ERROR | error | error | errors only | warning | ERROR |

## Environment Defaults

| Environment | LOG_LEVEL | Purpose |
|-------------|-----------|---------|
| Development | DEBUG | Full debugging locally |
| Staging | DEBUG | Full debugging in staging |
| Production | INFO | Standard production logging |

## Correlation IDs

All logs include a `requestId` field (UUID v4) for tracing requests:

- **Traefik**: Forwards `X-Request-Id` if present
- **Backend**: Generates a UUID if `X-Request-Id` is missing, and includes it in all logs
- **Frontend**: Includes `requestId` in API call logs

### Example Log Entry

```json
{
  "level": "info",
  "time": "2024-01-15T10:30:00.000Z",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "msg": "Request processed",
  "method": "GET",
  "path": "/api/vehicles",
  "status": 200,
  "duration": 45
}
```

## Grafana Access

- URL: https://logs.motovaultpro.com
- Default credentials: admin/admin (change on first login)

## Dashboards

Four provisioned dashboards are available in the MotoVaultPro folder:

| Dashboard | Purpose | Key Panels |
|-----------|---------|------------|
| Application Overview | System-wide health at a glance | Container log volume, error rate gauge, log level distribution, container health status, request count |
| API Performance | Backend latency and throughput analysis | Request rate, response time percentiles (p50/p95/p99), status code distribution, slowest endpoints |
| Error Investigation | Debugging and root cause analysis | Error log stream, errors by container/endpoint, stack trace viewer, correlation ID lookup, recent 5xx responses |
| Infrastructure | Container-level logs and platform monitoring | Per-container throughput, PostgreSQL/Redis/Traefik/OCR logs, Loki ingestion rate |

All dashboards refresh every 30 seconds and default to a 1-hour time window. Dashboard JSON files are in `config/grafana/dashboards/` and provisioned via `config/grafana/provisioning/dashboards.yml`.

## Alerting Rules

Grafana Unified Alerting is configured with file-based provisioned rules. Alert rules are evaluated every minute and must fire continuously for 5 minutes before triggering.

| Alert | Severity | Condition | Description |
|-------|----------|-----------|-------------|
| Error Rate Spike | critical | Error rate > 5% over 5m | Fires when the percentage of error-level logs across all mvp-* containers exceeds 5% |
| Container Silence: mvp-backend | warning | No logs for 5m | Fires when the backend container stops producing logs |
| Container Silence: mvp-postgres | warning | No logs for 5m | Fires when the database container stops producing logs |
| Container Silence: mvp-redis | warning | No logs for 5m | Fires when the cache container stops producing logs |
| 5xx Response Spike | critical | > 10 5xx responses in 5m | Fires when the backend produces more than 10 HTTP 5xx responses |

Alert configuration files are in `config/grafana/alerting/`:

- `alert-rules.yml` - Alert rule definitions with LogQL queries
- `contact-points.yml` - Notification endpoints (webhook placeholder for future email/Slack)
- `notification-policies.yml` - Routing rules that group alerts by name and severity

## LogQL Query Reference

### Common Debugging Queries

Query by `requestId`:

```
{container="mvp-backend"} |= "550e8400-e29b-41d4"
```

Query all errors:

```
{container=~"mvp-.*"} | json | level="error"
```

Query slow requests (>500ms):

```
{container="mvp-backend"} | json | msg="Request processed" | duration > 500
```

### Error Analysis

Count errors per container over time:

```
sum by (container) (count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))
```

Error rate as a percentage:

```
sum(count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))
  / sum(count_over_time({container=~"mvp-.*"} [5m])) * 100
```

### HTTP Status Analysis

All 5xx responses:

```
{container="mvp-backend"} | json | msg="Request processed" | status >= 500
```

Request count by status code:

```
sum by (status) (count_over_time({container="mvp-backend"} | json | msg="Request processed" [5m]))
```

### Container-Specific Queries

PostgreSQL errors:

```
{container="mvp-postgres"} |~ "ERROR|FATAL|PANIC"
```

Traefik access logs:

```
{container="mvp-traefik"} | json
```

OCR processing errors:

```
{container="mvp-ocr"} |~ "ERROR|Exception|Traceback"
```

## Configuration

Logging configuration is generated by `scripts/ci/generate-log-config.sh`:

```bash
# Generate DEBUG level config
./scripts/ci/generate-log-config.sh DEBUG

# Generate INFO level config
./scripts/ci/generate-log-config.sh INFO
```

This creates `.env.logging`, which is sourced by docker-compose.

## Architecture

```
+-----------------------------------------------------------------------+
|                            CI/CD PIPELINE                             |
|         LOG_LEVEL --> generate-log-config.sh --> .env.logging         |
+-----------------------------------------------------------------------+
                                    |
                                    v
+-----------------------------------------------------------------------+
|                          APPLICATION LAYER                            |
|   Frontend   Backend     OCR    Postgres    Redis    Traefik          |
|      |          |         |        |          |         |             |
|      +----------+---------+--------+----------+---------+             |
|                                    |                                  |
|              Docker Log Driver (json-file, 10m x 3)                   |
+-----------------------------------------------------------------------+
                                    |
                                    v
              Alloy --> Loki (30-day retention) --> Grafana
```

## Troubleshooting

### Logs not appearing in Grafana

1. Check that Alloy is running: `docker logs mvp-alloy`
2. Check that Loki is healthy: `curl http://localhost:3100/ready`
3. Verify log rotation is not too aggressive

### Invalid LOG_LEVEL

Both the frontend and backend will warn and fall back to `'info'` if an invalid `LOG_LEVEL` is provided.
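
The LOG_LEVEL table, the Configuration section, and the invalid-value fallback above can be tied together with a short sketch. This is a hypothetical reimplementation of what `scripts/ci/generate-log-config.sh` might do, not the script itself: the output variable names, and the `-1` value for disabling PostgreSQL duration logging at ERROR, are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of generate-log-config.sh: map one LOG_LEVEL to the
# per-service values from the LOG_LEVEL table and write them to .env.logging.
# Variable names are illustrative; the real script may differ.
set -eu

LEVEL="${1:-INFO}"

case "$LEVEL" in
  DEBUG) FE=debug; BE=debug; PG_MS=0;    RD=debug;   TR=DEBUG ;;
  INFO)  FE=info;  BE=info;  PG_MS=500;  RD=verbose; TR=INFO ;;
  WARN)  FE=warn;  BE=warn;  PG_MS=1000; RD=notice;  TR=WARN ;;
  ERROR) FE=error; BE=error; PG_MS=-1;   RD=warning; TR=ERROR ;;  # -1 = duration logging off (assumed)
  *)
    # Mirror the documented app-level behaviour: warn, fall back to INFO.
    echo "WARN: invalid LOG_LEVEL '$LEVEL', falling back to INFO" >&2
    FE=info; BE=info; PG_MS=500; RD=verbose; TR=INFO
    ;;
esac

cat > .env.logging <<EOF
FRONTEND_LOG_LEVEL=$FE
BACKEND_LOG_LEVEL=$BE
POSTGRES_LOG_MIN_DURATION_MS=$PG_MS
REDIS_LOG_LEVEL=$RD
TRAEFIK_LOG_LEVEL=$TR
EOF
```

Centralizing the mapping like this keeps the per-service level vocabulary (pino-style `info` vs. Redis `verbose` vs. Traefik `INFO`) in one place, so a single `LOG_LEVEL` value stays the only knob.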
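
The `X-Request-Id` fallback rule from the Correlation IDs section (reuse the caller's id when present, otherwise mint a fresh UUID v4) can be illustrated with a tiny, purely hypothetical helper:

```shell
# Hypothetical illustration of the correlation-ID fallback rule; the real
# backend applies the same logic in its request middleware.
request_id() {
  incoming="${1:-}"
  if [ -n "$incoming" ]; then
    printf '%s\n' "$incoming"          # forward the existing X-Request-Id
  else
    cat /proc/sys/kernel/random/uuid   # mint a new UUID (Linux; uuidgen elsewhere)
  fi
}

request_id "550e8400-e29b-41d4-a716-446655440000"  # prints the forwarded id
request_id                                          # prints a freshly generated UUID
```

Because every hop forwards rather than regenerates the id, the same value appears in Traefik, backend, and frontend logs, which is what makes the `|= "<requestId>"` LogQL lookups above work end to end.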