# Unified Logging System

MotoVaultPro uses a unified logging system with centralized log aggregation.

## Overview

- **Single Control**: One `LOG_LEVEL` environment variable controls all containers
- **Correlation IDs**: A `requestId` field traces requests across services
- **Centralized Aggregation**: Grafana + Loki for log querying and visualization

## LOG_LEVEL Values

| Level | Frontend | Backend | PostgreSQL | Redis | Traefik |
|-------|----------|---------|------------|-------|---------|
| DEBUG | debug | debug | all queries, 0ms | debug | DEBUG |
| INFO | info | info | DDL only, 500ms | verbose | INFO |
| WARN | warn | warn | errors, 1000ms | notice | WARN |
| ERROR | error | error | errors only | warning | ERROR |

## Environment Defaults

| Environment | LOG_LEVEL | Purpose |
|-------------|-----------|---------|
| Development | DEBUG | Full debugging locally |
| Staging | DEBUG | Full debugging in staging |
| Production | INFO | Standard production logging |

## Correlation IDs

All logs include a `requestId` field (UUID v4) for tracing requests:

- **Traefik**: Forwards `X-Request-Id` if present
- **Backend**: Generates a UUID if `X-Request-Id` is missing, and includes it in all logs
- **Frontend**: Includes `requestId` in API call logs

### Example Log Entry

```json
{
  "level": "info",
  "time": "2024-01-15T10:30:00.000Z",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "msg": "Request processed",
  "method": "GET",
  "path": "/api/vehicles",
  "status": 200,
  "duration": 45
}
```

## Grafana Access

- URL: https://logs.motovaultpro.com
- Default credentials: admin/admin (change on first login)

## Dashboards

Four provisioned dashboards are available in the MotoVaultPro folder:

| Dashboard | Purpose | Key Panels |
|-----------|---------|------------|
| Application Overview | System-wide health at a glance | Container log volume, error rate gauge, log level distribution, container health status, request count |
| API Performance | Backend latency and throughput analysis | Request rate, response time percentiles (p50/p95/p99), status code distribution, slowest endpoints |
| Error Investigation | Debugging and root cause analysis | Error log stream, errors by container/endpoint, stack trace viewer, correlation ID lookup, recent 5xx responses |
| Infrastructure | Container-level logs and platform monitoring | Per-container throughput, PostgreSQL/Redis/Traefik/OCR logs, Loki ingestion rate |

All dashboards refresh every 30 seconds and default to a 1-hour time window. Dashboard JSON files are in `config/grafana/dashboards/` and provisioned via `config/grafana/provisioning/dashboards.yml`.

## Alerting Rules

Grafana Unified Alerting is configured with file-based provisioned rules. Alert rules are evaluated every minute and must fire continuously for 5 minutes before triggering.

| Alert | Severity | Condition | Description |
|-------|----------|-----------|-------------|
| Error Rate Spike | critical | Error rate > 5% over 5m | Fires when the percentage of error-level logs across all mvp-* containers exceeds 5% |
| Container Silence: mvp-backend | warning | No logs for 5m | Fires when the backend container stops producing logs |
| Container Silence: mvp-postgres | warning | No logs for 5m | Fires when the database container stops producing logs |
| Container Silence: mvp-redis | warning | No logs for 5m | Fires when the cache container stops producing logs |
| 5xx Response Spike | critical | > 10 5xx responses in 5m | Fires when the backend produces more than 10 HTTP 5xx responses |

Alert configuration files are in `config/grafana/alerting/`:

- `alert-rules.yml` - Alert rule definitions with LogQL queries
- `contact-points.yml` - Notification endpoints (webhook placeholder for future email/Slack)
- `notification-policies.yml` - Routing rules that group alerts by name and severity

## LogQL Query Reference

### Common Debugging Queries

Query by `requestId`:

```
{container="mvp-backend"} |= "550e8400-e29b-41d4"
```

Query all errors:

```
{container=~"mvp-.*"} | json | level="error"
```

Query slow requests (>500ms):

```
{container="mvp-backend"} | json | msg="Request processed" | duration > 500
```

### Error Analysis

Count errors per container over time:

```
sum by (container) (count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))
```

Error rate as a percentage:

```
sum(count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))
  / sum(count_over_time({container=~"mvp-.*"} [5m])) * 100
```

### HTTP Status Analysis

All 5xx responses:

```
{container="mvp-backend"} | json | msg="Request processed" | status >= 500
```

Request count by status code:

```
sum by (status) (count_over_time({container="mvp-backend"} | json | msg="Request processed" [5m]))
```

### Container-Specific Queries

PostgreSQL errors:

```
{container="mvp-postgres"} |~ "ERROR|FATAL|PANIC"
```

Traefik access logs:

```
{container="mvp-traefik"} | json
```

OCR processing errors:

```
{container="mvp-ocr"} |~ "ERROR|Exception|Traceback"
```

## Configuration

Logging configuration is generated by `scripts/ci/generate-log-config.sh`:

```bash
# Generate DEBUG level config
./scripts/ci/generate-log-config.sh DEBUG

# Generate INFO level config
./scripts/ci/generate-log-config.sh INFO
```

This creates `.env.logging`, which is sourced by docker-compose.

## Architecture

```
+-----------------------------------------------------------------------+
|                            CI/CD PIPELINE                             |
|         LOG_LEVEL --> generate-log-config.sh --> .env.logging         |
+-----------------------------------------------------------------------+
                                    |
                                    v
+-----------------------------------------------------------------------+
|                          APPLICATION LAYER                            |
|   Frontend   Backend     OCR    Postgres    Redis    Traefik          |
|      |          |         |        |          |         |             |
|      +----------+---------+--------+----------+---------+             |
|                                    |                                  |
|              Docker Log Driver (json-file, 10m x 3)                   |
+-----------------------------------------------------------------------+
                                    |
                                    v
              Alloy --> Loki (30-day retention) --> Grafana
```

## Troubleshooting

### Logs not appearing in Grafana

1. Check that Alloy is running: `docker logs mvp-alloy`
2. Check that Loki is healthy: `curl http://localhost:3100/ready`
3. Verify log rotation is not too aggressive

### Invalid LOG_LEVEL

Both the frontend and backend will warn and fall back to `'info'` if an invalid `LOG_LEVEL` is provided.
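
The LOG_LEVEL table, the Configuration section, and the invalid-value fallback above can be tied together with a short sketch. This is a hypothetical reimplementation of what `scripts/ci/generate-log-config.sh` might do, not the script itself: the output variable names, and the `-1` value for disabling PostgreSQL duration logging at ERROR, are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of generate-log-config.sh: map one LOG_LEVEL to the
# per-service values from the LOG_LEVEL table and write them to .env.logging.
# Variable names are illustrative; the real script may differ.
set -eu

LEVEL="${1:-INFO}"

case "$LEVEL" in
  DEBUG) FE=debug; BE=debug; PG_MS=0;    RD=debug;   TR=DEBUG ;;
  INFO)  FE=info;  BE=info;  PG_MS=500;  RD=verbose; TR=INFO ;;
  WARN)  FE=warn;  BE=warn;  PG_MS=1000; RD=notice;  TR=WARN ;;
  ERROR) FE=error; BE=error; PG_MS=-1;   RD=warning; TR=ERROR ;;  # -1 = duration logging off (assumed)
  *)
    # Mirror the documented app-level behaviour: warn, fall back to INFO.
    echo "WARN: invalid LOG_LEVEL '$LEVEL', falling back to INFO" >&2
    FE=info; BE=info; PG_MS=500; RD=verbose; TR=INFO
    ;;
esac

cat > .env.logging <<EOF
FRONTEND_LOG_LEVEL=$FE
BACKEND_LOG_LEVEL=$BE
POSTGRES_LOG_MIN_DURATION_MS=$PG_MS
REDIS_LOG_LEVEL=$RD
TRAEFIK_LOG_LEVEL=$TR
EOF
```

Centralizing the mapping like this keeps the per-service level vocabulary (pino-style `info` vs. Redis `verbose` vs. Traefik `INFO`) in one place, so a single `LOG_LEVEL` value stays the only knob.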
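
The `X-Request-Id` fallback rule from the Correlation IDs section (reuse the caller's id when present, otherwise mint a fresh UUID v4) can be illustrated with a tiny, purely hypothetical helper:

```shell
# Hypothetical illustration of the correlation-ID fallback rule; the real
# backend applies the same logic in its request middleware.
request_id() {
  incoming="${1:-}"
  if [ -n "$incoming" ]; then
    printf '%s\n' "$incoming"          # forward the existing X-Request-Id
  else
    cat /proc/sys/kernel/random/uuid   # mint a new UUID (Linux; uuidgen elsewhere)
  fi
}

request_id "550e8400-e29b-41d4-a716-446655440000"  # prints the forwarded id
request_id                                          # prints a freshly generated UUID
```

Because every hop forwards rather than regenerates the id, the same value appears in Traefik, backend, and frontend logs, which is what makes the `|= "<requestId>"` LogQL lookups above work end to end.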