Compare commits

..

9 Commits

Author SHA1 Message Date
88db803b6a Merge pull request 'feat: Add Grafana dashboards and alerting (#105)' (#112) from issue-105-add-grafana-dashboards into main
All checks were successful
Deploy to Staging / Build Images (push) Successful in 36s
Deploy to Staging / Deploy to Staging (push) Successful in 51s
Deploy to Staging / Verify Staging (push) Successful in 2m30s
Deploy to Staging / Notify Staging Ready (push) Successful in 7s
Deploy to Staging / Notify Staging Failure (push) Has been skipped
Reviewed-on: #112
2026-02-06 17:44:04 +00:00
Eric Gullickson
462d306783 fix: resolve staging deployment issues with Traefik, Loki, and Alloy (refs #105)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 1m21s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 48s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m37s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
- Exclude blue-green.yml from staging Traefik by mounting dynamic-staging/
  directory (only grafana.yml + middleware.yml) instead of dynamic/ which
  contains production-only blue-green routing config
- Disable Loki healthcheck: distroless image has no /bin/sh so CMD-SHELL
  healthchecks cannot execute; Alloy and Grafana verify Loki connectivity
- Fix Alloy healthcheck: replace wget (not in image) with bash /dev/tcp
- Add Grafana staging domain override (logs.staging.motovaultpro.com)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:51:00 -06:00
Eric Gullickson
842b0eb945 docs: update config/CLAUDE.md with Grafana subdirectories (refs #111)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 34s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m36s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:32:58 -06:00
Eric Gullickson
4b2b318aff feat: add Grafana alerting rules and documentation (refs #111)
All checks were successful
Deploy to Staging / Build Images (pull_request) Successful in 36s
Deploy to Staging / Deploy to Staging (pull_request) Successful in 51s
Deploy to Staging / Verify Staging (pull_request) Successful in 2m36s
Deploy to Staging / Notify Staging Ready (pull_request) Successful in 8s
Deploy to Staging / Notify Staging Failure (pull_request) Has been skipped
Configure Grafana Unified Alerting with file-based provisioned alert
rules, contact points, and notification policies. Add stable UID to
Loki datasource for alert rule references. Update LOGGING.md with
dashboard descriptions, alerting rules table, and LogQL query reference.

Alert rules: Error Rate Spike (critical), Container Silence for
backend/postgres/redis (warning), 5xx Response Spike (critical).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:19:00 -06:00
Eric Gullickson
c891250946 feat: add Infrastructure Grafana dashboard (refs #110)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:11:38 -06:00
Eric Gullickson
0345e3976f feat: add Error Investigation Grafana dashboard (refs #109)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 09:54:52 -06:00
Eric Gullickson
9e6f130fa6 feat: add API Performance Grafana dashboard (refs #108)
Log-based dashboard with 6 panels: request rate, response time
distribution (p50/p95/p99), HTTP status code distribution, request
volume by endpoint, slowest endpoints, and status code breakdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 09:48:11 -06:00
Eric Gullickson
33e561e537 feat: add Application Overview Grafana dashboard (refs #107)
Adds file-provisioned dashboard with 5 panels:
- Container Log Volume Over Time (all 9 containers)
- Error Rate Across All Containers (percentage stat)
- Log Level Distribution Per Container (stacked bar chart)
- Container Health Status (green/red per container)
- Total Request Count Over Time (backend requests/min)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:24:08 -06:00
Eric Gullickson
6f1195d907 feat: add Grafana dashboard provisioning infrastructure (refs #106)
Add file-based dashboard provisioning config and mount dashboards
directory into Grafana container for auto-loading dashboard JSON files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:19:28 -06:00
15 changed files with 2757 additions and 7 deletions

View File

@@ -8,6 +8,8 @@
| `alloy/` | Grafana Alloy log collector config | Log collection pipeline |
| `deployment/` | Deployment environment configs | Deploy scripts, environment configs |
| `grafana/` | Grafana dashboards and datasources | Log visualization setup |
| `grafana/dashboards/` | Provisioned Grafana dashboard JSON files | Dashboard modifications |
| `grafana/provisioning/` | Grafana provisioning configs (dashboards, alerting) | Provisioning setup |
| `loki/` | Loki log storage config | Log storage, retention |
| `monitoring/` | Monitoring and alert rules | Alerting rules, health checks |
| `shared/` | Shared cross-service configuration | Cross-service settings |

View File

@@ -0,0 +1,210 @@
apiVersion: 1
groups:
- orgId: 1
name: MotoVaultPro Alerts
folder: MotoVaultPro
interval: 1m
rules:
# Error Rate Spike - Alert when error rate exceeds 5% over 5 minutes
- uid: mvp-error-rate-spike
title: Error Rate Spike
condition: D
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: loki
model:
refId: A
expr: 'sum(count_over_time({container=~"mvp-.*"} | json | level=`error` [5m]))'
queryType: instant
- refId: B
relativeTimeRange:
from: 600
to: 0
datasourceUid: loki
model:
refId: B
expr: 'sum(count_over_time({container=~"mvp-.*"} [5m]))'
queryType: instant
- refId: C
relativeTimeRange:
from: 0
to: 0
datasourceUid: __expr__
model:
refId: C
type: math
expression: '($A / $B) * 100'
- refId: D
relativeTimeRange:
from: 0
to: 0
datasourceUid: __expr__
model:
refId: D
type: threshold
expression: C
conditions:
- evaluator:
type: gt
params:
- 5
noDataState: OK
execErrState: Error
for: 5m
labels:
severity: critical
annotations:
summary: Error rate exceeds 5% over 5 minutes across all MotoVaultPro containers
description: Check the Error Investigation dashboard for details.
# Container Silence - mvp-backend
- uid: mvp-silence-backend
title: "Container Silence: mvp-backend"
condition: B
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: loki
model:
refId: A
expr: 'count_over_time({container="mvp-backend"}[5m])'
queryType: instant
- refId: B
relativeTimeRange:
from: 0
to: 0
datasourceUid: __expr__
model:
refId: B
type: threshold
expression: A
conditions:
- evaluator:
type: lt
params:
- 1
noDataState: Alerting
execErrState: Error
for: 5m
labels:
severity: warning
annotations:
summary: mvp-backend container has stopped producing logs
description: No logs received from mvp-backend for 5 minutes. The container may be down or stuck.
# Container Silence - mvp-postgres
- uid: mvp-silence-postgres
title: "Container Silence: mvp-postgres"
condition: B
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: loki
model:
refId: A
expr: 'count_over_time({container="mvp-postgres"}[5m])'
queryType: instant
- refId: B
relativeTimeRange:
from: 0
to: 0
datasourceUid: __expr__
model:
refId: B
type: threshold
expression: A
conditions:
- evaluator:
type: lt
params:
- 1
noDataState: Alerting
execErrState: Error
for: 5m
labels:
severity: warning
annotations:
summary: mvp-postgres container has stopped producing logs
description: No logs received from mvp-postgres for 5 minutes. The database container may be down.
# Container Silence - mvp-redis
- uid: mvp-silence-redis
title: "Container Silence: mvp-redis"
condition: B
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: loki
model:
refId: A
expr: 'count_over_time({container="mvp-redis"}[5m])'
queryType: instant
- refId: B
relativeTimeRange:
from: 0
to: 0
datasourceUid: __expr__
model:
refId: B
type: threshold
expression: A
conditions:
- evaluator:
type: lt
params:
- 1
noDataState: Alerting
execErrState: Error
for: 5m
labels:
severity: warning
annotations:
summary: mvp-redis container has stopped producing logs
description: No logs received from mvp-redis for 5 minutes. The cache container may be down.
# 5xx Spike - Alert when 5xx responses exceed threshold
- uid: mvp-5xx-spike
title: 5xx Response Spike
condition: B
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: loki
model:
refId: A
expr: 'sum(count_over_time({container="mvp-backend"} | json | msg=`Request processed` | status >= 500 [5m]))'
queryType: instant
- refId: B
relativeTimeRange:
from: 0
to: 0
datasourceUid: __expr__
model:
refId: B
type: threshold
expression: A
conditions:
- evaluator:
type: gt
params:
- 10
noDataState: OK
execErrState: Error
for: 5m
labels:
severity: critical
annotations:
summary: High rate of 5xx responses from mvp-backend
description: More than 10 HTTP 5xx responses in 5 minutes. Check the API Performance and Error Investigation dashboards.

View File

@@ -0,0 +1,12 @@
apiVersion: 1
contactPoints:
- orgId: 1
name: mvp-default
receivers:
- uid: mvp-webhook-placeholder
type: webhook
settings:
url: "https://example.com/mvp-webhook-placeholder"
httpMethod: POST
disableResolveMessage: false

View File

@@ -0,0 +1,11 @@
apiVersion: 1
policies:
- orgId: 1
receiver: mvp-default
group_by:
- alertname
- severity
group_wait: 30s
group_interval: 5m
repeat_interval: 4h

View File

@@ -0,0 +1,615 @@
{
"__inputs": [],
"__elements": {},
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "12.4.0"
},
{
"type": "datasource",
"id": "loki",
"name": "Loki",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": false,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "fixed",
"fixedColor": "blue"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Requests / sec",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 15,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum(rate({container=\"mvp-backend\"} | json | msg=\"Request processed\" [1m]))",
"legendFormat": "Requests/sec",
"refId": "A"
}
],
"title": "Request Rate Over Time",
"type": "timeseries"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Duration (ms)",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 500
},
{
"color": "red",
"value": 1000
}
]
},
"unit": "ms"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 8
},
"id": 2,
"options": {
"legend": {
"calcs": [
"lastNotNull"
],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "quantile_over_time(0.50, {container=\"mvp-backend\"} | json | msg=\"Request processed\" | unwrap duration | __error__=\"\" [5m]) by ()",
"legendFormat": "p50",
"refId": "A"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "quantile_over_time(0.95, {container=\"mvp-backend\"} | json | msg=\"Request processed\" | unwrap duration | __error__=\"\" [5m]) by ()",
"legendFormat": "p95",
"refId": "B"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "quantile_over_time(0.99, {container=\"mvp-backend\"} | json | msg=\"Request processed\" | unwrap duration | __error__=\"\" [5m]) by ()",
"legendFormat": "p99",
"refId": "C"
}
],
"title": "Response Time Distribution (p50 / p95 / p99)",
"type": "timeseries"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 0,
"y": 16
},
"id": 3,
"options": {
"legend": {
"displayMode": "list",
"placement": "right",
"showLegend": true,
"values": [
"percent"
]
},
"pieType": "donut",
"reduceOptions": {
"values": false,
"calcs": [
"sum"
],
"fields": ""
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum by (status) (count_over_time({container=\"mvp-backend\"} | json | msg=\"Request processed\" [5m]))",
"legendFormat": "{{status}}",
"refId": "A"
}
],
"title": "HTTP Status Code Distribution",
"type": "piechart"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineWidth": 1,
"scaleDistribution": {
"type": "linear"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 16,
"x": 8,
"y": 16
},
"id": 4,
"options": {
"barRadius": 0,
"barWidth": 0.8,
"fullHighlight": false,
"groupWidth": 0.7,
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": false
},
"orientation": "horizontal",
"showValue": "auto",
"stacking": "none",
"tooltip": {
"mode": "single",
"sort": "none"
},
"xTickLabelRotation": 0,
"xTickLabelSpacing": 0
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum by (path) (count_over_time({container=\"mvp-backend\"} | json | msg=\"Request processed\" [5m]))",
"legendFormat": "{{path}}",
"refId": "A"
}
],
"title": "Request Volume by Endpoint",
"type": "barchart"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 200
},
{
"color": "red",
"value": 500
}
]
},
"unit": "ms"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "path"
},
"properties": [
{
"id": "custom.width",
"value": 300
}
]
}
]
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 24
},
"id": 5,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "Value"
}
]
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "topk(10, avg by (path) (avg_over_time({container=\"mvp-backend\"} | json | msg=\"Request processed\" | unwrap duration | __error__=\"\" [5m])))",
"legendFormat": "{{path}}",
"refId": "A"
}
],
"title": "Slowest Endpoints (Avg Duration)",
"type": "table"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "path"
},
"properties": [
{
"id": "custom.width",
"value": 300
}
]
}
]
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 24
},
"id": 6,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "Value"
}
]
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum by (path, status) (count_over_time({container=\"mvp-backend\"} | json | msg=\"Request processed\" [5m]))",
"legendFormat": "{{path}} - {{status}}",
"refId": "A"
}
],
"title": "Status Code Breakdown by Endpoint",
"type": "table"
}
],
"refresh": "30s",
"schemaVersion": 39,
"tags": [
"api",
"performance",
"backend"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Loki",
"value": "Loki"
},
"hide": 0,
"includeAll": false,
"label": "Datasource",
"multi": false,
"name": "datasource",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "API Performance",
"uid": "api-performance",
"version": 1,
"weekStart": ""
}

View File

@@ -0,0 +1,545 @@
{
"__inputs": [],
"__elements": {},
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "12.4.0"
},
{
"type": "datasource",
"id": "loki",
"name": "Loki",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": false,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Log Lines / min",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum by (container) (count_over_time({container=~\"mvp-.*\"}[1m]))",
"legendFormat": "{{container}}",
"refId": "A"
}
],
"title": "Container Log Volume Over Time",
"type": "timeseries"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 1
},
{
"color": "red",
"value": 5
}
]
},
"unit": "percent",
"decimals": 2
},
"overrides": []
},
"gridPos": {
"h": 6,
"w": 8,
"x": 0,
"y": 8
},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"values": false,
"calcs": [
"lastNotNull"
],
"fields": ""
},
"textMode": "auto"
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum(count_over_time({container=~\"mvp-.*\"} | json | level=\"error\" [5m])) / sum(count_over_time({container=~\"mvp-.*\"}[5m])) * 100",
"refId": "A"
}
],
"title": "Error Rate Across All Containers",
"type": "stat"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineWidth": 1,
"scaleDistribution": {
"type": "linear"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 6,
"w": 16,
"x": 8,
"y": 8
},
"id": 3,
"options": {
"barRadius": 0,
"barWidth": 0.97,
"fullHighlight": false,
"groupWidth": 0.7,
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"orientation": "auto",
"showValue": "auto",
"stacking": "normal",
"tooltip": {
"mode": "multi",
"sort": "none"
},
"xTickLabelRotation": 0,
"xTickLabelSpacing": 0
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum by (container, level) (count_over_time({container=~\"mvp-.*\"} | json [5m]))",
"legendFormat": "{{level}}",
"refId": "A"
}
],
"title": "Log Level Distribution Per Container",
"type": "barchart"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "red",
"value": null
},
{
"color": "green",
"value": 1
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 24,
"x": 0,
"y": 14
},
"id": 4,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"values": false,
"calcs": [
"lastNotNull"
],
"fields": ""
},
"textMode": "name"
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-backend\"}[5m])",
"legendFormat": "mvp-backend",
"refId": "A"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-frontend\"}[5m])",
"legendFormat": "mvp-frontend",
"refId": "B"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-postgres\"}[5m])",
"legendFormat": "mvp-postgres",
"refId": "C"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-redis\"}[5m])",
"legendFormat": "mvp-redis",
"refId": "D"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-traefik\"}[5m])",
"legendFormat": "mvp-traefik",
"refId": "E"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-ocr\"}[5m])",
"legendFormat": "mvp-ocr",
"refId": "F"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-loki\"}[5m])",
"legendFormat": "mvp-loki",
"refId": "G"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-alloy\"}[5m])",
"legendFormat": "mvp-alloy",
"refId": "H"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-grafana\"}[5m])",
"legendFormat": "mvp-grafana",
"refId": "I"
}
],
"title": "Container Health Status",
"type": "stat"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "fixed",
"fixedColor": "blue"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Requests / min",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 18
},
"id": 5,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "count_over_time({container=\"mvp-backend\"} | json | msg=\"Request processed\" [1m])",
"legendFormat": "Backend Requests",
"refId": "A"
}
],
"title": "Total Request Count Over Time",
"type": "timeseries"
}
],
"refresh": "30s",
"schemaVersion": 39,
"tags": [
"overview",
"logs",
"containers"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Loki",
"value": "Loki"
},
"hide": 0,
"includeAll": false,
"label": "Datasource",
"multi": false,
"name": "datasource",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Application Overview",
"uid": "application-overview",
"version": 1,
"weekStart": ""
}

View File

@@ -0,0 +1,580 @@
{
"__inputs": [],
"__elements": {},
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "12.4.0"
},
{
"type": "datasource",
"id": "loki",
"name": "Loki",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": false,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=~\"mvp-.*\"} | json | level=\"error\"",
"refId": "A"
}
],
"title": "Error Log Stream",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "fixed",
"fixedColor": "red"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Errors / min",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 15,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 1
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 10
},
"id": 2,
"options": {
"legend": {
"calcs": [
"sum",
"max"
],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum(count_over_time({container=~\"mvp-.*\"} | json | level=\"error\" [1m]))",
"legendFormat": "Errors/min",
"refId": "A"
}
],
"title": "Error Rate Over Time",
"type": "timeseries"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineWidth": 1,
"scaleDistribution": {
"type": "linear"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 18
},
"id": 3,
"options": {
"barRadius": 0,
"barWidth": 0.8,
"fullHighlight": false,
"groupWidth": 0.7,
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"orientation": "horizontal",
"showValue": "auto",
"stacking": "none",
"tooltip": {
"mode": "single",
"sort": "none"
},
"xTickLabelRotation": 0,
"xTickLabelSpacing": 0
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum by (container) (count_over_time({container=~\"mvp-.*\"} | json | level=\"error\" [5m]))",
"legendFormat": "{{container}}",
"refId": "A"
}
],
"title": "Errors by Container",
"type": "barchart"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 1
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "path"
},
"properties": [
{
"id": "custom.width",
"value": 300
}
]
}
]
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 18
},
"id": 4,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "Value"
}
]
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum by (path) (count_over_time({container=\"mvp-backend\"} | json | level=\"error\" [5m]))",
"legendFormat": "{{path}}",
"refId": "A"
}
],
"title": "Errors by Endpoint",
"type": "table"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 26
},
"id": 5,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-backend\"} | json | level=\"error\" | line_format \"{{.error}}\\n{{.stack}}\"",
"refId": "A"
}
],
"title": "Stack Trace Viewer",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 36
},
"id": 6,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-backend\"} |= \"$requestId\"",
"refId": "A"
}
],
"title": "Correlation ID Lookup",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "red",
"value": null
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "path"
},
"properties": [
{
"id": "custom.width",
"value": 250
}
]
},
{
"matcher": {
"id": "byName",
"options": "status"
},
"properties": [
{
"id": "custom.width",
"value": 80
}
]
}
]
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 46
},
"id": 7,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "Time"
}
]
},
"pluginVersion": "12.4.0",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-backend\"} | json | msg=\"Request processed\" | status >= 500",
"refId": "A"
}
],
"title": "Recent 5xx Responses",
"type": "table"
}
],
"refresh": "30s",
"schemaVersion": 39,
"tags": [
"errors",
"debugging",
"backend"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Loki",
"value": "Loki"
},
"hide": 0,
"includeAll": false,
"label": "Datasource",
"multi": false,
"name": "datasource",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
},
{
"current": {
"selected": false,
"text": "",
"value": ""
},
"hide": 0,
"label": "Request ID",
"name": "requestId",
"options": [
{
"selected": true,
"text": "",
"value": ""
}
],
"query": "",
"skipUrlSync": false,
"type": "textbox"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Error Investigation",
"uid": "error-investigation",
"version": 1,
"weekStart": ""
}

View File

@@ -0,0 +1,490 @@
{
"__inputs": [],
"__elements": {},
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "12.4.0"
},
{
"type": "datasource",
"id": "loki",
"name": "Loki",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": false,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Log Lines / min",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum by (container) (rate({container=~\"mvp-.*\"}[1m]))",
"legendFormat": "{{container}}",
"refId": "A"
}
],
"title": "Per-Container Log Throughput",
"type": "timeseries"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 8
},
"id": 2,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-postgres\"} |~ \"ERROR|WARNING|FATAL\"",
"refId": "A"
}
],
"title": "PostgreSQL Error/Warning Logs",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 18
},
"id": 3,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-redis\"}",
"refId": "A"
}
],
"title": "Redis Connection and Command Logs",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 28
},
"id": 4,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-traefik\"}",
"refId": "A"
}
],
"title": "Traefik Access Logs",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 38
},
"id": 5,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-traefik\"} |~ \"level=error|err=\"",
"refId": "A"
}
],
"title": "Traefik Error Logs",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 48
},
"id": 6,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-ocr\"}",
"refId": "A"
}
],
"title": "OCR Service Logs",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 58
},
"id": 7,
"options": {
"showTime": true,
"showLabels": true,
"showCommonLabels": false,
"wrapLogMessage": true,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "{container=\"mvp-ocr\"} |~ \"ERROR|error|Exception|Traceback\"",
"refId": "A"
}
],
"title": "OCR Processing Errors",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "fixed",
"fixedColor": "purple"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Lines / min",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 68
},
"id": 8,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "${datasource}"
},
"expr": "sum(rate({container=\"mvp-loki\"}[1m]))",
"legendFormat": "Loki Lines/min",
"refId": "A"
}
],
"title": "Loki Ingestion Rate",
"type": "timeseries"
}
],
"refresh": "30s",
"schemaVersion": 39,
"tags": [
"infrastructure",
"containers",
"logs"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Loki",
"value": "Loki"
},
"hide": 0,
"includeAll": false,
"label": "Datasource",
"multi": false,
"name": "datasource",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Infrastructure",
"uid": "infrastructure",
"version": 1,
"weekStart": ""
}

View File

@@ -2,6 +2,7 @@ apiVersion: 1
datasources:
- name: Loki
uid: loki
type: loki
access: proxy
url: http://mvp-loki:3100

View File

@@ -0,0 +1,11 @@
apiVersion: 1
providers:
- name: 'MotoVaultPro'
orgId: 1
folder: 'MotoVaultPro'
type: file
disableDeletion: false
updateIntervalSeconds: 30
allowUiUpdates: false
options:
path: /var/lib/grafana/dashboards

View File

@@ -0,0 +1,8 @@
http:
middlewares:
grafana-ipwhitelist:
ipAllowList:
sourceRange:
- "10.0.0.0/8"
- "172.16.0.0/12"
- "192.168.0.0/16"

View File

@@ -0,0 +1,173 @@
http:
middlewares:
# Security headers middleware
secure-headers:
headers:
accessControlAllowMethods:
- GET
- OPTIONS
- PUT
- POST
- DELETE
accessControlAllowOriginList:
- "https://admin.motovaultpro.com"
- "https://motovaultpro.com"
accessControlMaxAge: 100
addVaryHeader: true
browserXssFilter: true
contentTypeNosniff: true
forceSTSHeader: true
frameDeny: true
stsIncludeSubdomains: true
stsPreload: true
stsSeconds: 31536000
customRequestHeaders:
X-Forwarded-Proto: https
# CORS middleware for API endpoints
cors:
headers:
accessControlAllowCredentials: true
accessControlAllowHeaders:
- "Authorization"
- "Content-Type"
- "X-Requested-With"
- "X-Tenant-ID"
- "X-Request-Id"
accessControlAllowMethods:
- "GET"
- "POST"
- "PUT"
- "DELETE"
- "OPTIONS"
accessControlAllowOriginList:
- "https://admin.motovaultpro.com"
- "https://motovaultpro.com"
accessControlMaxAge: 100
# API authentication middleware
api-auth:
forwardAuth:
address: "http://admin-backend:3001/auth/verify"
authResponseHeaders:
- "X-Auth-User"
- "X-Auth-Roles"
- "X-Tenant-ID"
authRequestHeaders:
- "Authorization"
- "X-Tenant-ID"
trustForwardHeader: true
# Platform API authentication middleware
platform-auth:
forwardAuth:
address: "http://admin-backend:3001/auth/verify-platform"
authResponseHeaders:
- "X-Service-Name"
- "X-Auth-Scope"
authRequestHeaders:
- "X-API-Key"
- "Authorization"
trustForwardHeader: true
# Rate limiting middleware
rate-limit:
rateLimit:
burst: 100
average: 50
period: 1m
# Request/response size limits
size-limit:
buffering:
maxRequestBodyBytes: 26214400 # 25MB
maxResponseBodyBytes: 26214400 # 25MB
# IP whitelist for development (optional)
local-ips:
ipAllowList:
sourceRange:
- "127.0.0.1/32"
- "10.0.0.0/8"
- "172.16.0.0/12"
- "192.168.0.0/16"
# Advanced security headers for production
security-headers-strict:
headers:
accessControlAllowCredentials: false
accessControlAllowMethods:
- GET
- POST
- OPTIONS
accessControlAllowOriginList:
- "https://admin.motovaultpro.com"
- "https://motovaultpro.com"
browserXssFilter: true
contentTypeNosniff: true
customRequestHeaders:
X-Forwarded-Proto: https
customResponseHeaders:
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy: "geolocation=(), microphone=(), camera=()"
forceSTSHeader: true
frameDeny: true
stsIncludeSubdomains: true
stsPreload: true
stsSeconds: 31536000
# Circuit breaker for reliability
circuit-breaker:
circuitBreaker:
expression: "NetworkErrorRatio() > 0.3 || ResponseCodeRatio(500, 600, 0, 600) > 0.3"
checkPeriod: 30s
fallbackDuration: 10s
recoveryDuration: 30s
# Request retry for resilience
retry-policy:
retry:
attempts: 3
initialInterval: 100ms
# Compress responses for performance
compression:
compress: {}
# Health check middleware chain
health-check-chain:
chain:
middlewares:
- compression
- secure-headers
# API middleware chain
api-chain:
chain:
middlewares:
- compression
- security-headers-strict
- cors
- rate-limit
- api-auth
- retry-policy
# Platform API middleware chain
platform-chain:
chain:
middlewares:
- compression
- security-headers-strict
- rate-limit
- platform-auth
- circuit-breaker
- retry-policy
# Public frontend middleware chain
frontend-chain:
chain:
middlewares:
- compression
- secure-headers

View File

@@ -15,6 +15,8 @@ services:
mvp-traefik:
image: ${REGISTRY_MIRRORS:-git.motovaultpro.com/egullickson/mirrors}/traefik:v3.6
container_name: mvp-traefik-staging
volumes:
- ./config/traefik/dynamic-staging:/etc/traefik/dynamic:ro
labels:
- "traefik.http.routers.traefik-dashboard.rule=Host(`traefik.staging.motovaultpro.com`)"
@@ -79,6 +81,20 @@ services:
volumes:
- mvp_redis_staging_data:/data
# ========================================
# Grafana (Staging domain override)
# ========================================
mvp-grafana:
labels:
- "traefik.enable=true"
- "traefik.docker.network=motovaultpro_frontend"
- "traefik.http.routers.grafana.rule=Host(`logs.staging.motovaultpro.com`)"
- "traefik.http.routers.grafana.entrypoints=websecure"
- "traefik.http.routers.grafana.tls=true"
- "traefik.http.routers.grafana.tls.certresolver=letsencrypt"
- "traefik.http.routers.grafana.middlewares=grafana-ipwhitelist@file"
- "traefik.http.services.grafana.loadbalancer.server.port=3000"
# Staging-specific volumes (separate from production)
volumes:
mvp_postgres_staging_data:

View File

@@ -276,10 +276,9 @@ services:
networks:
- backend
healthcheck:
test: ["CMD-SHELL", "wget -q --spider http://localhost:3100/ready || exit 1"]
interval: 30s
timeout: 10s
retries: 3
# Loki 3.x uses a distroless image with no shell or HTTP client.
# Disable in-container healthcheck; Alloy and Grafana verify connectivity.
disable: true
logging:
driver: json-file
options:
@@ -305,7 +304,7 @@ services:
depends_on:
- mvp-loki
healthcheck:
test: ["CMD-SHELL", "wget -q --spider http://localhost:12345/ready || exit 1"]
test: ["CMD-SHELL", "bash -c 'echo > /dev/tcp/localhost/12345'"]
interval: 30s
timeout: 10s
retries: 3
@@ -325,6 +324,9 @@ services:
GF_USERS_ALLOW_SIGN_UP: "false"
volumes:
- ./config/grafana/datasources:/etc/grafana/provisioning/datasources:ro
- ./config/grafana/provisioning:/etc/grafana/provisioning/dashboards:ro
- ./config/grafana/alerting:/etc/grafana/provisioning/alerting:ro
- ./config/grafana/dashboards:/var/lib/grafana/dashboards:ro
- mvp_grafana_data:/var/lib/grafana
networks:
- backend

View File

@@ -52,7 +52,39 @@ All logs include a `requestId` field (UUID v4) for tracing requests:
- URL: https://logs.motovaultpro.com
- Default credentials: admin/admin (change on first login)
### Example LogQL Queries
## Dashboards
Four provisioned dashboards are available in the MotoVaultPro folder:
| Dashboard | Purpose | Key Panels |
|-----------|---------|------------|
| Application Overview | System-wide health at a glance | Container log volume, error rate gauge, log level distribution, container health status, request count |
| API Performance | Backend latency and throughput analysis | Request rate, response time percentiles (p50/p95/p99), status code distribution, slowest endpoints |
| Error Investigation | Debugging and root cause analysis | Error log stream, errors by container/endpoint, stack trace viewer, correlation ID lookup, recent 5xx responses |
| Infrastructure | Container-level logs and platform monitoring | Per-container throughput, PostgreSQL/Redis/Traefik/OCR logs, Loki ingestion rate |
All dashboards refresh every 30 seconds and default to a 1-hour time window. Dashboard JSON files are in `config/grafana/dashboards/` and provisioned via `config/grafana/provisioning/dashboards.yml`.
## Alerting Rules
Grafana Unified Alerting is configured with file-based provisioned rules. Alert rules are evaluated every 1 minute and must fire continuously for 5 minutes before triggering.
| Alert | Severity | Condition | Description |
|-------|----------|-----------|-------------|
| Error Rate Spike | critical | Error rate > 5% over 5m | Fires when the percentage of error-level logs across all mvp-* containers exceeds 5% |
| Container Silence: mvp-backend | warning | No logs for 5m | Fires when the backend container stops producing logs |
| Container Silence: mvp-postgres | warning | No logs for 5m | Fires when the database container stops producing logs |
| Container Silence: mvp-redis | warning | No logs for 5m | Fires when the cache container stops producing logs |
| 5xx Response Spike | critical | > 10 5xx responses in 5m | Fires when the backend produces more than 10 HTTP 5xx responses |
Alert configuration files are in `config/grafana/alerting/`:
- `alert-rules.yml` - Alert rule definitions with LogQL queries
- `contact-points.yml` - Notification endpoints (webhook placeholder for future email/Slack)
- `notification-policies.yml` - Routing rules that group alerts by name and severity
## LogQL Query Reference
### Common Debugging Queries
Query by requestId:
```
@@ -66,7 +98,49 @@ Query all errors:
Query slow requests (>500ms):
```
{container="mvp-backend"} | json | duration > 500
{container="mvp-backend"} | json | msg="Request processed" | duration > 500
```
### Error Analysis
Count errors per container over time:
```
sum by (container) (count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))
```
Error rate as percentage:
```
sum(count_over_time({container=~"mvp-.*"} | json | level="error" [5m]))
/ sum(count_over_time({container=~"mvp-.*"} [5m])) * 100
```
### HTTP Status Analysis
All 5xx responses:
```
{container="mvp-backend"} | json | msg="Request processed" | status >= 500
```
Request count by status code:
```
sum by (status) (count_over_time({container="mvp-backend"} | json | msg="Request processed" [5m]))
```
### Container-Specific Queries
PostgreSQL errors:
```
{container="mvp-postgres"} |~ "ERROR|FATAL|PANIC"
```
Traefik access logs:
```
{container="mvp-traefik"} | json
```
OCR processing errors:
```
{container="mvp-ocr"} |~ "ERROR|Exception|Traceback"
```
## Configuration