motovaultpro/K8S-REDESIGN.md
Eric Gullickson 17d27f4b92 k8s improvement
2025-09-17 20:47:42 -05:00

Docker Compose → Kubernetes Architecture Redesign

Overview

This document outlines a redesign of MotoVaultPro's Docker Compose architecture so that it closely mirrors a Kubernetes deployment pattern. The goal is to keep all current functionality while preparing for a later K8s migration and improving the development experience.

Current Architecture Analysis

Existing Services (15 containers)

MVP Platform Services (Microservices)

  • mvp-platform-landing - Marketing/landing page (nginx)
  • mvp-platform-tenants - Multi-tenant management API (FastAPI)
  • mvp-platform-vehicles-api - Vehicle data API (FastAPI)
  • mvp-platform-vehicles-etl - Data processing pipeline (Python)
  • mvp-platform-vehicles-db - Vehicle data storage (PostgreSQL)
  • mvp-platform-vehicles-redis - Vehicle data cache (Redis)
  • mvp-platform-vehicles-mssql - Monthly ETL source (SQL Server)

Application Services (Modular Monolith)

  • admin-backend - Application API with feature capsules (Node.js)
  • admin-frontend - React SPA (nginx)
  • admin-postgres - Application database (PostgreSQL)
  • admin-redis - Application cache (Redis)
  • admin-minio - Object storage (MinIO)

Infrastructure

  • nginx-proxy - Load balancer and SSL termination
  • platform-postgres - Platform services database
  • platform-redis - Platform services cache

Current Limitations

  1. Single Network: All services on default network
  2. Manual Routing: nginx configuration requires manual updates
  3. Port Exposure: Many services expose ports directly
  4. Configuration: Environment variables scattered across services
  5. Service Discovery: Hard-coded service names
  6. Observability: Limited monitoring and debugging capabilities

Target Kubernetes-like Architecture

Network Segmentation

networks:
  frontend:
    driver: bridge
    labels:
      - "com.motovaultpro.network=frontend"
      - "com.motovaultpro.purpose=public-facing"

  backend:
    driver: bridge
    internal: true
    labels:
      - "com.motovaultpro.network=backend"
      - "com.motovaultpro.purpose=api-services"

  database:
    driver: bridge
    internal: true
    labels:
      - "com.motovaultpro.network=database"
      - "com.motovaultpro.purpose=data-layer"

  platform:
    driver: bridge
    internal: true
    labels:
      - "com.motovaultpro.network=platform"
      - "com.motovaultpro.purpose=microservices"

Service Placement Strategy

| Network | Services | Purpose | K8s Equivalent |
|---------|----------|---------|----------------|
| frontend | traefik, admin-frontend, mvp-platform-landing | Public-facing services | Public LoadBalancer services |
| backend | admin-backend, mvp-platform-tenants, mvp-platform-vehicles-api | API services | ClusterIP services |
| database | All PostgreSQL, Redis, MinIO | Data persistence | StatefulSets with PVCs |
| platform | Platform microservices communication | Internal service mesh | Service mesh networking |
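
As a rough illustration of the placement table, the fragment below attaches a few services to the networks defined earlier. Which services need more than one network is an assumption here (for example, admin-backend joining database to reach admin-postgres); the table only lists a primary network per service.

```yaml
# Sketch only: membership beyond each service's primary network is assumed.
services:
  traefik:
    networks: [frontend, backend]            # as in the Traefik service definition in this document
  admin-frontend:
    networks: [frontend]
  admin-backend:
    networks: [backend, database]            # assumed: needs admin-postgres, admin-redis, admin-minio
  admin-postgres:
    networks: [database]
  mvp-platform-vehicles-api:
    networks: [backend, platform, database]  # assumed: API plus its own database and cache
```

Because backend, database, and platform are declared with internal: true, containers attached only to those networks have no route to external hosts. A service that must call an external provider such as Auth0 would also need to join a non-internal network.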

Traefik Configuration

Core Traefik Setup

traefik:
  image: traefik:v3.0
  container_name: traefik
  networks:
    - frontend
    - backend
  ports:
    - "80:80"
    - "443:443"
    - "8080:8080"  # Dashboard
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - ./config/traefik:/config:ro
    - ./certs:/certs:ro
  configs:
    - source: traefik-config
      target: /etc/traefik/traefik.yml
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.dashboard.rule=Host(`traefik.motovaultpro.local`)"
    - "traefik.http.routers.dashboard.tls=true"
    - "traefik.http.routers.dashboard.service=api@internal"
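
The compose service mounts a static configuration at /etc/traefik/traefik.yml, but the file itself is not shown. A minimal sketch of what it might contain follows. The entry point names web and websecure, the HTTP-to-HTTPS redirect, and the choice to load /config/middleware.yml through the file provider are assumptions, not the project's actual settings.

```yaml
# /etc/traefik/traefik.yml (static configuration, illustrative)
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure        # assumed: redirect plain HTTP to HTTPS
          scheme: https
  websecure:
    address: ":443"

providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false    # only containers labelled traefik.enable=true are routed
  file:
    filename: /config/middleware.yml   # supplies the middlewares referenced as @file
    watch: true

api:
  dashboard: true
  insecure: true               # serves the API and dashboard on :8080; development only
```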

Service Discovery Labels

Admin Frontend

admin-frontend:
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.admin-app.rule=Host(`admin.motovaultpro.com`)"
    - "traefik.http.routers.admin-app.tls=true"
    - "traefik.http.routers.admin-app.middlewares=secure-headers@file"
    - "traefik.http.services.admin-app.loadbalancer.server.port=3000"
    - "traefik.http.services.admin-app.loadbalancer.healthcheck.path=/"

Admin Backend

admin-backend:
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.admin-api.rule=Host(`admin.motovaultpro.com`) && PathPrefix(`/api`)"
    - "traefik.http.routers.admin-api.tls=true"
    - "traefik.http.routers.admin-api.middlewares=api-auth@file,cors@file"
    - "traefik.http.services.admin-api.loadbalancer.server.port=3001"
    - "traefik.http.services.admin-api.loadbalancer.healthcheck.path=/health"

Platform Landing

mvp-platform-landing:
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.landing.rule=Host(`motovaultpro.com`)"
    - "traefik.http.routers.landing.tls=true"
    - "traefik.http.routers.landing.middlewares=secure-headers@file"
    - "traefik.http.services.landing.loadbalancer.server.port=3000"
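
For the later Kubernetes migration, the admin-backend router labels map roughly onto an Ingress rule. The sketch below assumes a Service named admin-backend exposing port 3001 and a TLS secret named admin-motovaultpro-tls; both names are hypothetical, and the Traefik middlewares are not translated here.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: admin-api
spec:
  tls:
    - hosts:
        - admin.motovaultpro.com
      secretName: admin-motovaultpro-tls   # hypothetical secret holding the certificate
  rules:
    - host: admin.motovaultpro.com
      http:
        paths:
          - path: /api                     # same prefix as the PathPrefix(`/api`) rule
            pathType: Prefix
            backend:
              service:
                name: admin-backend        # hypothetical Service name
                port:
                  number: 3001
```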

Middleware Configuration

# config/traefik/middleware.yml
http:
  middlewares:
    secure-headers:
      headers:
        accessControlAllowMethods:
          - GET
          - OPTIONS
          - PUT
          - POST
          - DELETE
        accessControlAllowOriginList:
          - "https://admin.motovaultpro.com"
          - "https://motovaultpro.com"
        accessControlMaxAge: 100
        addVaryHeader: true
        browserXssFilter: true
        contentTypeNosniff: true
        forceSTSHeader: true
        frameDeny: true
        stsIncludeSubdomains: true
        stsPreload: true
        stsSeconds: 31536000

    cors:
      headers:
        accessControlAllowCredentials: true
        accessControlAllowHeaders:
          - "Authorization"
          - "Content-Type"
          - "X-Requested-With"
        accessControlAllowMethods:
          - "GET"
          - "POST"
          - "PUT"
          - "DELETE"
          - "OPTIONS"
        accessControlAllowOriginList:
          - "https://admin.motovaultpro.com"
          - "https://motovaultpro.com"
        accessControlMaxAge: 100

    api-auth:
      forwardAuth:
        address: "http://admin-backend:3001/auth/verify"
        authResponseHeaders:
          - "X-Auth-User"
          - "X-Auth-Roles"

Enhanced Health Checks

Standardized Health Check Pattern

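All services will implement the three Kubernetes probe types:

  1. Startup Probe - Service initialization
  2. Readiness Probe - Service ready to accept traffic
  3. Liveness Probe - Service health monitoring

Docker Compose allows only one healthcheck per service, so the example below polls the readiness endpoint and uses start_period to approximate the startup probe.

# Example: admin-backend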
healthcheck:
  test: ["CMD", "node", "-e", "
    const http = require('http');
    const options = {
      hostname: 'localhost',
      port: 3001,
      path: '/health/ready',
      timeout: 2000
    };
    const req = http.request(options, (res) => {
      process.exit(res.statusCode === 200 ? 0 : 1);
    });
    req.on('timeout', () => { req.destroy(); process.exit(1); });
    req.on('error', () => process.exit(1));
    req.end();
  "]
  interval: 15s
  timeout: 5s
  retries: 3
  start_period: 45s
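
Health checks become more useful when start-up ordering depends on them. The sketch below assumes admin-backend should wait for admin-postgres to report healthy. The pg_isready user name is a placeholder, and the interval values are illustrative.

```yaml
services:
  admin-postgres:
    healthcheck:
      # pg_isready ships with the official postgres image; the user name is a placeholder
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  admin-backend:
    depends_on:
      admin-postgres:
        condition: service_healthy   # start only after the database check passes
```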

Health Endpoint Standards

All services must expose:

  • /health - Basic health check
  • /health/ready - Readiness probe
  • /health/live - Liveness probe
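
These three endpoints line up with the Kubernetes probe types. As a sketch of the eventual translation, the pod-spec fragment below reuses port 3001 and the timing values from the admin-backend healthcheck. The mapping of /health to the startup probe and the startup thresholds are assumptions.

```yaml
# Fragment of a Deployment's pod template (illustrative)
containers:
  - name: admin-backend
    ports:
      - containerPort: 3001
    startupProbe:
      httpGet: { path: /health, port: 3001 }
      periodSeconds: 5
      failureThreshold: 9        # 9 x 5s = 45s, matching start_period
    readinessProbe:
      httpGet: { path: /health/ready, port: 3001 }
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 3
    livenessProbe:
      httpGet: { path: /health/live, port: 3001 }
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 3
```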

Configuration Management

Docker Configs (K8s ConfigMaps equivalent)

configs:
  traefik-config:
    file: ./config/traefik/traefik.yml
  traefik-middleware:
    file: ./config/traefik/middleware.yml
  app-config-production:
    file: ./config/app/production.yml
  platform-config:
    file: ./config/platform/services.yml

Docker Secrets (K8s Secrets equivalent)

secrets:
  auth0-client-secret:
    file: ./secrets/auth0-client-secret.txt
  database-passwords:
    file: ./secrets/database-passwords.txt
  platform-api-keys:
    file: ./secrets/platform-api-keys.txt
  ssl-certificates:
    file: ./secrets/ssl-certs.txt
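
Top-level configs and secrets only take effect once a service references them. A minimal sketch for admin-backend follows. The config target path and the AUTH0_CLIENT_SECRET_FILE variable are hypothetical, and the application would need to read the secret from that file.

```yaml
services:
  admin-backend:
    configs:
      - source: app-config-production
        target: /app/config/production.yml   # hypothetical mount path
    secrets:
      - auth0-client-secret                  # mounted at /run/secrets/auth0-client-secret
      - database-passwords
    environment:
      # Hypothetical variable pointing the app at the mounted secret file
      AUTH0_CLIENT_SECRET_FILE: /run/secrets/auth0-client-secret
```

Granting each service only the secrets it lists mirrors how a Kubernetes pod mounts only the Secrets named in its spec.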

Environment Configuration

# config/app/production.yml
database:
  host: admin-postgres
  port: 5432
  name: motovaultpro
  pool_size: 20

redis:
  host: admin-redis
  port: 6379
  db: 0

auth0:
  domain: ${AUTH0_DOMAIN}
  audience: ${AUTH0_AUDIENCE}

platform:
  vehicles_api:
    url: http://mvp-platform-vehicles-api:8000
    timeout: 30s
  tenants_api:
    url: http://mvp-platform-tenants:8000
    timeout: 15s
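
Compose substitutes ${...} variables in the compose file itself, but a file mounted through configs is delivered unchanged. The ${AUTH0_DOMAIN} and ${AUTH0_AUDIENCE} placeholders above would therefore reach the application as literal text unless its config loader expands them. One option, sketched here under the assumption that admin-backend can read these values from its environment, is to pass them through the service definition:

```yaml
services:
  admin-backend:
    environment:
      # Interpolated by Compose from the shell environment or an .env file
      AUTH0_DOMAIN: ${AUTH0_DOMAIN}
      AUTH0_AUDIENCE: ${AUTH0_AUDIENCE}
```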

Resource Management

Resource Allocation Strategy

Tier 1: Critical Services

deploy:
  resources:
    limits: { memory: 2G, cpus: '2.0' }
    reservations: { memory: 1G, cpus: '1.0' }
  restart_policy:
    condition: on-failure
    max_attempts: 3

Tier 2: Supporting Services

deploy:
  resources:
    limits: { memory: 1G, cpus: '1.0' }
    reservations: { memory: 512M, cpus: '0.5' }
  restart_policy:
    condition: on-failure
    max_attempts: 3

Tier 3: Infrastructure Services

deploy:
  resources:
    limits: { memory: 512M, cpus: '0.5' }
    reservations: { memory: 256M, cpus: '0.25' }
  restart_policy:
    condition: any  # deploy.restart_policy accepts none, on-failure, or any

Service Tiers

| Tier | Services | Resource Profile | Priority |
|------|----------|------------------|----------|
| 1 | admin-backend, mvp-platform-vehicles-api, admin-postgres | High | Critical |
| 2 | admin-frontend, mvp-platform-tenants, mvp-platform-landing | Medium | Important |
| 3 | traefik, redis services, etl services | Low | Supporting |
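
One way to apply the tiers without repeating each deploy block is to combine Compose extension fields with YAML anchors. This is a sketch, not the project's actual layout. The x-tier-* names are hypothetical, and only two services are shown.

```yaml
# Top-level keys starting with x- are ignored by Compose and can hold reusable anchors.
x-tier-1: &tier-1
  deploy:
    resources:
      limits: { memory: 2G, cpus: '2.0' }
      reservations: { memory: 1G, cpus: '1.0' }
    restart_policy:
      condition: on-failure
      max_attempts: 3

x-tier-2: &tier-2
  deploy:
    resources:
      limits: { memory: 1G, cpus: '1.0' }
      reservations: { memory: 512M, cpus: '0.5' }
    restart_policy:
      condition: on-failure
      max_attempts: 3

services:
  admin-backend:
    <<: *tier-1      # merge the Tier 1 profile into this service
  admin-frontend:
    <<: *tier-2
```

The YAML merge key is shallow: a service that also declares its own deploy key replaces the whole merged block rather than extending it.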

Migration Implementation Plan

Phase 1: Infrastructure Foundation (Week 1)

Objectives:

  • Implement Traefik service
  • Create network segmentation
  • Establish basic routing

Tasks:

  1. Create new network topology
  2. Add Traefik service with basic configuration
  3. Migrate nginx routing to Traefik labels
  4. Test SSL certificate handling
  5. Verify all existing functionality

Success Criteria:

  • All services accessible via original URLs
  • SSL certificates working
  • Health checks functional
  • No performance degradation

Phase 2: Service Discovery & Labels (Week 2)

Objectives:

  • Convert all services to label-based discovery
  • Implement middleware for security
  • Add service health monitoring

Tasks:

  1. Convert each service to Traefik labels
  2. Implement security middleware
  3. Add CORS and authentication middleware
  4. Test service discovery and failover
  5. Implement Traefik dashboard access

Success Criteria:

  • All services discovered automatically
  • Security middleware working
  • Dashboard accessible and functional
  • Mobile and desktop testing passes

Phase 3: Configuration Management (Week 3)

Objectives:

  • Implement Docker configs and secrets
  • Standardize environment configuration
  • Add monitoring and observability

Tasks:

  1. Move configuration to Docker configs
  2. Implement secrets management
  3. Standardize health check endpoints
  4. Add service metrics collection
  5. Implement log aggregation

Success Criteria:

  • No hardcoded secrets in compose files
  • Centralized configuration management
  • Enhanced monitoring capabilities
  • Improved debugging experience

Phase 4: Optimization & Documentation (Week 4)

Objectives:

  • Optimize resource allocation
  • Update development workflow
  • Complete documentation

Tasks:

  1. Implement resource limits and reservations
  2. Update Makefile with new commands
  3. Create troubleshooting documentation
  4. Performance testing and optimization
  5. Final validation of all features

Success Criteria:

  • Optimized resource usage
  • Updated development workflow
  • Complete documentation
  • All tests passing

Development Workflow Enhancements

New Makefile Commands

# Traefik specific commands
traefik-dashboard:
	@echo "Opening Traefik dashboard..."
	@open http://traefik.motovaultpro.local:8080/dashboard/

traefik-logs:
	@docker compose logs -f traefik

service-discovery:
	@echo "Discovered services:"
	@curl -s http://localhost:8080/api/rawdata

network-inspect:
	@echo "Network topology:"
	@docker network ls --filter name=motovaultpro
	@docker network inspect motovaultpro_frontend motovaultpro_backend motovaultpro_database motovaultpro_platform

health-check-all:
	@echo "Checking health of all services..."
	@docker compose ps --format "table {{.Service}}\t{{.Status}}\t{{.Health}}"

# Enhanced existing commands
logs:
	@echo "Available log targets: all, traefik, backend, frontend, platform"
	@docker compose logs -f $(filter-out $@,$(MAKECMDGOALS))

%:
	@:  # This catches the log target argument

Enhanced Development Features

Service Discovery Dashboard

  • Real-time service status
  • Route configuration visualization
  • Health check monitoring
  • Request tracing

Debugging Tools

  • Network topology inspection
  • Service dependency mapping
  • Configuration validation
  • Performance metrics

Testing Enhancements

  • Automated health checks
  • Service integration testing
  • Load balancing validation
  • SSL certificate verification

Observability & Monitoring

Metrics Collection

# Add to traefik configuration
metrics:
  prometheus:
    addEntryPointsLabels: true
    addServicesLabels: true
    addRoutersLabels: true
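
The metrics still need something to scrape them. The sketch below assumes a Prometheus container (not part of the current service list) on a network shared with Traefik, and that metrics stay on Traefik's default internal entry point on port 8080.

```yaml
# prometheus.yml for a hypothetical Prometheus service
scrape_configs:
  - job_name: traefik
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ["traefik:8080"]   # assumes the default "traefik" entry point is used for metrics
```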

Logging Strategy

Centralized Logging

  • All services log to stdout/stderr
  • Traefik access logs
  • Service health check logs
  • Application performance logs

Log Levels

  • ERROR: Critical issues requiring attention
  • WARN: Potential issues or degraded performance
  • INFO: Normal operational messages
  • DEBUG: Detailed diagnostic information (dev only)
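
For Traefik itself, the log level and access log can be set in the static configuration. The fragment below is illustrative; the choice of JSON output is an assumption that suits later log aggregation.

```yaml
# Addition to the Traefik static configuration (illustrative)
log:
  level: INFO          # switch to DEBUG in development only
  format: json
accessLog:
  format: json         # written to stdout when no filePath is set
```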

Health Monitoring

Service Health Dashboard

  • Real-time service status
  • Historical health trends
  • Alert notifications
  • Performance metrics

Security Enhancements

Network Security

Network Isolation

  • Frontend network: Public-facing services only
  • Backend network: API services with restricted access
  • Database network: Data services with no external access
  • Platform network: Microservices internal communication

Access Control

  • Traefik middleware for authentication
  • Service-to-service authentication
  • Network-level access restrictions
  • SSL/TLS encryption for all traffic

Secret Management

Secrets Rotation

  • Database passwords
  • API keys
  • SSL certificates
  • Auth0 client secrets

Access Policies

  • Least privilege principle
  • Service-specific secret access
  • Audit logging for secret access
  • Encrypted secret storage

Testing Strategy

Automated Testing

Integration Tests

  • Service discovery validation
  • Health check verification
  • SSL certificate testing
  • Load balancing functionality

Performance Tests

  • Service response times
  • Network latency measurement
  • Resource utilization monitoring
  • Concurrent user simulation

Security Tests

  • Network isolation verification
  • Authentication middleware testing
  • SSL/TLS configuration validation
  • Secret management verification

Manual Testing Procedures

Development Workflow

  1. Service startup validation
  2. Route accessibility testing
  3. Mobile/desktop compatibility
  4. Feature functionality verification
  5. Performance benchmarking

Deployment Validation

  1. Service discovery verification
  2. Health check validation
  3. SSL certificate functionality
  4. Load balancing behavior
  5. Failover testing

Migration Rollback Plan

Rollback Triggers

  • Service discovery failures
  • Performance degradation > 20%
  • SSL certificate issues
  • Health check failures
  • Mobile/desktop compatibility issues

Rollback Procedure

  1. Immediate: Switch DNS to backup nginx configuration
  2. Quick: Restore docker-compose.yml.backup
  3. Complete: Revert all configuration changes
  4. Verify: Run full test suite
  5. Monitor: Ensure service stability

Backup Strategy

  • Backup current docker-compose.yml
  • Backup nginx configuration
  • Export service configurations
  • Document current network topology
  • Save working environment variables

Success Metrics

Performance Metrics

  • Service Startup Time: < 30 seconds for all services
  • Request Response Time: < 500ms for API calls
  • Health Check Response: < 2 seconds
  • SSL Handshake Time: < 1 second

Reliability Metrics

  • Service Availability: 99.9% uptime
  • Health Check Success Rate: > 98%
  • Service Discovery Accuracy: 100%
  • Failover Time: < 10 seconds

Development Experience Metrics

  • Development Setup Time: < 5 minutes
  • Service Debug Time: < 2 minutes to identify issues
  • Configuration Change Deployment: < 1 minute
  • Test Suite Execution: < 10 minutes

Post-Migration Benefits

Immediate Benefits

  1. Enhanced Observability: Real-time service monitoring and debugging
  2. Improved Security: Network segmentation and middleware protection
  3. Better Development Experience: Automatic service discovery and routing
  4. Simplified Configuration: Centralized configuration management
  5. K8s Preparation: Architecture closely mirrors Kubernetes patterns

Long-term Benefits

  1. Easier K8s Migration: Direct translation to Kubernetes manifests
  2. Better Scalability: Load balancing and resource management
  3. Improved Maintainability: Standardized configuration patterns
  4. Enhanced Monitoring: Built-in metrics and health monitoring
  5. Professional Development Environment: Production-like local setup

Conclusion

This comprehensive redesign transforms the Docker Compose architecture to closely mirror Kubernetes deployment patterns while maintaining all existing functionality and improving the development experience. The phased migration approach ensures minimal disruption while delivering immediate benefits in observability, security, and maintainability.

The new architecture provides a solid foundation for future Kubernetes migration while enhancing current development workflows with modern service discovery, monitoring, and configuration management practices.