motovaultpro/docs/K8S-PHASE-3.md
Eric Gullickson 01a03263c9 Fixed Dark Mode
2025-07-28 09:39:17 -05:00

Phase 3: Production Deployment (Weeks 9-12)

This phase focuses on deploying the modernized application with proper production configurations, monitoring, backup strategies, and operational procedures.

Overview

Phase 3 turns the development-ready Kubernetes application into a production-grade system with monitoring, automated backup and recovery, and secure ingress. By the end of this phase the system should meet the security, performance, and reliability targets listed under Success Criteria.

Key Objectives

  • Production Kubernetes Deployment: Configure scalable, secure deployment manifests
  • Ingress and TLS Configuration: Secure external access with proper routing
  • Comprehensive Monitoring: Application and infrastructure observability
  • Backup and Disaster Recovery: Automated backup strategies and recovery procedures
  • Migration Execution: Seamless transition from legacy system

3.1 Kubernetes Deployment Configuration

Objective: Create production-ready Kubernetes manifests with proper resource management and high availability.

Application Deployment Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: motovault-app
  namespace: motovault
  labels:
    app: motovault
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: motovault
  template:
    metadata:
      labels:
        app: motovault
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8080"
    spec:
      serviceAccountName: motovault-service-account
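      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
        # Required by the "restricted" Pod Security Standard enforced on the namespace
        seccompProfile:
          type: RuntimeDefault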
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - motovault
              topologyKey: kubernetes.io/hostname
          - weight: 50
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - motovault
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: motovault
        image: motovault:v1.0.0  # pinned to the release in the version label; ":latest" is not reproducible
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        env:
        - name: ASPNETCORE_ENVIRONMENT
          value: "Production"
        - name: ASPNETCORE_URLS
          value: "http://+:8080"
        envFrom:
        - configMapRef:
            name: motovault-config
        - secretRef:
            name: motovault-secrets
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        volumeMounts:
        - name: tmp-volume
          mountPath: /tmp
        - name: app-logs
          mountPath: /app/logs
      volumes:
      - name: tmp-volume
        emptyDir: {}
      - name: app-logs
        emptyDir: {}
      terminationGracePeriodSeconds: 30

---
apiVersion: v1
kind: Service
metadata:
  name: motovault-service
  namespace: motovault
  labels:
    app: motovault
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: motovault

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: motovault-pdb
  namespace: motovault
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: motovault

Horizontal Pod Autoscaler Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: motovault-hpa
  namespace: motovault
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: motovault-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0  # react to load spikes immediately; scale-down keeps the 300s window
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
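
To sanity-check the targets above, the replica count the autoscaler aims for can be estimated by hand. The following is a minimal sketch of the documented HPA formula, desired = ceil(current × metric / target), clamped to the min/max bounds. The function name is hypothetical, inputs are integer percentages, and the controller's tolerance band and stabilization windows are deliberately ignored.

```shell
#!/usr/bin/env bash
# Estimate the replica count an HPA would target.
# Args: current replicas, observed utilization %, target utilization %, min, max.
hpa_desired_replicas() {
  local current=$1 metric=$2 target=$3 min=$4 max=$5
  # Ceiling division with integer arithmetic.
  local desired=$(( (current * metric + target - 1) / target ))
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# Example: 3 replicas at 90% CPU against the 70% target, bounds 3..10.
# hpa_desired_replicas 3 90 70 3 10
```

With the values from the manifest, 3 replicas running at 90% CPU would be scaled to 4, and any estimate above 10 is capped by maxReplicas.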

Implementation Tasks

1. Create production namespace with security policies

apiVersion: v1
kind: Namespace
metadata:
  name: motovault
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

2. Configure resource quotas and limits

apiVersion: v1
kind: ResourceQuota
metadata:
  name: motovault-quota
  namespace: motovault
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "10"
    pods: "20"
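
A quick arithmetic check helps confirm that the quota leaves room for the autoscaler. The sketch below assumes the per-pod requests from the Deployment (250m CPU, 512Mi memory) and a worst case of 11 application pods (maxReplicas 10 plus one surge pod during a rolling update). The helper name is hypothetical, and any other workloads in the namespace would need to be added to the totals.

```shell
#!/usr/bin/env bash
# Succeeds when `pods` replicas, each requesting `per_pod` units, fit inside
# `quota`. All values must use the same unit (millicores, or Mi of memory).
fits_quota() {
  local pods=$1 per_pod=$2 quota=$3
  (( pods * per_pod <= quota ))
}

# CPU:    11 pods * 250m  = 2750m  against requests.cpu "4"    (4000m)
# Memory: 11 pods * 512Mi = 5632Mi against requests.memory 8Gi (8192Mi)
# fits_quota 11 250 4000 && fits_quota 11 512 8192 && echo "quota ok"
```

Note that 20 pods at 250m each would request 5000m, so the CPU quota is reached before the pod-count quota.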

3. Set up service accounts and RBAC

apiVersion: v1
kind: ServiceAccount
metadata:
  name: motovault-service-account
  namespace: motovault
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: motovault-role
  namespace: motovault
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: motovault-rolebinding
  namespace: motovault
subjects:
- kind: ServiceAccount
  name: motovault-service-account
  namespace: motovault
roleRef:
  kind: Role
  name: motovault-role
  apiGroup: rbac.authorization.k8s.io
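
Once the Role and RoleBinding are applied, the effective permissions can be checked with `kubectl auth can-i` and impersonation. This is a small sketch: the `sa_can` helper and the `KUBECTL` override variable are hypothetical conveniences, and impersonation requires that the caller is allowed to use `--as`.

```shell
#!/usr/bin/env bash
# KUBECTL can point at a stub for dry runs; defaults to the real CLI.
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds when the service account may perform `verb` on `resource`.
sa_can() {
  local verb=$1 resource=$2
  local namespace=${3:-motovault} sa=${4:-motovault-service-account}
  local answer
  answer=$("$KUBECTL" auth can-i "$verb" "$resource" -n "$namespace" \
    --as="system:serviceaccount:${namespace}:${sa}" 2>/dev/null)
  [ "$answer" = "yes" ]
}

# Expected with the Role above: get/list allowed, everything else denied.
# sa_can get secrets    && echo "get secrets: allowed"
# sa_can delete secrets || echo "delete secrets: denied"
```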

4. Configure pod anti-affinity for high availability

  • Spread pods across nodes and availability zones
  • Ensure no single point of failure
  • Optimize for both performance and availability

5. Implement rolling update strategy with zero downtime

  • Configure progressive rollout with health checks
  • Automatic rollback on failure
  • Canary deployment capabilities
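
A Deployment does not undo a failed rollout by itself, so "automatic rollback" usually lives in the deployment pipeline. One possible sketch, assuming the pipeline can run `kubectl`: wait for the rollout, and undo it if it does not finish in time. The function name and the `KUBECTL` override are hypothetical.

```shell
#!/usr/bin/env bash
# KUBECTL can point at a stub for dry runs; defaults to the real CLI.
KUBECTL="${KUBECTL:-kubectl}"

# Wait for a rollout to complete; roll back to the previous revision if it
# fails or times out. Returns 0 on success, 1 after a rollback.
rollout_or_rollback() {
  local deployment=$1 namespace=$2 timeout=${3:-300s}
  if "$KUBECTL" rollout status "deployment/${deployment}" \
       -n "$namespace" --timeout="$timeout"; then
    echo "rollout of ${deployment} succeeded"
    return 0
  fi
  echo "rollout of ${deployment} failed; rolling back" >&2
  "$KUBECTL" rollout undo "deployment/${deployment}" -n "$namespace"
  return 1
}

# Example, run right after "kubectl apply" of a new image:
# rollout_or_rollback motovault-app motovault 300s
```

Because the Deployment uses maxUnavailable: 0 with readiness probes, the old pods keep serving traffic while the new revision is being judged.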

3.2 Ingress and TLS Configuration

Objective: Configure secure external access with proper TLS termination and routing.

Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: motovault-ingress
  namespace: motovault
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/limit-rpm: "100"  # requests per minute per client IP
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - motovault.example.com
    secretName: motovault-tls
  rules:
  - host: motovault.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: motovault-service
            port:
              number: 80

TLS Certificate Management

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@motovault.example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx

Implementation Tasks

1. Deploy cert-manager for automated TLS

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

2. Configure Let's Encrypt for SSL certificates

  • Automated certificate provisioning and renewal
  • DNS-01 or HTTP-01 challenge configuration
  • Certificate monitoring and alerting

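3. Set up WAF and DDoS protection, and restrict pod ingress

WAF and DDoS protection are typically handled in front of the cluster (cloud load balancer, CDN, or ingress controller). Inside the cluster, the following NetworkPolicy admits traffic to the application pods only from the ingress controller's namespace.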

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: motovault-ingress-policy
  namespace: motovault
spec:
  podSelector:
    matchLabels:
      app: motovault
  policyTypes:
  - Ingress
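  ingress:
  - from:
    # Assumes the ingress controller's namespace carries the label name=nginx-ingress
    - namespaceSelector:
        matchLabels:
          name: nginx-ingress
    # Add a second "from" entry for the monitoring namespace,
    # otherwise Prometheus cannot scrape /metrics on this port
    ports:
    - protocol: TCP
      port: 8080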

4. Configure rate limiting and security headers

  • Request rate limiting per IP
  • Security headers (HSTS, CSP, etc.)
  • Request size limitations

5. Set up health check endpoints for load balancer

  • Configure ingress health checks
  • Implement graceful degradation
  • Monitor certificate expiration
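
Certificate expiration monitoring can start as a simple date comparison. The sketch below assumes GNU `date` and a hypothetical 21-day warning threshold; the idea is that a certificate with so little lifetime left suggests renewal is not working. Both function names are illustrative.

```shell
#!/usr/bin/env bash
# Whole days from $1 to $2 (any format GNU date accepts, interpreted as UTC).
days_between() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 86400 ))
}

# Succeeds when a certificate expiring at $2 has fewer than $3 days left
# as of $1 (threshold defaults to 21 days, an assumed value).
cert_expiring_soon() {
  local now=$1 not_after=$2 threshold=${3:-21}
  local remaining
  remaining=$(days_between "$now" "$not_after") || return 2
  (( remaining < threshold ))
}

# Example: read the live certificate's expiry date and check it.
# not_after=$(echo | openssl s_client -connect motovault.example.com:443 \
#   -servername motovault.example.com 2>/dev/null \
#   | openssl x509 -noout -enddate | cut -d= -f2)
# cert_expiring_soon "$(date -u +%Y-%m-%d)" "$not_after" && echo "renewal overdue"
```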

3.3 Monitoring and Observability Setup

Objective: Implement comprehensive monitoring, logging, and alerting for production operations.

Prometheus ServiceMonitor Configuration

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: motovault-metrics
  namespace: motovault
  labels:
    app: motovault
spec:
  selector:
    matchLabels:
      app: motovault
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s

Application Metrics Implementation

using Prometheus;

// Application metrics exposed on /metrics (prometheus-net)
public class MetricsService
{
    private readonly Counter _httpRequestsTotal;
    private readonly Histogram _httpRequestDuration;
    private readonly Gauge _activeConnections;
    private readonly Counter _databaseOperationsTotal;
    private readonly Histogram _databaseOperationDuration;
    
    public MetricsService()
    {
        _httpRequestsTotal = Metrics.CreateCounter(
            "motovault_http_requests_total",
            "Total number of HTTP requests",
            new[] { "method", "endpoint", "status_code" });
            
        _httpRequestDuration = Metrics.CreateHistogram(
            "motovault_http_request_duration_seconds",
            "Duration of HTTP requests in seconds",
            new[] { "method", "endpoint" });
            
        _activeConnections = Metrics.CreateGauge(
            "motovault_active_connections",
            "Number of active database connections");
            
        _databaseOperationsTotal = Metrics.CreateCounter(
            "motovault_database_operations_total",
            "Total number of database operations",
            new[] { "operation", "table", "status" });
            
        _databaseOperationDuration = Metrics.CreateHistogram(
            "motovault_database_operation_duration_seconds",
            "Duration of database operations in seconds",
            new[] { "operation", "table" });
    }
    
    public void RecordHttpRequest(string method, string endpoint, int statusCode, double duration)
    {
        _httpRequestsTotal.WithLabels(method, endpoint, statusCode.ToString()).Inc();
        _httpRequestDuration.WithLabels(method, endpoint).Observe(duration);
    }
    
    public void RecordDatabaseOperation(string operation, string table, bool success, double duration)
    {
        var status = success ? "success" : "error";
        _databaseOperationsTotal.WithLabels(operation, table, status).Inc();
        _databaseOperationDuration.WithLabels(operation, table).Observe(duration);
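    }

    // Keeps the gauge behind the connection pool panel and alert up to date.
    public void SetActiveConnections(int count)
    {
        _activeConnections.Set(count);
    }
}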

Grafana Dashboard Configuration

{
  "dashboard": {
    "title": "MotoVaultPro Application Dashboard",
    "panels": [
      {
        "title": "HTTP Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(motovault_http_requests_total[5m])",
            "legendFormat": "{{method}} {{endpoint}}"
          }
        ]
      },
      {
        "title": "Response Time Percentiles",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.50, sum by (le) (rate(motovault_http_request_duration_seconds_bucket[5m])))",
            "legendFormat": "50th percentile"
          },
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(motovault_http_request_duration_seconds_bucket[5m])))",
            "legendFormat": "95th percentile"
          }
        ]
      },
      {
        "title": "Database Connection Pool",
        "type": "singlestat",
        "targets": [
          {
            "expr": "motovault_active_connections",
            "legendFormat": "Active Connections"
          }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(motovault_http_requests_total{status_code=~\"5..\"}[5m])",
            "legendFormat": "5xx errors"
          }
        ]
      }
    ]
  }
}

Prometheus Alerting Rules

groups:
- name: motovault.rules
  rules:
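  - alert: HighErrorRate
    # Share of requests answered with 5xx, so the threshold 0.1 means 10%
    expr: |
      sum(rate(motovault_http_requests_total{status_code=~"5.."}[5m]))
        / sum(rate(motovault_http_requests_total[5m])) > 0.1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High error rate detected"
      description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"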
      
  - alert: HighResponseTime
    expr: histogram_quantile(0.95, sum by (le) (rate(motovault_http_request_duration_seconds_bucket[5m]))) > 2
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High response time detected"
      description: "95th percentile response time is {{ $value }}s"
      
  - alert: DatabaseConnectionPoolExhaustion
    expr: motovault_active_connections > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Database connection pool nearly exhausted"
      description: "Active connections: {{ $value }}/100"
      
  - alert: PodCrashLooping
    expr: rate(kube_pod_container_status_restarts_total{namespace="motovault"}[15m]) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Pod is crash looping"
      description: "Pod {{ $labels.pod }} is restarting frequently"

Implementation Tasks

1. Deploy Prometheus and Grafana stack

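# Installs the Prometheus Operator and its CRDs only; Prometheus, Alertmanager, and Grafana are deployed on top of it.
# "kubectl create" is used because client-side "kubectl apply" can reject the large CRDs (annotation size limit).
kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml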

2. Configure application metrics endpoints

  • Add Prometheus metrics middleware
  • Implement custom business metrics
  • Configure metric collection intervals

3. Set up centralized logging with structured logs

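// Requires: using System.Text.Json; (for JsonWriterOptions)
builder.Services.AddLogging(loggingBuilder =>
{
    loggingBuilder.AddJsonConsole(options =>
    {
        options.JsonWriterOptions = new JsonWriterOptions { Indented = false };
        options.IncludeScopes = true;
        // The format ends in a literal "Z", so timestamps must actually be UTC.
        options.UseUtcTimestamp = true;
        options.TimestampFormat = "yyyy-MM-ddTHH:mm:ss.fffZ";
    });
});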

4. Create operational dashboards and alerts

  • Application performance dashboards
  • Infrastructure monitoring dashboards
  • Business metrics and KPIs
  • Alert routing and escalation

5. Implement distributed tracing

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing
            .AddAspNetCoreInstrumentation()
            .AddNpgsql()
            .AddRedisInstrumentation()
            // Jaeger ingests OTLP directly; the dedicated Jaeger exporter package is deprecated.
            .AddOtlpExporter();
    });

3.4 Backup and Disaster Recovery

Objective: Implement comprehensive backup strategies and disaster recovery procedures.

Velero Backup Configuration

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: motovault-daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  template:
    includedNamespaces:
    - motovault
    includedResources:
    - "*"
    storageLocation: default
    ttl: 720h0m0s  # 30 days
    snapshotVolumes: true

---
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: motovault-weekly-backup
  namespace: velero
spec:
  schedule: "0 3 * * 0"  # Weekly on Sunday at 3 AM
  template:
    includedNamespaces:
    - motovault
    includedResources:
    - "*"
    storageLocation: default
    ttl: 2160h0m0s  # 90 days
    snapshotVolumes: true

Database Backup Strategy

#!/bin/bash
# Automated database backup script
set -euo pipefail  # abort on any failure so a broken dump is never uploaded

BACKUP_DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="motovault_backup_${BACKUP_DATE}.sql"
S3_BUCKET="motovault-backups"

# Create database backup
kubectl exec -n motovault motovault-postgres-1 -- \
  pg_dump -U postgres motovault > "${BACKUP_FILE}"

# Compress backup
gzip "${BACKUP_FILE}"

# Upload to S3/MinIO
aws s3 cp "${BACKUP_FILE}.gz" "s3://${S3_BUCKET}/database/"

# Clean up local file
rm "${BACKUP_FILE}.gz"

# Retain only last 30 days of backups (requires GNU date and GNU xargs)
# "None" is what the CLI prints when no object matches; drop it before deleting
aws s3api list-objects-v2 \
  --bucket "${S3_BUCKET}" \
  --prefix "database/" \
  --query 'Contents[?LastModified<=`'$(date -d "30 days ago" --iso-8601)'`].[Key]' \
  --output text | \
  sed '/^None$/d' | \
  xargs -r -I {} aws s3 rm "s3://${S3_BUCKET}/{}"
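
A dump that is empty or truncated is easy to upload unnoticed. As a hedged sketch, a check like the following could run between the compress and upload steps; the function name is hypothetical, and it only proves that the archive is intact and non-empty, not that the SQL inside can be restored.

```shell
#!/usr/bin/env bash
# Succeeds only if the file exists, passes gzip's integrity test, and
# decompresses to at least one byte.
verify_backup() {
  local file=$1
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
  # An empty pg_dump still compresses to a valid (but useless) archive.
  if [ "$(gzip -dc "$file" | head -c 1 | wc -c)" -eq 0 ]; then
    echo "archive has no content: $file" >&2
    return 1
  fi
}

# Example, placed before the upload step:
# verify_backup "${BACKUP_FILE}.gz" || exit 1
```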

Disaster Recovery Procedures

#!/bin/bash
# Full system recovery script
set -euo pipefail  # stop at the first failed step instead of reporting success

BACKUP_DATE=${1:-}
if [ -z "$BACKUP_DATE" ]; then
  echo "Usage: $0 <backup_date>"
  echo "Example: $0 20240120_020000"
  exit 1
fi

# Must match the file name written by the backup script
BACKUP_FILE="motovault_backup_${BACKUP_DATE}.sql"

# Stop application
echo "Scaling down application..."
kubectl scale deployment motovault-app --replicas=0 -n motovault

# Restore database (ON_ERROR_STOP makes psql fail on the first SQL error)
echo "Restoring database from backup..."
aws s3 cp "s3://motovault-backups/database/${BACKUP_FILE}.gz" .
gunzip "${BACKUP_FILE}.gz"
kubectl exec -i motovault-postgres-1 -n motovault -- \
  psql -v ON_ERROR_STOP=1 -U postgres -d motovault < "${BACKUP_FILE}"

# Restore MinIO data
echo "Restoring MinIO data..."
aws s3 sync "s3://motovault-backups/minio/${BACKUP_DATE}/" /tmp/minio_restore/
mc mirror /tmp/minio_restore/ motovault-minio/motovault-files/

# Restart application
echo "Scaling up application..."
kubectl scale deployment motovault-app --replicas=3 -n motovault

# Verify health (rollout status waits until the new pods exist and are ready)
echo "Waiting for application to be ready..."
kubectl rollout status deployment/motovault-app -n motovault --timeout=300s

echo "Recovery completed successfully"
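
Because the recovery script scales the application to zero before it touches any backup, it is worth rejecting a malformed date argument up front. The following sketch assumes the `YYYYMMDD_HHMMSS` format and the `database/motovault_backup_<date>.sql.gz` object naming produced by the backup script; both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds when $1 looks like YYYYMMDD_HHMMSS (as written by date +%Y%m%d_%H%M%S).
valid_backup_date() {
  [[ $1 =~ ^[0-9]{8}_[0-9]{6}$ ]]
}

# Object key of the database dump for a given backup date.
backup_key() {
  echo "database/motovault_backup_${1}.sql.gz"
}

# Example: fail fast, before the application is scaled down.
# valid_backup_date "$BACKUP_DATE" || { echo "bad date: $BACKUP_DATE" >&2; exit 1; }
# aws s3 ls "s3://motovault-backups/$(backup_key "$BACKUP_DATE")" || exit 1
```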

Implementation Tasks

1. Deploy Velero for Kubernetes backup

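# Assumes AWS credentials for the backup bucket are stored in ./credentials-velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket motovault-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2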

2. Configure automated database backups

  • Point-in-time recovery setup
  • Incremental backup strategies
  • Cross-region backup replication

3. Implement MinIO backup synchronization

  • Automated file backup to external storage
  • Metadata backup and restoration
  • Verification of backup integrity

4. Create disaster recovery runbooks

  • Step-by-step recovery procedures
  • RTO/RPO definitions and testing
  • Contact information and escalation procedures

5. Set up backup monitoring and alerting

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: backup-alerts
spec:
  groups:
  - name: backup.rules
    rules:
    - alert: BackupFailed
      expr: increase(velero_backup_failure_total[24h]) > 0  # the raw counter never resets, so "> 0" alone would fire forever
      labels:
        severity: critical
      annotations:
        summary: "Backup operation failed"
        description: "Velero backup has failed"

Week-by-Week Breakdown

Week 9: Production Kubernetes Configuration

  • Days 1-2: Create production deployment manifests
  • Days 3-4: Configure HPA, PDB, and resource quotas
  • Days 5-7: Set up RBAC and security policies

Week 10: Ingress and TLS Setup

  • Days 1-2: Deploy and configure ingress controller
  • Days 3-4: Set up cert-manager and TLS certificates
  • Days 5-7: Configure security policies and rate limiting

Week 11: Monitoring and Observability

  • Days 1-3: Deploy Prometheus and Grafana stack
  • Days 4-5: Configure application metrics and dashboards
  • Days 6-7: Set up alerting and notification channels

Week 12: Backup and Migration Preparation

  • Days 1-3: Deploy and configure backup solutions
  • Days 4-5: Create migration scripts and procedures
  • Days 6-7: Execute migration dry runs and validation

Success Criteria

  • Production Kubernetes deployment with 99.9% availability
  • Secure ingress with automated TLS certificate management
  • Comprehensive monitoring with alerting
  • Automated backup and recovery procedures tested
  • Migration procedures validated and documented
  • Security policies and network controls implemented
  • Performance baselines established and monitored
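
The 99.9% availability target translates into a concrete downtime budget, which is useful when planning maintenance and judging incidents. A minimal sketch of the arithmetic (the function name is hypothetical, and the measurement window is an assumption to agree on):

```shell
#!/usr/bin/env bash
# Allowed downtime in minutes for an availability target (percent)
# over a window of the given number of days.
downtime_budget_minutes() {
  awk -v pct="$1" -v days="$2" \
    'BEGIN { printf "%.1f\n", (100 - pct) / 100 * days * 24 * 60 }'
}

# Example: budget for a 30-day window at 99.9%.
# downtime_budget_minutes 99.9 30
```

At 99.9% over 30 days the budget is 43.2 minutes, so a single recovery that takes the full 300-second rollout timeout consumes a noticeable share of it.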

Testing Requirements

Production Readiness Tests

  • Load testing under expected traffic patterns
  • Failover testing for all components
  • Security penetration testing
  • Backup and recovery validation

Performance Tests

  • Application response time under load
  • Database performance with connection pooling
  • Cache performance and hit ratios
  • Network latency and throughput

Security Tests

  • Container image vulnerability scanning
  • Network policy validation
  • Authentication and authorization testing
  • TLS configuration verification

Deliverables

  1. Production Deployment

    • Complete Kubernetes manifests
    • Security configurations
    • Monitoring and alerting setup
    • Backup and recovery procedures
  2. Documentation

    • Operational runbooks
    • Security procedures
    • Monitoring guides
    • Disaster recovery plans
  3. Migration Tools

    • Data migration scripts
    • Validation tools
    • Rollback procedures

Dependencies

  • Production Kubernetes cluster
  • External storage for backups
  • DNS management for ingress
  • Certificate authority for TLS
  • Monitoring infrastructure

Risks and Mitigations

Risk: Extended Downtime During Migration

Mitigation: Blue-green deployment strategy with comprehensive rollback plan

Risk: Data Integrity Issues

Mitigation: Extensive validation and parallel running during transition

Risk: Performance Degradation

Mitigation: Load testing and gradual traffic migration


Previous Phase: Phase 2: High Availability Infrastructure
Next Phase: Phase 4: Advanced Features and Optimization