motovaultpro/K8S-PHASE-2.md
Eric Gullickson, commit 4391cf11ed ("Architecture Docs"), 2025-07-28 08:43:00 -05:00

Phase 2: High Availability Infrastructure (Weeks 5-8)

This phase focuses on implementing the supporting infrastructure required for high availability: a distributed MinIO cluster, a PostgreSQL HA setup, a Redis cluster, and a file storage abstraction layer.

Overview

Phase 2 transforms MotoVaultPro's supporting infrastructure from single-instance services to highly available, distributed systems. This phase establishes the foundation for true high availability by eliminating all single points of failure in the data layer.

Key Objectives

  • MinIO High Availability: Deploy distributed object storage with erasure coding
  • File Storage Abstraction: Create unified interface for file operations
  • PostgreSQL HA: Implement primary/replica configuration with automated failover
  • Redis Cluster: Deploy distributed caching and session storage
  • Data Migration: Seamless transition from local storage to distributed systems

2.1 MinIO High Availability Setup

Objective: Deploy a highly available MinIO cluster for file storage with automatic failover.

Architecture Overview: MinIO will be deployed as a distributed cluster with erasure coding for data protection and automatic healing capabilities.

MinIO Cluster Configuration

# MinIO Tenant Configuration
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: motovault-minio
  namespace: motovault
spec:
  image: minio/minio:RELEASE.2024-01-16T16-07-38Z
  pools:
  - servers: 4
    name: pool-0
    volumesPerServer: 4
    volumeClaimTemplate:
      metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: fast-ssd
  mountPath: /export
  subPath: /data
  requestAutoCert: false
  certConfig:
    commonName: ""
    organizationName: []
    dnsNames: []
  console:
    image: minio/console:v0.22.5
    replicas: 2
    consoleSecret:
      name: motovault-minio-console-secret
  configuration:
    name: motovault-minio-config

Implementation Tasks

1. Deploy MinIO Operator

kubectl apply -k "github.com/minio/operator/resources"

2. Create MinIO cluster configuration with erasure coding

  • Configure 4+ nodes for optimal erasure coding
  • Set up data protection with automatic healing
  • Configure storage classes for performance
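
As a rough capacity check on the tenant above (4 servers × 4 volumes of 100Gi), usable space under erasure coding can be estimated as follows. This is a sketch: the actual parity count is governed by the MINIO_STORAGE_CLASS_STANDARD setting, and EC:4 is only the typical default for a 16-drive erasure set.

```csharp
// Sketch: estimated usable capacity of the 4x4 MinIO tenant.
// Assumes EC:4 parity per 16-drive erasure set (verify against
// MINIO_STORAGE_CLASS_STANDARD in your deployment).
int drives = 4 * 4;          // servers * volumesPerServer
int parity = 4;              // parity blocks per erasure set (EC:4)
long perDriveGiB = 100;      // from the volumeClaimTemplate

long rawGiB = drives * perDriveGiB;                 // 1600 GiB raw
long usableGiB = (drives - parity) * perDriveGiB;   // 1200 GiB usable
Console.WriteLine($"Raw {rawGiB} GiB, usable {usableGiB} GiB; " +
                  $"tolerates {parity} drive failures per erasure set");
```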

3. Configure lifecycle (ILM) and backup policies for disaster recovery

apiVersion: v1
kind: ConfigMap
metadata:
  name: minio-backup-policy
data:
  backup-policy.json: |
    {
      "rules": [
        {
          "id": "motovault-backup",
          "status": "Enabled",
          "transition": {
            "days": 30,
            "storage_class": "GLACIER"
          }
        }
      ]
    }

4. Set up monitoring with Prometheus metrics

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: minio-metrics
spec:
  selector:
    matchLabels:
      app: minio
  endpoints:
  - port: http-minio
    path: /minio/v2/metrics/cluster

5. Create service endpoints for application connectivity

apiVersion: v1
kind: Service
metadata:
  name: minio-service
spec:
  selector:
    app: minio
  ports:
  - name: http
    port: 9000
    targetPort: 9000
  - name: console
    port: 9001
    targetPort: 9001
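
With the Service in place, the application can register a MinIO client against it. A sketch using the Minio .NET SDK; the configuration keys (`MinIO:AccessKey`, `MinIO:SecretKey`) are illustrative names, not existing settings:

```csharp
using Minio;

// Program.cs: register an IMinioClient pointing at the in-cluster Service.
builder.Services.AddSingleton<IMinioClient>(_ =>
    new MinioClient()
        .WithEndpoint("minio-service", 9000)   // the Service defined above
        .WithCredentials(
            builder.Configuration["MinIO:AccessKey"],
            builder.Configuration["MinIO:SecretKey"])
        .Build());
```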

MinIO High Availability Features

  • Erasure Coding: Data is split into data and parity blocks across multiple drives, so objects can be reconstructed automatically after drive or node failures
  • Distributed Architecture: No single point of failure
  • Automatic Healing: Corrupted data is automatically detected and repaired
  • Load Balancing: Built-in load balancing across cluster nodes
  • Bucket Policies: Fine-grained access control for different data types

2.2 File Storage Abstraction Implementation

Objective: Create an abstraction layer that allows seamless switching between local filesystem and MinIO object storage.

Current State:

  • Direct filesystem operations throughout the application
  • File paths hardcoded in various controllers and services
  • No abstraction for different storage backends

Target State:

  • Unified file storage interface
  • Pluggable storage implementations
  • Transparent migration between storage types

Implementation Tasks

1. Define storage abstraction interface

public interface IFileStorageService
{
    Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default);
    Task<Stream> DownloadFileAsync(string fileId, CancellationToken cancellationToken = default);
    Task<bool> DeleteFileAsync(string fileId, CancellationToken cancellationToken = default);
    Task<FileMetadata> GetFileMetadataAsync(string fileId, CancellationToken cancellationToken = default);
    Task<IEnumerable<FileMetadata>> ListFilesAsync(string prefix = null, CancellationToken cancellationToken = default);
    Task<string> GeneratePresignedUrlAsync(string fileId, TimeSpan expiration, CancellationToken cancellationToken = default);
}

public class FileMetadata
{
    public string Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public long Size { get; set; }
    public DateTime CreatedDate { get; set; }
    public DateTime ModifiedDate { get; set; }
    public Dictionary<string, string> Tags { get; set; }
}

2. Implement MinIO storage service

public class MinIOFileStorageService : IFileStorageService
{
    private readonly IMinioClient _minioClient;
    private readonly ILogger<MinIOFileStorageService> _logger;
    private readonly string _bucketName;
    
    public MinIOFileStorageService(IMinioClient minioClient, IConfiguration configuration, ILogger<MinIOFileStorageService> logger)
    {
        _minioClient = minioClient;
        _logger = logger;
        _bucketName = configuration["MinIO:BucketName"] ?? "motovault-files";
    }
    
    public async Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default)
    {
        var fileId = $"{Guid.NewGuid()}/{fileName}";
        
        try
        {
            await _minioClient.PutObjectAsync(new PutObjectArgs()
                .WithBucket(_bucketName)
                .WithObject(fileId)
                .WithStreamData(fileStream)
                .WithObjectSize(fileStream.Length)
                .WithContentType(contentType)
                .WithHeaders(new Dictionary<string, string>
                {
                    ["X-Amz-Meta-Original-Name"] = fileName,
                    ["X-Amz-Meta-Upload-Date"] = DateTime.UtcNow.ToString("O")
                }), cancellationToken);
            
            _logger.LogInformation("File uploaded successfully: {FileId}", fileId);
            return fileId;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to upload file: {FileName}", fileName);
            throw;
        }
    }
    
    public async Task<Stream> DownloadFileAsync(string fileId, CancellationToken cancellationToken = default)
    {
        try
        {
            var memoryStream = new MemoryStream();
            await _minioClient.GetObjectAsync(new GetObjectArgs()
                .WithBucket(_bucketName)
                .WithObject(fileId)
                .WithCallbackStream(stream => stream.CopyTo(memoryStream)), cancellationToken);
            
            memoryStream.Position = 0;
            return memoryStream;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to download file: {FileId}", fileId);
            throw;
        }
    }
    
    // Additional method implementations...
}
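
One of the elided methods, `GeneratePresignedUrlAsync`, could be implemented as below; presigned URLs let browsers fetch files directly from MinIO instead of streaming every byte through the application. A sketch: note that the SDK's presigned call takes no cancellation token.

```csharp
public async Task<string> GeneratePresignedUrlAsync(string fileId, TimeSpan expiration, CancellationToken cancellationToken = default)
{
    // Returns a time-limited URL granting GET access to the object.
    return await _minioClient.PresignedGetObjectAsync(new PresignedGetObjectArgs()
        .WithBucket(_bucketName)
        .WithObject(fileId)
        .WithExpiry((int)expiration.TotalSeconds));
}
```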

3. Create fallback storage service for graceful degradation

public class FallbackFileStorageService : IFileStorageService
{
    private readonly IFileStorageService _primaryService;
    private readonly IFileStorageService _fallbackService;
    private readonly ILogger<FallbackFileStorageService> _logger;
    
    public FallbackFileStorageService(
        IFileStorageService primaryService,
        IFileStorageService fallbackService,
        ILogger<FallbackFileStorageService> logger)
    {
        _primaryService = primaryService;
        _fallbackService = fallbackService;
        _logger = logger;
    }
    
    public async Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default)
    {
        try
        {
            return await _primaryService.UploadFileAsync(fileStream, fileName, contentType, cancellationToken);
        }
        catch (Exception ex)
        {
            _logger.LogWarning(ex, "Primary storage failed, falling back to secondary storage");
            fileStream.Position = 0; // Reset stream position
            return await _fallbackService.UploadFileAsync(fileStream, fileName, contentType, cancellationToken);
        }
    }
    
    // Implementation with automatic fallback logic for other methods...
}
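
Wiring the decorator in DI could look like the following sketch. `LocalFileStorageService` is a hypothetical filesystem-backed `IFileStorageService` implementation used as the fallback:

```csharp
// Sketch: MinIO as primary storage, local filesystem as fallback.
services.AddSingleton<MinIOFileStorageService>();
services.AddSingleton<LocalFileStorageService>();   // hypothetical local implementation
services.AddSingleton<IFileStorageService>(sp =>
    new FallbackFileStorageService(
        sp.GetRequiredService<MinIOFileStorageService>(),
        sp.GetRequiredService<LocalFileStorageService>(),
        sp.GetRequiredService<ILogger<FallbackFileStorageService>>()));
```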

4. Update all file operations to use the abstraction layer

  • Replace direct File.WriteAllBytes, File.ReadAllBytes calls
  • Update all controllers to use IFileStorageService
  • Modify attachment handling in vehicle records
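
An updated controller might look like this sketch (the route and type names are illustrative, not the application's actual controllers):

```csharp
[ApiController]
[Route("api/attachments")]
public class AttachmentsController : ControllerBase
{
    private readonly IFileStorageService _storage;

    public AttachmentsController(IFileStorageService storage) => _storage = storage;

    [HttpPost]
    public async Task<IActionResult> Upload(IFormFile file, CancellationToken ct)
    {
        // No direct File.* calls: the storage backend is opaque to the controller.
        await using var stream = file.OpenReadStream();
        var fileId = await _storage.UploadFileAsync(stream, file.FileName, file.ContentType, ct);
        return Ok(new { fileId });
    }
}
```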

5. Implement file migration utility for existing local files

public class FileMigrationService
{
    private readonly IFileStorageService _targetStorage;
    private readonly ILogger<FileMigrationService> _logger;

    public FileMigrationService(IFileStorageService targetStorage, ILogger<FileMigrationService> logger)
    {
        _targetStorage = targetStorage;
        _logger = logger;
    }

    public async Task<MigrationResult> MigrateLocalFilesAsync(string localPath)
    {
        var result = new MigrationResult();
        var files = Directory.GetFiles(localPath, "*", SearchOption.AllDirectories);
        
        foreach (var filePath in files)
        {
            try
            {
                using var fileStream = File.OpenRead(filePath);
                var fileName = Path.GetFileName(filePath);
                var contentType = GetContentType(fileName);
                
                var fileId = await _targetStorage.UploadFileAsync(fileStream, fileName, contentType);
                result.ProcessedFiles.Add(new MigratedFile
                {
                    OriginalPath = filePath,
                    NewFileId = fileId,
                    Success = true
                });
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed to migrate file: {FilePath}", filePath);
                result.ProcessedFiles.Add(new MigratedFile
                {
                    OriginalPath = filePath,
                    Success = false,
                    Error = ex.Message
                });
            }
        }
        
        return result;
    }
}
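
The result types referenced by the migration service are not shown above; a minimal sketch consistent with that code:

```csharp
// Requires System.Collections.Generic and System.Linq.
public class MigrationResult
{
    public List<MigratedFile> ProcessedFiles { get; } = new();
    public int Succeeded => ProcessedFiles.Count(f => f.Success);
    public int Failed => ProcessedFiles.Count(f => !f.Success);
}

public class MigratedFile
{
    public string OriginalPath { get; set; }
    public string NewFileId { get; set; }
    public bool Success { get; set; }
    public string Error { get; set; }
}
```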

2.3 PostgreSQL High Availability Configuration

Objective: Set up a PostgreSQL cluster with automatic failover and read replicas.

Architecture Overview: PostgreSQL will be deployed using an operator (like CloudNativePG or Postgres Operator) to provide automated failover, backup, and scaling capabilities.

PostgreSQL Cluster Configuration

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: motovault-postgres
  namespace: motovault
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "1GB"
      maintenance_work_mem: "64MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "16MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
      
  storage:
    size: "100Gi"
    storageClass: "fast-ssd"
    
  monitoring:
    enabled: true
    
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: "s3://motovault-backups/postgres"
      s3Credentials:
        accessKeyId:
          name: postgres-backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: postgres-backup-credentials
          key: SECRET_ACCESS_KEY
      wal:
        retention: "5d"
      data:
        retention: "30d"
        jobs: 1

Implementation Tasks

1. Deploy the CloudNativePG operator

kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.20/releases/cnpg-1.20.1.yaml

2. Configure cluster with primary/replica setup

  • 3-node cluster with automatic failover
  • Read-write split capability
  • Streaming replication configuration
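
CloudNativePG exposes one Service per role: motovault-postgres-rw always routes to the current primary, while motovault-postgres-ro load-balances across replicas. A connection-string layout for read/write splitting might look like this (database and user names are placeholders):

```json
{
  "ConnectionStrings": {
    "Postgres": "Host=motovault-postgres-rw;Port=5432;Database=motovault;Username=app",
    "PostgresReadOnly": "Host=motovault-postgres-ro;Port=5432;Database=motovault;Username=app"
  }
}
```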

3. Set up automated backups to MinIO or external storage

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: motovault-postgres-backup
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  backupOwnerReference: self
  cluster:
    name: motovault-postgres

4. Implement connection pooling with PgBouncer

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pgbouncer
  template:
    metadata:
      labels:
        app: pgbouncer
    spec:
      containers:
      - name: pgbouncer
        image: pgbouncer/pgbouncer:latest  # pin a specific version for production
        env:
        - name: DATABASES_HOST
          value: motovault-postgres-rw
        - name: DATABASES_PORT
          value: "5432"
        - name: DATABASES_DATABASE
          value: motovault
        - name: POOL_MODE
          value: session
        - name: MAX_CLIENT_CONN
          value: "1000"
        - name: DEFAULT_POOL_SIZE
          value: "25"
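
The Deployment above still needs a Service so the application can reach the pool at a stable name. A sketch; verify which port your PgBouncer image listens on, as some default to 6432 rather than 5432:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pgbouncer
spec:
  selector:
    app: pgbouncer
  ports:
  - name: postgres
    port: 5432
    targetPort: 5432   # adjust to the image's listen port
```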

5. Configure monitoring and alerting for database health

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudnative-pg
  endpoints:
  - port: metrics
    path: /metrics

2.4 Redis Cluster for Session Management

Objective: Implement distributed session storage and caching using Redis cluster.

Current State:

  • In-memory session storage tied to individual application instances
  • No distributed caching for expensive operations
  • Configuration and translation data loaded on each application start

Target State:

  • Redis cluster for distributed session storage
  • Centralized caching for frequently accessed data
  • High availability with automatic failover

Redis Cluster Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: motovault
data:
  redis.conf: |
    cluster-enabled yes
    cluster-require-full-coverage no
    cluster-node-timeout 15000
    cluster-config-file /data/nodes.conf
    cluster-migration-barrier 1
    appendonly yes
    appendfsync everysec
    save 900 1
    save 300 10
    save 60 10000

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: motovault
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command:
        - redis-server
        - /etc/redis/redis.conf
        ports:
        - containerPort: 6379
        - containerPort: 16379
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis
        - name: redis-data
          mountPath: /data
      volumes:
      - name: redis-config
        configMap:
          name: redis-cluster-config
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Implementation Tasks

1. Deploy Redis cluster with 6 nodes (3 masters, 3 replicas)

# Initialize Redis cluster after deployment
kubectl exec -it redis-cluster-0 -- redis-cli --cluster create \
  redis-cluster-0.redis-cluster:6379 \
  redis-cluster-1.redis-cluster:6379 \
  redis-cluster-2.redis-cluster:6379 \
  redis-cluster-3.redis-cluster:6379 \
  redis-cluster-4.redis-cluster:6379 \
  redis-cluster-5.redis-cluster:6379 \
  --cluster-replicas 1

2. Configure session storage

services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = configuration.GetConnectionString("Redis");
    options.InstanceName = "MotoVault";
});

services.AddSession(options =>
{
    options.IdleTimeout = TimeSpan.FromMinutes(30);
    options.Cookie.HttpOnly = true;
    options.Cookie.IsEssential = true;
    options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
});
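
For the session configuration above to take effect, the middleware must also be added to the request pipeline:

```csharp
// Program.cs: enable session state backed by the Redis distributed cache.
app.UseSession();
```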

3. Implement distributed caching

public class CachedTranslationService : ITranslationService
{
    private readonly IDistributedCache _cache;
    private readonly ITranslationService _translationService;
    private readonly ILogger<CachedTranslationService> _logger;

    public CachedTranslationService(IDistributedCache cache, ITranslationService translationService, ILogger<CachedTranslationService> logger)
    {
        _cache = cache;
        _translationService = translationService;
        _logger = logger;
    }

    public async Task<string> GetTranslationAsync(string key, string language)
    {
        var cacheKey = $"translation:{language}:{key}";
        var cached = await _cache.GetStringAsync(cacheKey);
        
        if (cached != null)
        {
            return cached;
        }
        
        var translation = await _translationService.GetTranslationAsync(key, language);
        
        await _cache.SetStringAsync(cacheKey, translation, new DistributedCacheEntryOptions
        {
            SlidingExpiration = TimeSpan.FromHours(1)
        });
        
        return translation;
    }
}

4. Add cache monitoring and performance metrics

public class CacheMetricsService
{
    private readonly Counter _cacheHits;
    private readonly Counter _cacheMisses;
    private readonly Histogram _cacheOperationDuration;
    
    public CacheMetricsService()
    {
        _cacheHits = Metrics.CreateCounter(
            "motovault_cache_hits_total", 
            "Total cache hits",
            new[] { "cache_type" });
            
        _cacheMisses = Metrics.CreateCounter(
            "motovault_cache_misses_total", 
            "Total cache misses",
            new[] { "cache_type" });
            
        _cacheOperationDuration = Metrics.CreateHistogram(
            "motovault_cache_operation_duration_seconds",
            "Cache operation duration",
            new[] { "operation", "cache_type" });
    }
}
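
Hypothetical helper methods on `CacheMetricsService` showing how these prometheus-net metrics would be recorded (the method names are illustrative):

```csharp
public void RecordHit(string cacheType) =>
    _cacheHits.WithLabels(cacheType).Inc();

public void RecordMiss(string cacheType) =>
    _cacheMisses.WithLabels(cacheType).Inc();

// Dispose the returned timer to observe elapsed seconds in the histogram.
public IDisposable TimeOperation(string operation, string cacheType) =>
    _cacheOperationDuration.WithLabels(operation, cacheType).NewTimer();
```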

Week-by-Week Breakdown

Week 5: MinIO Deployment

  • Days 1-2: Deploy MinIO operator and configure basic cluster
  • Days 3-4: Implement file storage abstraction interface
  • Days 5-7: Create MinIO storage service implementation

Week 6: File Migration and PostgreSQL HA

  • Days 1-2: Complete file storage abstraction and migration tools
  • Days 3-4: Deploy PostgreSQL operator and HA cluster
  • Days 5-7: Configure connection pooling and backup strategies

Week 7: Redis Cluster and Caching

  • Days 1-3: Deploy Redis cluster and configure session storage
  • Days 4-5: Implement distributed caching layer
  • Days 6-7: Add cache monitoring and performance metrics

Week 8: Integration and Testing

  • Days 1-3: End-to-end testing of all HA components
  • Days 4-5: Performance testing and optimization
  • Days 6-7: Documentation and preparation for Phase 3

Success Criteria

  • MinIO cluster operational with erasure coding
  • File storage abstraction implemented and tested
  • PostgreSQL HA cluster with automatic failover
  • Redis cluster providing distributed sessions
  • All file operations migrated to object storage
  • Comprehensive monitoring for all infrastructure components
  • Backup and recovery procedures validated

Testing Requirements

Infrastructure Tests

  • MinIO cluster failover scenarios
  • PostgreSQL primary/replica failover
  • Redis cluster node failure recovery
  • Network partition handling

Application Integration Tests

  • File upload/download through abstraction layer
  • Session persistence across application restarts
  • Cache performance and invalidation
  • Database connection pool behavior

Performance Tests

  • File storage throughput and latency
  • Database query performance with connection pooling
  • Cache hit/miss ratios and response times

Deliverables

  1. Infrastructure Components

    • MinIO HA cluster configuration
    • PostgreSQL HA cluster with operator
    • Redis cluster deployment
    • Monitoring and alerting setup
  2. Application Updates

    • File storage abstraction implementation
    • Session management configuration
    • Distributed caching integration
    • Connection pooling optimization
  3. Migration Tools

    • File migration utility
    • Database migration scripts
    • Configuration migration helpers
  4. Documentation

    • Infrastructure architecture diagrams
    • Operational procedures
    • Monitoring and alerting guides

Dependencies

  • Kubernetes cluster with sufficient resources
  • Storage classes for persistent volumes
  • Prometheus and Grafana for monitoring
  • Network connectivity between components

Risks and Mitigations

Risk: Data Corruption During File Migration

Mitigation: Checksum validation and parallel running of old/new systems

Risk: Database Failover Issues

Mitigation: Extensive testing of failover scenarios and automated recovery

Risk: Cache Inconsistency

Mitigation: Proper cache invalidation strategies and monitoring


Previous Phase: Phase 1: Core Kubernetes Readiness
Next Phase: Phase 3: Production Deployment