# Phase 2: High Availability Infrastructure (Weeks 5-8)

This phase implements the supporting infrastructure required for high availability: MinIO clusters, a PostgreSQL HA setup, Redis clusters, and a file storage abstraction.

## Overview

Phase 2 transforms MotoVaultPro's supporting infrastructure from single-instance services to highly available, distributed systems. This phase establishes the foundation for true high availability by eliminating all single points of failure in the data layer.

## Key Objectives

- **MinIO High Availability**: Deploy distributed object storage with erasure coding
- **File Storage Abstraction**: Create a unified interface for file operations
- **PostgreSQL HA**: Implement a primary/replica configuration with automated failover
- **Redis Cluster**: Deploy distributed caching and session storage
- **Data Migration**: Transition seamlessly from local storage to distributed systems

## 2.1 MinIO High Availability Setup

**Objective**: Deploy a highly available MinIO cluster for file storage with automatic failover.

**Architecture Overview**: MinIO will be deployed as a distributed cluster with erasure coding for data protection and automatic healing capabilities.

### MinIO Cluster Configuration

```yaml
# MinIO Tenant Configuration
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: motovault-minio
  namespace: motovault
spec:
  image: minio/minio:RELEASE.2024-01-16T16-07-38Z
  creationDate: "2024-01-20T10:00:00Z"
  pools:
    - servers: 4
      name: pool-0
      volumesPerServer: 4
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
          storageClassName: fast-ssd
  mountPath: /export
  subPath: /data
  requestAutoCert: false
  certConfig:
    commonName: ""
    organizationName: []
    dnsNames: []
  console:
    image: minio/console:v0.22.5
    replicas: 2
    consoleSecret:
      name: motovault-minio-console-secret
  configuration:
    name: motovault-minio-config
```

### Implementation Tasks

#### 1. Deploy the MinIO Operator

```bash
kubectl apply -k "github.com/minio/operator/resources"
```

#### 2. Create the MinIO cluster configuration with erasure coding

- Configure 4+ nodes for optimal erasure coding
- Set up data protection with automatic healing
- Configure storage classes for performance

#### 3. Configure backup policies for disaster recovery

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: minio-backup-policy
data:
  backup-policy.json: |
    {
      "rules": [
        {
          "id": "motovault-backup",
          "status": "Enabled",
          "transition": {
            "days": 30,
            "storage_class": "GLACIER"
          }
        }
      ]
    }
```

#### 4. Set up monitoring with Prometheus metrics

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: minio-metrics
spec:
  selector:
    matchLabels:
      app: minio
  endpoints:
    - port: http-minio
      path: /minio/v2/metrics/cluster
```

#### 5. Create service endpoints for application connectivity

```yaml
apiVersion: v1
kind: Service
metadata:
  name: minio-service
spec:
  selector:
    app: minio
  ports:
    - name: http
      port: 9000
      targetPort: 9000
    - name: console
      port: 9001
      targetPort: 9001
```

### MinIO High Availability Features

- **Erasure Coding**: Data is split across multiple drives with parity, enabling automatic healing
- **Distributed Architecture**: No single point of failure
- **Automatic Healing**: Corrupted data is automatically detected and repaired
- **Load Balancing**: Built-in load balancing across cluster nodes
- **Bucket Policies**: Fine-grained access control for different data types

## 2.2 File Storage Abstraction Implementation

**Objective**: Create an abstraction layer that allows seamless switching between local filesystem and MinIO object storage.
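Before the abstraction layer can talk to the cluster, the application needs an `IMinioClient` registered against the in-cluster service from section 2.1. A minimal DI sketch using the MinIO .NET SDK's client builder — the configuration keys (`MinIO:Endpoint`, `MinIO:AccessKey`, `MinIO:SecretKey`) are illustrative names, not part of the existing configuration:

```csharp
// Program.cs (sketch) -- registers a singleton MinIO client.
// Endpoint defaults to the minio-service from section 2.1; the
// configuration keys used here are assumptions to adapt.
using Minio;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSingleton<IMinioClient>(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();
    return new MinioClient()
        .WithEndpoint(config["MinIO:Endpoint"] ?? "minio-service:9000")
        .WithCredentials(config["MinIO:AccessKey"], config["MinIO:SecretKey"])
        .WithSSL(false) // in-cluster plaintext; enable TLS once requestAutoCert is on
        .Build();
});
```

Registering the client as a singleton matches the SDK's intended usage; the storage service implementations below then take `IMinioClient` via constructor injection.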
**Current State**:
- Direct filesystem operations throughout the application
- File paths hardcoded in various controllers and services
- No abstraction for different storage backends

**Target State**:
- Unified file storage interface
- Pluggable storage implementations
- Transparent migration between storage types

### Implementation Tasks

#### 1. Define the storage abstraction interface

```csharp
public interface IFileStorageService
{
    Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default);
    Task<Stream> DownloadFileAsync(string fileId, CancellationToken cancellationToken = default);
    Task DeleteFileAsync(string fileId, CancellationToken cancellationToken = default);
    Task<FileMetadata> GetFileMetadataAsync(string fileId, CancellationToken cancellationToken = default);
    Task<IEnumerable<FileMetadata>> ListFilesAsync(string prefix = null, CancellationToken cancellationToken = default);
    Task<string> GeneratePresignedUrlAsync(string fileId, TimeSpan expiration, CancellationToken cancellationToken = default);
}

public class FileMetadata
{
    public string Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public long Size { get; set; }
    public DateTime CreatedDate { get; set; }
    public DateTime ModifiedDate { get; set; }
    public Dictionary<string, string> Tags { get; set; }
}
```

#### 2. Implement the MinIO storage service

```csharp
public class MinIOFileStorageService : IFileStorageService
{
    private readonly IMinioClient _minioClient;
    private readonly ILogger<MinIOFileStorageService> _logger;
    private readonly string _bucketName;

    public MinIOFileStorageService(IMinioClient minioClient, IConfiguration configuration, ILogger<MinIOFileStorageService> logger)
    {
        _minioClient = minioClient;
        _logger = logger;
        _bucketName = configuration["MinIO:BucketName"] ?? "motovault-files";
    }

    public async Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default)
    {
        var fileId = $"{Guid.NewGuid()}/{fileName}";

        try
        {
            await _minioClient.PutObjectAsync(new PutObjectArgs()
                .WithBucket(_bucketName)
                .WithObject(fileId)
                .WithStreamData(fileStream)
                .WithObjectSize(fileStream.Length)
                .WithContentType(contentType)
                .WithHeaders(new Dictionary<string, string>
                {
                    ["X-Amz-Meta-Original-Name"] = fileName,
                    ["X-Amz-Meta-Upload-Date"] = DateTime.UtcNow.ToString("O")
                }), cancellationToken);

            _logger.LogInformation("File uploaded successfully: {FileId}", fileId);
            return fileId;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to upload file: {FileName}", fileName);
            throw;
        }
    }

    public async Task<Stream> DownloadFileAsync(string fileId, CancellationToken cancellationToken = default)
    {
        try
        {
            var memoryStream = new MemoryStream();

            await _minioClient.GetObjectAsync(new GetObjectArgs()
                .WithBucket(_bucketName)
                .WithObject(fileId)
                .WithCallbackStream(stream => stream.CopyTo(memoryStream)), cancellationToken);

            memoryStream.Position = 0;
            return memoryStream;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to download file: {FileId}", fileId);
            throw;
        }
    }

    // Additional method implementations...
}
```

#### 3. Create a fallback storage service for graceful degradation

```csharp
public class FallbackFileStorageService : IFileStorageService
{
    private readonly IFileStorageService _primaryService;
    private readonly IFileStorageService _fallbackService;
    private readonly ILogger<FallbackFileStorageService> _logger;

    public FallbackFileStorageService(
        IFileStorageService primaryService,
        IFileStorageService fallbackService,
        ILogger<FallbackFileStorageService> logger)
    {
        _primaryService = primaryService;
        _fallbackService = fallbackService;
        _logger = logger;
    }

    public async Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default)
    {
        try
        {
            return await _primaryService.UploadFileAsync(fileStream, fileName, contentType, cancellationToken);
        }
        catch (Exception ex)
        {
            _logger.LogWarning(ex, "Primary storage failed, falling back to secondary storage");
            fileStream.Position = 0; // Reset the stream before retrying
            return await _fallbackService.UploadFileAsync(fileStream, fileName, contentType, cancellationToken);
        }
    }

    // Automatic fallback logic for the remaining methods...
}
```

#### 4. Update all file operations to use the abstraction layer

- Replace direct `File.WriteAllBytes` / `File.ReadAllBytes` calls
- Update all controllers to use `IFileStorageService`
- Modify attachment handling in vehicle records

#### 5. Implement a file migration utility for existing local files

```csharp
public class FileMigrationService
{
    private readonly IFileStorageService _targetStorage;
    private readonly ILogger<FileMigrationService> _logger;

    public async Task<MigrationResult> MigrateLocalFilesAsync(string localPath)
    {
        var result = new MigrationResult();
        var files = Directory.GetFiles(localPath, "*", SearchOption.AllDirectories);

        foreach (var filePath in files)
        {
            try
            {
                using var fileStream = File.OpenRead(filePath);
                var fileName = Path.GetFileName(filePath);
                var contentType = GetContentType(fileName);

                var fileId = await _targetStorage.UploadFileAsync(fileStream, fileName, contentType);

                result.ProcessedFiles.Add(new MigratedFile
                {
                    OriginalPath = filePath,
                    NewFileId = fileId,
                    Success = true
                });
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed to migrate file: {FilePath}", filePath);
                result.ProcessedFiles.Add(new MigratedFile
                {
                    OriginalPath = filePath,
                    Success = false,
                    Error = ex.Message
                });
            }
        }

        return result;
    }
}
```

## 2.3 PostgreSQL High Availability Configuration

**Objective**: Set up a PostgreSQL cluster with automatic failover and read replicas.

**Architecture Overview**: PostgreSQL will be deployed using an operator (such as CloudNativePG or Postgres Operator) to provide automated failover, backup, and scaling capabilities.
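CloudNativePG exposes distinct Services for the primary (`<cluster>-rw`) and the replicas (`<cluster>-ro`), which is what makes an application-level read/write split possible. A minimal sketch of wiring two Npgsql data sources against those endpoints — the host names follow the `motovault-postgres` cluster configured below, while the database name, user, and password are placeholders to be sourced from a Kubernetes Secret:

```csharp
// Sketch: split reads and writes across the CloudNativePG services.
// motovault-postgres-rw always points at the current primary;
// motovault-postgres-ro round-robins across the replicas.
using Npgsql;

public static class MotoVaultDataSources
{
    // Credentials below are placeholders, not real values.
    public static readonly NpgsqlDataSource Writer = NpgsqlDataSource.Create(
        "Host=motovault-postgres-rw;Port=5432;Database=motovault;Username=app;Password=changeme");

    public static readonly NpgsqlDataSource Reader = NpgsqlDataSource.Create(
        "Host=motovault-postgres-ro;Port=5432;Database=motovault;Username=app;Password=changeme");
}

// Commands that mutate state use Writer; read-only queries may use Reader,
// accepting a small replication lag on the replicas.
```

When PgBouncer (task 4 below) sits in front of the cluster, the write path would target the pooler instead of the `-rw` service directly; the split itself stays the same.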
### PostgreSQL Cluster Configuration

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: motovault-postgres
  namespace: motovault
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "1GB"
      maintenance_work_mem: "64MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "16MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
  storage:
    size: "100Gi"
    storageClass: "fast-ssd"
  monitoring:
    enabled: true
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: "s3://motovault-backups/postgres"
      s3Credentials:
        accessKeyId:
          name: postgres-backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: postgres-backup-credentials
          key: SECRET_ACCESS_KEY
      wal:
        retention: "5d"
      data:
        retention: "30d"
        jobs: 1
```

### Implementation Tasks

#### 1. Deploy the PostgreSQL operator (CloudNativePG recommended)

```bash
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.20/releases/cnpg-1.20.1.yaml
```

#### 2. Configure the cluster with a primary/replica setup

- 3-node cluster with automatic failover
- Read-write split capability
- Streaming replication configuration

#### 3. Set up automated backups to MinIO or external storage

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: motovault-postgres-backup
spec:
  schedule: "0 0 2 * * *"  # Daily at 2 AM (CloudNativePG uses a six-field cron with seconds)
  backupOwnerReference: self
  cluster:
    name: motovault-postgres
```

#### 4. Implement connection pooling with PgBouncer

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pgbouncer
  template:
    metadata:
      labels:
        app: pgbouncer
    spec:
      containers:
        - name: pgbouncer
          image: pgbouncer/pgbouncer:latest
          env:
            - name: DATABASES_HOST
              value: motovault-postgres-rw
            - name: DATABASES_PORT
              value: "5432"
            - name: DATABASES_DATABASE
              value: motovault
            - name: POOL_MODE
              value: session
            - name: MAX_CLIENT_CONN
              value: "1000"
            - name: DEFAULT_POOL_SIZE
              value: "25"
```

#### 5. Configure monitoring and alerting for database health

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudnative-pg
  endpoints:
    - port: metrics
      path: /metrics
```

## 2.4 Redis Cluster for Session Management

**Objective**: Implement distributed session storage and caching using a Redis cluster.

**Current State**:
- In-memory session storage tied to individual application instances
- No distributed caching for expensive operations
- Configuration and translation data loaded on each application start

**Target State**:
- Redis cluster for distributed session storage
- Centralized caching for frequently accessed data
- High availability with automatic failover

### Redis Cluster Configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: motovault
data:
  redis.conf: |
    cluster-enabled yes
    cluster-require-full-coverage no
    cluster-node-timeout 15000
    cluster-config-file /data/nodes.conf
    cluster-migration-barrier 1
    appendonly yes
    appendfsync everysec
    save 900 1
    save 300 10
    save 60 10000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: motovault
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          command:
            - redis-server
            - /etc/redis/redis.conf
          ports:
            - containerPort: 6379
            - containerPort: 16379
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis
            - name: redis-data
              mountPath: /data
      volumes:
        - name: redis-config
          configMap:
            name: redis-cluster-config
  volumeClaimTemplates:
    - metadata:
        name: redis-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

### Implementation Tasks

#### 1. Deploy a Redis cluster with 6 nodes (3 masters, 3 replicas)

```bash
# Initialize the Redis cluster after deployment
kubectl exec -it redis-cluster-0 -- redis-cli --cluster create \
  redis-cluster-0.redis-cluster:6379 \
  redis-cluster-1.redis-cluster:6379 \
  redis-cluster-2.redis-cluster:6379 \
  redis-cluster-3.redis-cluster:6379 \
  redis-cluster-4.redis-cluster:6379 \
  redis-cluster-5.redis-cluster:6379 \
  --cluster-replicas 1
```

#### 2. Configure session storage

```csharp
services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = configuration.GetConnectionString("Redis");
    options.InstanceName = "MotoVault";
});

services.AddSession(options =>
{
    options.IdleTimeout = TimeSpan.FromMinutes(30);
    options.Cookie.HttpOnly = true;
    options.Cookie.IsEssential = true;
    options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
});
```

#### 3. Implement distributed caching

```csharp
public class CachedTranslationService : ITranslationService
{
    private readonly IDistributedCache _cache;
    private readonly ITranslationService _translationService;
    private readonly ILogger<CachedTranslationService> _logger;

    public async Task<string> GetTranslationAsync(string key, string language)
    {
        var cacheKey = $"translation:{language}:{key}";
        var cached = await _cache.GetStringAsync(cacheKey);

        if (cached != null)
        {
            return cached;
        }

        var translation = await _translationService.GetTranslationAsync(key, language);

        await _cache.SetStringAsync(cacheKey, translation, new DistributedCacheEntryOptions
        {
            SlidingExpiration = TimeSpan.FromHours(1)
        });

        return translation;
    }
}
```

#### 4. Add cache monitoring and performance metrics

```csharp
public class CacheMetricsService
{
    private readonly Counter _cacheHits;
    private readonly Counter _cacheMisses;
    private readonly Histogram _cacheOperationDuration;

    public CacheMetricsService()
    {
        _cacheHits = Metrics.CreateCounter(
            "motovault_cache_hits_total",
            "Total cache hits",
            new[] { "cache_type" });

        _cacheMisses = Metrics.CreateCounter(
            "motovault_cache_misses_total",
            "Total cache misses",
            new[] { "cache_type" });

        _cacheOperationDuration = Metrics.CreateHistogram(
            "motovault_cache_operation_duration_seconds",
            "Cache operation duration",
            new[] { "operation", "cache_type" });
    }
}
```

## Week-by-Week Breakdown

### Week 5: MinIO Deployment
- **Days 1-2**: Deploy the MinIO operator and configure the basic cluster
- **Days 3-4**: Implement the file storage abstraction interface
- **Days 5-7**: Create the MinIO storage service implementation

### Week 6: File Migration and PostgreSQL HA
- **Days 1-2**: Complete the file storage abstraction and migration tools
- **Days 3-4**: Deploy the PostgreSQL operator and HA cluster
- **Days 5-7**: Configure connection pooling and backup strategies

### Week 7: Redis Cluster and Caching
- **Days 1-3**: Deploy the Redis cluster and configure session storage
- **Days 4-5**: Implement the distributed caching layer
- **Days 6-7**: Add cache monitoring and performance metrics

### Week 8: Integration and Testing
- **Days 1-3**: End-to-end testing of all HA components
- **Days 4-5**: Performance testing and optimization
- **Days 6-7**: Documentation and preparation for Phase 3

## Success Criteria

- [ ] MinIO cluster operational with erasure coding
- [ ] File storage abstraction implemented and tested
- [ ] PostgreSQL HA cluster with automatic failover
- [ ] Redis cluster providing distributed sessions
- [ ] All file operations migrated to object storage
- [ ] Comprehensive monitoring for all infrastructure components
- [ ] Backup and recovery procedures validated

## Testing Requirements

### Infrastructure Tests
- MinIO cluster failover scenarios
- PostgreSQL primary/replica failover
- Redis cluster node failure recovery
- Network partition handling

### Application Integration Tests
- File upload/download through the abstraction layer
- Session persistence across application restarts
- Cache performance and invalidation
- Database connection pool behavior

### Performance Tests
- File storage throughput and latency
- Database query performance with connection pooling
- Cache hit/miss ratios and response times

## Deliverables

1. **Infrastructure Components**
   - MinIO HA cluster configuration
   - PostgreSQL HA cluster with operator
   - Redis cluster deployment
   - Monitoring and alerting setup

2. **Application Updates**
   - File storage abstraction implementation
   - Session management configuration
   - Distributed caching integration
   - Connection pooling optimization

3. **Migration Tools**
   - File migration utility
   - Database migration scripts
   - Configuration migration helpers

4. **Documentation**
   - Infrastructure architecture diagrams
   - Operational procedures
   - Monitoring and alerting guides

## Dependencies

- Kubernetes cluster with sufficient resources
- Storage classes for persistent volumes
- Prometheus and Grafana for monitoring
- Network connectivity between components

## Risks and Mitigations

### Risk: Data Corruption During File Migration
**Mitigation**: Checksum validation and parallel running of the old and new systems

### Risk: Database Failover Issues
**Mitigation**: Extensive testing of failover scenarios and automated recovery

### Risk: Cache Inconsistency
**Mitigation**: Proper cache invalidation strategies and monitoring

---

**Previous Phase**: [Phase 1: Core Kubernetes Readiness](K8S-PHASE-1.md)
**Next Phase**: [Phase 3: Production Deployment](K8S-PHASE-3.md)