motovaultpro/K8S-PHASE-2.md
Eric Gullickson, commit 4391cf11ed ("Architecture Docs"), 2025-07-28 08:43:00 -05:00

Phase 2: High Availability Infrastructure (Weeks 5-8)

This phase focuses on implementing the supporting infrastructure required for high availability: a distributed MinIO cluster, a PostgreSQL HA setup, a Redis cluster, and a file storage abstraction layer.

Overview

Phase 2 transforms MotoVaultPro's supporting infrastructure from single-instance services to highly available, distributed systems. This phase establishes the foundation for true high availability by eliminating all single points of failure in the data layer.

Key Objectives

  • MinIO High Availability: Deploy distributed object storage with erasure coding
  • File Storage Abstraction: Create unified interface for file operations
  • PostgreSQL HA: Implement primary/replica configuration with automated failover
  • Redis Cluster: Deploy distributed caching and session storage
  • Data Migration: Seamless transition from local storage to distributed systems

2.1 MinIO High Availability Setup

Objective: Deploy a highly available MinIO cluster for file storage with automatic failover.

Architecture Overview: MinIO will be deployed as a distributed cluster with erasure coding for data protection and automatic healing capabilities.

MinIO Cluster Configuration

# MinIO Tenant Configuration
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: motovault-minio
  namespace: motovault
spec:
  image: minio/minio:RELEASE.2024-01-16T16-07-38Z
  pools:
  - servers: 4
    name: pool-0
    volumesPerServer: 4
    volumeClaimTemplate:
      metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: fast-ssd
  mountPath: /export
  subPath: /data
  requestAutoCert: false
  certConfig:
    commonName: ""
    organizationName: []
    dnsNames: []
  console:
    image: minio/console:v0.22.5
    replicas: 2
    consoleSecret:
      name: motovault-minio-console-secret
  configuration:
    name: motovault-minio-config

Implementation Tasks

1. Deploy MinIO Operator

kubectl apply -k "github.com/minio/operator/resources"

2. Create MinIO cluster configuration with erasure coding

  • Configure 4+ nodes for optimal erasure coding
  • Set up data protection with automatic healing
  • Configure storage classes for performance
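
As a rough capacity check on the tenant above (4 servers × 4 volumes of 100Gi), usable space under erasure coding can be estimated as follows. This is a sketch: the actual parity count is governed by the MINIO_STORAGE_CLASS_STANDARD setting, and EC:4 is only the typical default for a 16-drive erasure set.

```csharp
// Sketch: estimated usable capacity of the 4x4 MinIO tenant.
// Assumes EC:4 parity per 16-drive erasure set (verify against
// MINIO_STORAGE_CLASS_STANDARD in your deployment).
int drives = 4 * 4;          // servers * volumesPerServer
int parity = 4;              // parity blocks per erasure set (EC:4)
long perDriveGiB = 100;      // from the volumeClaimTemplate

long rawGiB = drives * perDriveGiB;                 // 1600 GiB raw
long usableGiB = (drives - parity) * perDriveGiB;   // 1200 GiB usable
Console.WriteLine($"Raw {rawGiB} GiB, usable {usableGiB} GiB; " +
                  $"tolerates {parity} drive failures per erasure set");
```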

3. Configure lifecycle (ILM) and backup policies for disaster recovery

apiVersion: v1
kind: ConfigMap
metadata:
  name: minio-backup-policy
data:
  backup-policy.json: |
    {
      "rules": [
        {
          "id": "motovault-backup",
          "status": "Enabled",
          "transition": {
            "days": 30,
            "storage_class": "GLACIER"
          }
        }
      ]
    }

4. Set up monitoring with Prometheus metrics

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: minio-metrics
spec:
  selector:
    matchLabels:
      app: minio
  endpoints:
  - port: http-minio
    path: /minio/v2/metrics/cluster

5. Create service endpoints for application connectivity

apiVersion: v1
kind: Service
metadata:
  name: minio-service
spec:
  selector:
    app: minio
  ports:
  - name: http
    port: 9000
    targetPort: 9000
  - name: console
    port: 9001
    targetPort: 9001
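
With the Service in place, the application can register a MinIO client against it. A sketch using the Minio .NET SDK; the configuration keys (`MinIO:AccessKey`, `MinIO:SecretKey`) are illustrative names, not existing settings:

```csharp
using Minio;

// Program.cs: register an IMinioClient pointing at the in-cluster Service.
builder.Services.AddSingleton<IMinioClient>(_ =>
    new MinioClient()
        .WithEndpoint("minio-service", 9000)   // the Service defined above
        .WithCredentials(
            builder.Configuration["MinIO:AccessKey"],
            builder.Configuration["MinIO:SecretKey"])
        .Build());
```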

MinIO High Availability Features

  • Erasure Coding: Data is split into data and parity blocks across multiple drives, so objects can be reconstructed automatically after drive or node failures
  • Distributed Architecture: No single point of failure
  • Automatic Healing: Corrupted data is automatically detected and repaired
  • Load Balancing: Built-in load balancing across cluster nodes
  • Bucket Policies: Fine-grained access control for different data types

2.2 File Storage Abstraction Implementation

Objective: Create an abstraction layer that allows seamless switching between local filesystem and MinIO object storage.

Current State:

  • Direct filesystem operations throughout the application
  • File paths hardcoded in various controllers and services
  • No abstraction for different storage backends

Target State:

  • Unified file storage interface
  • Pluggable storage implementations
  • Transparent migration between storage types

Implementation Tasks

1. Define storage abstraction interface

public interface IFileStorageService
{
    Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default);
    Task<Stream> DownloadFileAsync(string fileId, CancellationToken cancellationToken = default);
    Task<bool> DeleteFileAsync(string fileId, CancellationToken cancellationToken = default);
    Task<FileMetadata> GetFileMetadataAsync(string fileId, CancellationToken cancellationToken = default);
    Task<IEnumerable<FileMetadata>> ListFilesAsync(string prefix = null, CancellationToken cancellationToken = default);
    Task<string> GeneratePresignedUrlAsync(string fileId, TimeSpan expiration, CancellationToken cancellationToken = default);
}

public class FileMetadata
{
    public string Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public long Size { get; set; }
    public DateTime CreatedDate { get; set; }
    public DateTime ModifiedDate { get; set; }
    public Dictionary<string, string> Tags { get; set; }
}

2. Implement MinIO storage service

public class MinIOFileStorageService : IFileStorageService
{
    private readonly IMinioClient _minioClient;
    private readonly ILogger<MinIOFileStorageService> _logger;
    private readonly string _bucketName;
    
    public MinIOFileStorageService(IMinioClient minioClient, IConfiguration configuration, ILogger<MinIOFileStorageService> logger)
    {
        _minioClient = minioClient;
        _logger = logger;
        _bucketName = configuration["MinIO:BucketName"] ?? "motovault-files";
    }
    
    public async Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default)
    {
        var fileId = $"{Guid.NewGuid()}/{fileName}";
        
        try
        {
            await _minioClient.PutObjectAsync(new PutObjectArgs()
                .WithBucket(_bucketName)
                .WithObject(fileId)
                .WithStreamData(fileStream)
                .WithObjectSize(fileStream.Length)
                .WithContentType(contentType)
                .WithHeaders(new Dictionary<string, string>
                {
                    ["X-Amz-Meta-Original-Name"] = fileName,
                    ["X-Amz-Meta-Upload-Date"] = DateTime.UtcNow.ToString("O")
                }), cancellationToken);
            
            _logger.LogInformation("File uploaded successfully: {FileId}", fileId);
            return fileId;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to upload file: {FileName}", fileName);
            throw;
        }
    }
    
    public async Task<Stream> DownloadFileAsync(string fileId, CancellationToken cancellationToken = default)
    {
        try
        {
            var memoryStream = new MemoryStream();
            await _minioClient.GetObjectAsync(new GetObjectArgs()
                .WithBucket(_bucketName)
                .WithObject(fileId)
                .WithCallbackStream(stream => stream.CopyTo(memoryStream)), cancellationToken);
            
            memoryStream.Position = 0;
            return memoryStream;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to download file: {FileId}", fileId);
            throw;
        }
    }
    
    // Additional method implementations...
}
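
One of the elided methods, `GeneratePresignedUrlAsync`, could be implemented as below; presigned URLs let browsers fetch files directly from MinIO instead of streaming every byte through the application. A sketch: note that the SDK's presigned call takes no cancellation token.

```csharp
public async Task<string> GeneratePresignedUrlAsync(string fileId, TimeSpan expiration, CancellationToken cancellationToken = default)
{
    // Returns a time-limited URL granting GET access to the object.
    return await _minioClient.PresignedGetObjectAsync(new PresignedGetObjectArgs()
        .WithBucket(_bucketName)
        .WithObject(fileId)
        .WithExpiry((int)expiration.TotalSeconds));
}
```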

3. Create fallback storage service for graceful degradation

public class FallbackFileStorageService : IFileStorageService
{
    private readonly IFileStorageService _primaryService;
    private readonly IFileStorageService _fallbackService;
    private readonly ILogger<FallbackFileStorageService> _logger;
    
    public FallbackFileStorageService(
        IFileStorageService primaryService,
        IFileStorageService fallbackService,
        ILogger<FallbackFileStorageService> logger)
    {
        _primaryService = primaryService;
        _fallbackService = fallbackService;
        _logger = logger;
    }
    
    public async Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default)
    {
        try
        {
            return await _primaryService.UploadFileAsync(fileStream, fileName, contentType, cancellationToken);
        }
        catch (Exception ex)
        {
            _logger.LogWarning(ex, "Primary storage failed, falling back to secondary storage");
            fileStream.Position = 0; // Reset stream position
            return await _fallbackService.UploadFileAsync(fileStream, fileName, contentType, cancellationToken);
        }
    }
    
    // Implementation with automatic fallback logic for other methods...
}
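
Wiring the decorator in DI could look like the following sketch. `LocalFileStorageService` is a hypothetical filesystem-backed `IFileStorageService` implementation used as the fallback:

```csharp
// Sketch: MinIO as primary storage, local filesystem as fallback.
services.AddSingleton<MinIOFileStorageService>();
services.AddSingleton<LocalFileStorageService>();   // hypothetical local implementation
services.AddSingleton<IFileStorageService>(sp =>
    new FallbackFileStorageService(
        sp.GetRequiredService<MinIOFileStorageService>(),
        sp.GetRequiredService<LocalFileStorageService>(),
        sp.GetRequiredService<ILogger<FallbackFileStorageService>>()));
```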

4. Update all file operations to use the abstraction layer

  • Replace direct File.WriteAllBytes, File.ReadAllBytes calls
  • Update all controllers to use IFileStorageService
  • Modify attachment handling in vehicle records
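
An updated controller might look like this sketch (the route and type names are illustrative, not the application's actual controllers):

```csharp
[ApiController]
[Route("api/attachments")]
public class AttachmentsController : ControllerBase
{
    private readonly IFileStorageService _storage;

    public AttachmentsController(IFileStorageService storage) => _storage = storage;

    [HttpPost]
    public async Task<IActionResult> Upload(IFormFile file, CancellationToken ct)
    {
        // No direct File.* calls: the storage backend is opaque to the controller.
        await using var stream = file.OpenReadStream();
        var fileId = await _storage.UploadFileAsync(stream, file.FileName, file.ContentType, ct);
        return Ok(new { fileId });
    }
}
```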

5. Implement file migration utility for existing local files

public class FileMigrationService
{
    private readonly IFileStorageService _targetStorage;
    private readonly ILogger<FileMigrationService> _logger;

    public FileMigrationService(IFileStorageService targetStorage, ILogger<FileMigrationService> logger)
    {
        _targetStorage = targetStorage;
        _logger = logger;
    }

    public async Task<MigrationResult> MigrateLocalFilesAsync(string localPath)
    {
        var result = new MigrationResult();
        var files = Directory.GetFiles(localPath, "*", SearchOption.AllDirectories);
        
        foreach (var filePath in files)
        {
            try
            {
                using var fileStream = File.OpenRead(filePath);
                var fileName = Path.GetFileName(filePath);
                var contentType = GetContentType(fileName);
                
                var fileId = await _targetStorage.UploadFileAsync(fileStream, fileName, contentType);
                result.ProcessedFiles.Add(new MigratedFile
                {
                    OriginalPath = filePath,
                    NewFileId = fileId,
                    Success = true
                });
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed to migrate file: {FilePath}", filePath);
                result.ProcessedFiles.Add(new MigratedFile
                {
                    OriginalPath = filePath,
                    Success = false,
                    Error = ex.Message
                });
            }
        }
        
        return result;
    }
}
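
The result types referenced by the migration service are not shown above; a minimal sketch consistent with that code:

```csharp
// Requires System.Collections.Generic and System.Linq.
public class MigrationResult
{
    public List<MigratedFile> ProcessedFiles { get; } = new();
    public int Succeeded => ProcessedFiles.Count(f => f.Success);
    public int Failed => ProcessedFiles.Count(f => !f.Success);
}

public class MigratedFile
{
    public string OriginalPath { get; set; }
    public string NewFileId { get; set; }
    public bool Success { get; set; }
    public string Error { get; set; }
}
```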

2.3 PostgreSQL High Availability Configuration

Objective: Set up a PostgreSQL cluster with automatic failover and read replicas.

Architecture Overview: PostgreSQL will be deployed using an operator (like CloudNativePG or Postgres Operator) to provide automated failover, backup, and scaling capabilities.

PostgreSQL Cluster Configuration

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: motovault-postgres
  namespace: motovault
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "1GB"
      maintenance_work_mem: "64MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "16MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
      
  storage:
    size: "100Gi"
    storageClass: "fast-ssd"
    
  monitoring:
    enabled: true
    
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: "s3://motovault-backups/postgres"
      s3Credentials:
        accessKeyId:
          name: postgres-backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: postgres-backup-credentials
          key: SECRET_ACCESS_KEY
      wal:
        retention: "5d"
      data:
        retention: "30d"
        jobs: 1

Implementation Tasks

1. Deploy the CloudNativePG operator

kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.20/releases/cnpg-1.20.1.yaml

2. Configure cluster with primary/replica setup

  • 3-node cluster with automatic failover
  • Read-write split capability
  • Streaming replication configuration
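
CloudNativePG exposes one Service per role: motovault-postgres-rw always routes to the current primary, while motovault-postgres-ro load-balances across replicas. A connection-string layout for read/write splitting might look like this (database and user names are placeholders):

```json
{
  "ConnectionStrings": {
    "Postgres": "Host=motovault-postgres-rw;Port=5432;Database=motovault;Username=app",
    "PostgresReadOnly": "Host=motovault-postgres-ro;Port=5432;Database=motovault;Username=app"
  }
}
```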

3. Set up automated backups to MinIO or external storage

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: motovault-postgres-backup
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  backupOwnerReference: self
  cluster:
    name: motovault-postgres

4. Implement connection pooling with PgBouncer

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pgbouncer
  template:
    metadata:
      labels:
        app: pgbouncer
    spec:
      containers:
      - name: pgbouncer
        image: pgbouncer/pgbouncer:latest  # pin a specific version for production
        env:
        - name: DATABASES_HOST
          value: motovault-postgres-rw
        - name: DATABASES_PORT
          value: "5432"
        - name: DATABASES_DATABASE
          value: motovault
        - name: POOL_MODE
          value: session
        - name: MAX_CLIENT_CONN
          value: "1000"
        - name: DEFAULT_POOL_SIZE
          value: "25"
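
The Deployment above still needs a Service so the application can reach the pool at a stable name. A sketch; verify which port your PgBouncer image listens on, as some default to 6432 rather than 5432:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pgbouncer
spec:
  selector:
    app: pgbouncer
  ports:
  - name: postgres
    port: 5432
    targetPort: 5432   # adjust to the image's listen port
```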

5. Configure monitoring and alerting for database health

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudnative-pg
  endpoints:
  - port: metrics
    path: /metrics

2.4 Redis Cluster for Session Management

Objective: Implement distributed session storage and caching using Redis cluster.

Current State:

  • In-memory session storage tied to individual application instances
  • No distributed caching for expensive operations
  • Configuration and translation data loaded on each application start

Target State:

  • Redis cluster for distributed session storage
  • Centralized caching for frequently accessed data
  • High availability with automatic failover

Redis Cluster Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: motovault
data:
  redis.conf: |
    cluster-enabled yes
    cluster-require-full-coverage no
    cluster-node-timeout 15000
    cluster-config-file /data/nodes.conf
    cluster-migration-barrier 1
    appendonly yes
    appendfsync everysec
    save 900 1
    save 300 10
    save 60 10000

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: motovault
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command:
        - redis-server
        - /etc/redis/redis.conf
        ports:
        - containerPort: 6379
        - containerPort: 16379
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis
        - name: redis-data
          mountPath: /data
      volumes:
      - name: redis-config
        configMap:
          name: redis-cluster-config
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Implementation Tasks

1. Deploy Redis cluster with 6 nodes (3 masters, 3 replicas)

# Initialize Redis cluster after deployment
kubectl exec -it redis-cluster-0 -- redis-cli --cluster create \
  redis-cluster-0.redis-cluster:6379 \
  redis-cluster-1.redis-cluster:6379 \
  redis-cluster-2.redis-cluster:6379 \
  redis-cluster-3.redis-cluster:6379 \
  redis-cluster-4.redis-cluster:6379 \
  redis-cluster-5.redis-cluster:6379 \
  --cluster-replicas 1

2. Configure session storage

services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = configuration.GetConnectionString("Redis");
    options.InstanceName = "MotoVault";
});

services.AddSession(options =>
{
    options.IdleTimeout = TimeSpan.FromMinutes(30);
    options.Cookie.HttpOnly = true;
    options.Cookie.IsEssential = true;
    options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
});
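
For the session configuration above to take effect, the middleware must also be added to the request pipeline:

```csharp
// Program.cs: enable session state backed by the Redis distributed cache.
app.UseSession();
```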

3. Implement distributed caching

public class CachedTranslationService : ITranslationService
{
    private readonly IDistributedCache _cache;
    private readonly ITranslationService _translationService;
    private readonly ILogger<CachedTranslationService> _logger;

    public CachedTranslationService(IDistributedCache cache, ITranslationService translationService, ILogger<CachedTranslationService> logger)
    {
        _cache = cache;
        _translationService = translationService;
        _logger = logger;
    }

    public async Task<string> GetTranslationAsync(string key, string language)
    {
        var cacheKey = $"translation:{language}:{key}";
        var cached = await _cache.GetStringAsync(cacheKey);
        
        if (cached != null)
        {
            return cached;
        }
        
        var translation = await _translationService.GetTranslationAsync(key, language);
        
        await _cache.SetStringAsync(cacheKey, translation, new DistributedCacheEntryOptions
        {
            SlidingExpiration = TimeSpan.FromHours(1)
        });
        
        return translation;
    }
}

4. Add cache monitoring and performance metrics

public class CacheMetricsService
{
    private readonly Counter _cacheHits;
    private readonly Counter _cacheMisses;
    private readonly Histogram _cacheOperationDuration;
    
    public CacheMetricsService()
    {
        _cacheHits = Metrics.CreateCounter(
            "motovault_cache_hits_total", 
            "Total cache hits",
            new[] { "cache_type" });
            
        _cacheMisses = Metrics.CreateCounter(
            "motovault_cache_misses_total", 
            "Total cache misses",
            new[] { "cache_type" });
            
        _cacheOperationDuration = Metrics.CreateHistogram(
            "motovault_cache_operation_duration_seconds",
            "Cache operation duration",
            new[] { "operation", "cache_type" });
    }
}
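
Hypothetical helper methods on `CacheMetricsService` showing how these prometheus-net metrics would be recorded (the method names are illustrative):

```csharp
public void RecordHit(string cacheType) =>
    _cacheHits.WithLabels(cacheType).Inc();

public void RecordMiss(string cacheType) =>
    _cacheMisses.WithLabels(cacheType).Inc();

// Dispose the returned timer to observe elapsed seconds in the histogram.
public IDisposable TimeOperation(string operation, string cacheType) =>
    _cacheOperationDuration.WithLabels(operation, cacheType).NewTimer();
```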

Week-by-Week Breakdown

Week 5: MinIO Deployment

  • Days 1-2: Deploy MinIO operator and configure basic cluster
  • Days 3-4: Implement file storage abstraction interface
  • Days 5-7: Create MinIO storage service implementation

Week 6: File Migration and PostgreSQL HA

  • Days 1-2: Complete file storage abstraction and migration tools
  • Days 3-4: Deploy PostgreSQL operator and HA cluster
  • Days 5-7: Configure connection pooling and backup strategies

Week 7: Redis Cluster and Caching

  • Days 1-3: Deploy Redis cluster and configure session storage
  • Days 4-5: Implement distributed caching layer
  • Days 6-7: Add cache monitoring and performance metrics

Week 8: Integration and Testing

  • Days 1-3: End-to-end testing of all HA components
  • Days 4-5: Performance testing and optimization
  • Days 6-7: Documentation and preparation for Phase 3

Success Criteria

  • MinIO cluster operational with erasure coding
  • File storage abstraction implemented and tested
  • PostgreSQL HA cluster with automatic failover
  • Redis cluster providing distributed sessions
  • All file operations migrated to object storage
  • Comprehensive monitoring for all infrastructure components
  • Backup and recovery procedures validated

Testing Requirements

Infrastructure Tests

  • MinIO cluster failover scenarios
  • PostgreSQL primary/replica failover
  • Redis cluster node failure recovery
  • Network partition handling

Application Integration Tests

  • File upload/download through abstraction layer
  • Session persistence across application restarts
  • Cache performance and invalidation
  • Database connection pool behavior

Performance Tests

  • File storage throughput and latency
  • Database query performance with connection pooling
  • Cache hit/miss ratios and response times

Deliverables

  1. Infrastructure Components

    • MinIO HA cluster configuration
    • PostgreSQL HA cluster with operator
    • Redis cluster deployment
    • Monitoring and alerting setup
  2. Application Updates

    • File storage abstraction implementation
    • Session management configuration
    • Distributed caching integration
    • Connection pooling optimization
  3. Migration Tools

    • File migration utility
    • Database migration scripts
    • Configuration migration helpers
  4. Documentation

    • Infrastructure architecture diagrams
    • Operational procedures
    • Monitoring and alerting guides

Dependencies

  • Kubernetes cluster with sufficient resources
  • Storage classes for persistent volumes
  • Prometheus and Grafana for monitoring
  • Network connectivity between components

Risks and Mitigations

Risk: Data Corruption During File Migration

Mitigation: Checksum validation and parallel running of old/new systems

Risk: Database Failover Issues

Mitigation: Extensive testing of failover scenarios and automated recovery

Risk: Cache Inconsistency

Mitigation: Proper cache invalidation strategies and monitoring


Previous Phase: Phase 1: Core Kubernetes Readiness
Next Phase: Phase 3: Production Deployment