Fixed Dark Mode

Eric Gullickson
2025-07-28 09:39:17 -05:00
parent 4391cf11ed
commit 01a03263c9
455 changed files with 143757 additions and 0 deletions

docs/K8S-PHASE-2.md Normal file

@@ -0,0 +1,742 @@
# Phase 2: High Availability Infrastructure (Weeks 5-8)
This phase focuses on implementing the supporting infrastructure required for high availability, including MinIO clusters, PostgreSQL HA setup, Redis clusters, and file storage abstraction.
## Overview
Phase 2 transforms MotoVaultPro's supporting infrastructure from single-instance services to highly available, distributed systems. This phase establishes the foundation for true high availability by eliminating all single points of failure in the data layer.
## Key Objectives
- **MinIO High Availability**: Deploy distributed object storage with erasure coding
- **File Storage Abstraction**: Create unified interface for file operations
- **PostgreSQL HA**: Implement primary/replica configuration with automated failover
- **Redis Cluster**: Deploy distributed caching and session storage
- **Data Migration**: Seamless transition from local storage to distributed systems
## 2.1 MinIO High Availability Setup
**Objective**: Deploy a highly available MinIO cluster for file storage with automatic failover.
**Architecture Overview**:
MinIO will be deployed as a distributed cluster with erasure coding for data protection and automatic healing capabilities.
### MinIO Cluster Configuration
```yaml
# MinIO Tenant Configuration
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: motovault-minio
  namespace: motovault
spec:
  image: minio/minio:RELEASE.2024-01-16T16-07-38Z
  creationDate: 2024-01-20T10:00:00Z
  pools:
    - servers: 4
      name: pool-0
      volumesPerServer: 4
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
          storageClassName: fast-ssd
  mountPath: /export
  subPath: /data
  requestAutoCert: false
  certConfig:
    commonName: ""
    organizationName: []
    dnsNames: []
  console:
    image: minio/console:v0.22.5
    replicas: 2
    consoleSecret:
      name: motovault-minio-console-secret
  configuration:
    name: motovault-minio-config
```
### Implementation Tasks
#### 1. Deploy MinIO Operator
```bash
kubectl apply -k "github.com/minio/operator/resources"
```
#### 2. Create MinIO cluster configuration with erasure coding
- Configure 4+ nodes for optimal erasure coding
- Set up data protection with automatic healing
- Configure storage classes for performance
#### 3. Configure backup policies for disaster recovery
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: minio-backup-policy
data:
  backup-policy.json: |
    {
      "rules": [
        {
          "id": "motovault-backup",
          "status": "Enabled",
          "transition": {
            "days": 30,
            "storage_class": "GLACIER"
          }
        }
      ]
    }
```
#### 4. Set up monitoring with Prometheus metrics
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: minio-metrics
spec:
  selector:
    matchLabels:
      app: minio
  endpoints:
    - port: http-minio
      path: /minio/v2/metrics/cluster
```
#### 5. Create service endpoints for application connectivity
```yaml
apiVersion: v1
kind: Service
metadata:
  name: minio-service
spec:
  selector:
    app: minio
  ports:
    - name: http
      port: 9000
      targetPort: 9000
    - name: console
      port: 9001
      targetPort: 9001
```
### MinIO High Availability Features
- **Erasure Coding**: Data is split across multiple drives with parity for automatic healing
- **Distributed Architecture**: No single point of failure
- **Automatic Healing**: Corrupted data is automatically detected and repaired
- **Load Balancing**: Any node can serve any request, so a Kubernetes Service can spread traffic across cluster nodes
- **Bucket Policies**: Fine-grained access control for different data types
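As a rough sizing aid for the erasure-coding bullet above, usable capacity is the raw capacity minus the share reserved for parity. The sketch below assumes the pool's 16 drives (4 servers × 4 volumes of 100Gi each) form a single erasure set, and it uses a parity of 4 drives as an illustrative value that should be checked against the cluster's actual setting. `usable_gib` is a hypothetical helper name, not part of any MinIO tooling.

```bash
#!/usr/bin/env bash
# Estimate usable capacity (GiB) of one erasure set.
# Assumptions: every drive has the same size, all drives belong to a single
# erasure set, and "parity" is the number of parity drives in that set.
usable_gib() {
  local servers=$1 volumes=$2 drive_gib=$3 parity=$4
  local drives=$(( $servers * $volumes ))
  # Parity cannot exceed half of the drives in the set.
  if [ $(( $parity * 2 )) -gt "$drives" ]; then
    echo "parity $parity is too high for $drives drives" >&2
    return 1
  fi
  # Data drives hold the payload; parity drives hold redundancy only.
  echo $(( ($drives - $parity) * $drive_gib ))
}

usable_gib 4 4 100 4   # prints 1200
```

Under these assumptions the pool offers about 1200 GiB of usable space out of 1600 GiB raw, and objects stay readable with up to 4 drives unavailable.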
## 2.2 File Storage Abstraction Implementation
**Objective**: Create an abstraction layer that allows seamless switching between local filesystem and MinIO object storage.
**Current State**:
- Direct filesystem operations throughout the application
- File paths hardcoded in various controllers and services
- No abstraction for different storage backends
**Target State**:
- Unified file storage interface
- Pluggable storage implementations
- Transparent migration between storage types
### Implementation Tasks
#### 1. Define storage abstraction interface
```csharp
public interface IFileStorageService
{
    Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default);
    Task<Stream> DownloadFileAsync(string fileId, CancellationToken cancellationToken = default);
    Task<bool> DeleteFileAsync(string fileId, CancellationToken cancellationToken = default);
    Task<FileMetadata> GetFileMetadataAsync(string fileId, CancellationToken cancellationToken = default);
    Task<IEnumerable<FileMetadata>> ListFilesAsync(string prefix = null, CancellationToken cancellationToken = default);
    Task<string> GeneratePresignedUrlAsync(string fileId, TimeSpan expiration, CancellationToken cancellationToken = default);
}

public class FileMetadata
{
    public string Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public long Size { get; set; }
    public DateTime CreatedDate { get; set; }
    public DateTime ModifiedDate { get; set; }
    public Dictionary<string, string> Tags { get; set; }
}
```
#### 2. Implement MinIO storage service
```csharp
public class MinIOFileStorageService : IFileStorageService
{
    private readonly IMinioClient _minioClient;
    private readonly ILogger<MinIOFileStorageService> _logger;
    private readonly string _bucketName;

    public MinIOFileStorageService(IMinioClient minioClient, IConfiguration configuration, ILogger<MinIOFileStorageService> logger)
    {
        _minioClient = minioClient;
        _logger = logger;
        _bucketName = configuration["MinIO:BucketName"] ?? "motovault-files";
    }

    public async Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default)
    {
        var fileId = $"{Guid.NewGuid()}/{fileName}";
        try
        {
            await _minioClient.PutObjectAsync(new PutObjectArgs()
                .WithBucket(_bucketName)
                .WithObject(fileId)
                .WithStreamData(fileStream)
                .WithObjectSize(fileStream.Length)
                .WithContentType(contentType)
                .WithHeaders(new Dictionary<string, string>
                {
                    ["X-Amz-Meta-Original-Name"] = fileName,
                    ["X-Amz-Meta-Upload-Date"] = DateTime.UtcNow.ToString("O")
                }), cancellationToken);
            _logger.LogInformation("File uploaded successfully: {FileId}", fileId);
            return fileId;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to upload file: {FileName}", fileName);
            throw;
        }
    }

    public async Task<Stream> DownloadFileAsync(string fileId, CancellationToken cancellationToken = default)
    {
        try
        {
            var memoryStream = new MemoryStream();
            await _minioClient.GetObjectAsync(new GetObjectArgs()
                .WithBucket(_bucketName)
                .WithObject(fileId)
                .WithCallbackStream(stream => stream.CopyTo(memoryStream)), cancellationToken);
            memoryStream.Position = 0;
            return memoryStream;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to download file: {FileId}", fileId);
            throw;
        }
    }

    // Additional method implementations...
}
```
#### 3. Create fallback storage service for graceful degradation
```csharp
public class FallbackFileStorageService : IFileStorageService
{
    private readonly IFileStorageService _primaryService;
    private readonly IFileStorageService _fallbackService;
    private readonly ILogger<FallbackFileStorageService> _logger;

    public FallbackFileStorageService(
        IFileStorageService primaryService,
        IFileStorageService fallbackService,
        ILogger<FallbackFileStorageService> logger)
    {
        _primaryService = primaryService;
        _fallbackService = fallbackService;
        _logger = logger;
    }

    public async Task<string> UploadFileAsync(Stream fileStream, string fileName, string contentType, CancellationToken cancellationToken = default)
    {
        try
        {
            return await _primaryService.UploadFileAsync(fileStream, fileName, contentType, cancellationToken);
        }
        // A cancelled request is not a storage failure, so it must not trigger the fallback.
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            _logger.LogWarning(ex, "Primary storage failed, falling back to secondary storage");
            // A partially consumed, non-seekable stream cannot be replayed.
            if (!fileStream.CanSeek)
            {
                throw;
            }
            fileStream.Position = 0; // Rewind so the fallback reads the whole file
            return await _fallbackService.UploadFileAsync(fileStream, fileName, contentType, cancellationToken);
        }
    }

    // Implementation with automatic fallback logic for other methods...
}
```
#### 4. Update all file operations to use the abstraction layer
- Replace direct File.WriteAllBytes, File.ReadAllBytes calls
- Update all controllers to use IFileStorageService
- Modify attachment handling in vehicle records
#### 5. Implement file migration utility for existing local files
```csharp
// MigrationResult, MigratedFile, and GetContentType are assumed to be defined
// elsewhere in the application; they are not shown in this document.
public class FileMigrationService
{
    private readonly IFileStorageService _targetStorage;
    private readonly ILogger<FileMigrationService> _logger;

    public FileMigrationService(IFileStorageService targetStorage, ILogger<FileMigrationService> logger)
    {
        _targetStorage = targetStorage;
        _logger = logger;
    }

    public async Task<MigrationResult> MigrateLocalFilesAsync(string localPath)
    {
        var result = new MigrationResult();
        var files = Directory.GetFiles(localPath, "*", SearchOption.AllDirectories);
        foreach (var filePath in files)
        {
            try
            {
                using var fileStream = File.OpenRead(filePath);
                var fileName = Path.GetFileName(filePath);
                var contentType = GetContentType(fileName);
                var fileId = await _targetStorage.UploadFileAsync(fileStream, fileName, contentType);
                result.ProcessedFiles.Add(new MigratedFile
                {
                    OriginalPath = filePath,
                    NewFileId = fileId,
                    Success = true
                });
            }
            catch (Exception ex)
            {
                // Record the failure and continue with the remaining files.
                _logger.LogError(ex, "Failed to migrate file: {FilePath}", filePath);
                result.ProcessedFiles.Add(new MigratedFile
                {
                    OriginalPath = filePath,
                    Success = false,
                    Error = ex.Message
                });
            }
        }
        return result;
    }
}
```
## 2.3 PostgreSQL High Availability Configuration
**Objective**: Set up a PostgreSQL cluster with automatic failover and read replicas.
**Architecture Overview**:
PostgreSQL will be deployed using an operator (like CloudNativePG or Postgres Operator) to provide automated failover, backup, and scaling capabilities.
### PostgreSQL Cluster Configuration
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: motovault-postgres
  namespace: motovault
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "1GB"
      maintenance_work_mem: "64MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "16MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
  storage:
    size: "100Gi"
    storageClass: "fast-ssd"
  monitoring:
    # CloudNativePG exposes this switch as enablePodMonitor
    enablePodMonitor: true
  backup:
    # retentionPolicy governs both base backups and the WAL needed to restore them;
    # the wal and data sections below it have no retention field of their own
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: "s3://motovault-backups/postgres"
      s3Credentials:
        accessKeyId:
          name: postgres-backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: postgres-backup-credentials
          key: SECRET_ACCESS_KEY
      data:
        jobs: 1
```
### Implementation Tasks
#### 1. Deploy PostgreSQL operator (CloudNativePG recommended)
```bash
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.20/releases/cnpg-1.20.1.yaml
```
#### 2. Configure cluster with primary/replica setup
- 3-node cluster with automatic failover
- Read-write split capability
- Streaming replication configuration
#### 3. Set up automated backups to MinIO or external storage
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: motovault-postgres-backup
spec:
  # CloudNativePG uses a six-field cron expression with a leading seconds field
  schedule: "0 0 2 * * *" # Daily at 2 AM
  backupOwnerReference: self
  cluster:
    name: motovault-postgres
```
#### 4. Implement connection pooling with PgBouncer
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pgbouncer
  template:
    metadata:
      labels:
        app: pgbouncer # must match spec.selector.matchLabels
    spec:
      containers:
        - name: pgbouncer
          image: pgbouncer/pgbouncer:latest
          env:
            - name: DATABASES_HOST
              value: motovault-postgres-rw
            - name: DATABASES_PORT
              value: "5432"
            - name: DATABASES_DATABASE
              value: motovault
            - name: POOL_MODE
              value: session
            - name: MAX_CLIENT_CONN
              value: "1000"
            - name: DEFAULT_POOL_SIZE
              value: "25"
```
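To sanity-check these pool settings against `max_connections: "200"` in the cluster spec, the arithmetic can be sketched as follows. It assumes each PgBouncer replica opens at most `DEFAULT_POOL_SIZE` server connections per database/user pair, and that 3 connections are held back for superusers (the PostgreSQL default for `superuser_reserved_connections`). `pool_headroom` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Remaining PostgreSQL connections after all PgBouncer replicas fill their pools.
# Arguments: max_connections, PgBouncer replicas, pool size per database/user
# pair, number of database/user pairs, connections reserved for superusers.
pool_headroom() {
  local max_connections=$1 replicas=$2 pool_size=$3 pairs=$4 reserved=$5
  echo $(( $max_connections - $replicas * $pool_size * $pairs - $reserved ))
}

pool_headroom 200 2 25 1 3   # prints 147
```

A negative result would mean the poolers could request more server connections than PostgreSQL accepts, so either the pool size or `max_connections` would need adjusting.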
#### 5. Configure monitoring and alerting for database health
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudnative-pg
  endpoints:
    - port: metrics
      path: /metrics
```
## 2.4 Redis Cluster for Session Management
**Objective**: Implement distributed session storage and caching using Redis cluster.
**Current State**:
- In-memory session storage tied to individual application instances
- No distributed caching for expensive operations
- Configuration and translation data loaded on each application start
**Target State**:
- Redis cluster for distributed session storage
- Centralized caching for frequently accessed data
- High availability with automatic failover
### Redis Cluster Configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: motovault
data:
  redis.conf: |
    cluster-enabled yes
    cluster-require-full-coverage no
    cluster-node-timeout 15000
    cluster-config-file /data/nodes.conf
    cluster-migration-barrier 1
    appendonly yes
    appendfsync everysec
    save 900 1
    save 300 10
    save 60 10000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: motovault
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          command:
            - redis-server
            - /etc/redis/redis.conf
          ports:
            - containerPort: 6379
            - containerPort: 16379
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis
            - name: redis-data
              mountPath: /data
      volumes:
        - name: redis-config
          configMap:
            name: redis-cluster-config
  volumeClaimTemplates:
    - metadata:
        name: redis-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```
### Implementation Tasks
#### 1. Deploy Redis cluster with 6 nodes (3 masters, 3 replicas)
```bash
# Initialize Redis cluster after deployment
kubectl exec -it redis-cluster-0 -n motovault -- redis-cli --cluster create \
  redis-cluster-0.redis-cluster:6379 \
  redis-cluster-1.redis-cluster:6379 \
  redis-cluster-2.redis-cluster:6379 \
  redis-cluster-3.redis-cluster:6379 \
  redis-cluster-4.redis-cluster:6379 \
  redis-cluster-5.redis-cluster:6379 \
  --cluster-replicas 1
```
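Once the cluster has been created, its state can be checked from the output of the Redis `CLUSTER INFO` command. The following is a minimal sketch, not a complete health check: `cluster_healthy` is a hypothetical helper that reads that output on standard input and verifies that the cluster reports `ok`, that all 16384 hash slots are assigned, and that the expected number of nodes is known.

```bash
#!/usr/bin/env bash
# Check CLUSTER INFO output (read from stdin) for a healthy cluster.
# Argument: the number of nodes the cluster is expected to know about.
cluster_healthy() {
  local expected_nodes=$1 info
  # redis-cli output uses CRLF line endings; strip the carriage returns.
  info=$(tr -d '\r')
  printf '%s\n' "$info" | grep -qx 'cluster_state:ok' &&
    printf '%s\n' "$info" | grep -qx 'cluster_slots_assigned:16384' &&
    printf '%s\n' "$info" | grep -qx "cluster_known_nodes:${expected_nodes}"
}

# Example against the StatefulSet above (requires a running cluster):
#   kubectl exec redis-cluster-0 -n motovault -- redis-cli cluster info | cluster_healthy 6
```

The function returns a zero exit status only when all three conditions hold, so it can gate later deployment steps.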
#### 2. Configure session storage
```csharp
services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = configuration.GetConnectionString("Redis");
    options.InstanceName = "MotoVault";
});

services.AddSession(options =>
{
    options.IdleTimeout = TimeSpan.FromMinutes(30);
    options.Cookie.HttpOnly = true;
    options.Cookie.IsEssential = true;
    options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
});
```
#### 3. Implement distributed caching
```csharp
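public class CachedTranslationService : ITranslationService
{
    private readonly IDistributedCache _cache;
    private readonly ITranslationService _translationService;
    private readonly ILogger<CachedTranslationService> _logger;

    // Decorator: wraps the underlying translation service with a distributed cache.
    public CachedTranslationService(
        IDistributedCache cache,
        ITranslationService translationService,
        ILogger<CachedTranslationService> logger)
    {
        _cache = cache;
        _translationService = translationService;
        _logger = logger;
    }

    public async Task<string> GetTranslationAsync(string key, string language)
    {
        var cacheKey = $"translation:{language}:{key}";
        var cached = await _cache.GetStringAsync(cacheKey);
        if (cached != null)
        {
            return cached;
        }

        var translation = await _translationService.GetTranslationAsync(key, language);
        // SetStringAsync rejects null values, so only cache translations that exist.
        if (translation != null)
        {
            await _cache.SetStringAsync(cacheKey, translation, new DistributedCacheEntryOptions
            {
                SlidingExpiration = TimeSpan.FromHours(1)
            });
        }
        return translation;
    }
}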
```
#### 4. Add cache monitoring and performance metrics
```csharp
public class CacheMetricsService
{
    private readonly Counter _cacheHits;
    private readonly Counter _cacheMisses;
    private readonly Histogram _cacheOperationDuration;

    public CacheMetricsService()
    {
        _cacheHits = Metrics.CreateCounter(
            "motovault_cache_hits_total",
            "Total cache hits",
            new[] { "cache_type" });
        _cacheMisses = Metrics.CreateCounter(
            "motovault_cache_misses_total",
            "Total cache misses",
            new[] { "cache_type" });
        _cacheOperationDuration = Metrics.CreateHistogram(
            "motovault_cache_operation_duration_seconds",
            "Cache operation duration",
            new[] { "operation", "cache_type" });
    }
}
```
## Week-by-Week Breakdown
### Week 5: MinIO Deployment
- **Days 1-2**: Deploy MinIO operator and configure basic cluster
- **Days 3-4**: Implement file storage abstraction interface
- **Days 5-7**: Create MinIO storage service implementation
### Week 6: File Migration and PostgreSQL HA
- **Days 1-2**: Complete file storage abstraction and migration tools
- **Days 3-4**: Deploy PostgreSQL operator and HA cluster
- **Days 5-7**: Configure connection pooling and backup strategies
### Week 7: Redis Cluster and Caching
- **Days 1-3**: Deploy Redis cluster and configure session storage
- **Days 4-5**: Implement distributed caching layer
- **Days 6-7**: Add cache monitoring and performance metrics
### Week 8: Integration and Testing
- **Days 1-3**: End-to-end testing of all HA components
- **Days 4-5**: Performance testing and optimization
- **Days 6-7**: Documentation and preparation for Phase 3
## Success Criteria
- [ ] MinIO cluster operational with erasure coding
- [ ] File storage abstraction implemented and tested
- [ ] PostgreSQL HA cluster with automatic failover
- [ ] Redis cluster providing distributed sessions
- [ ] All file operations migrated to object storage
- [ ] Comprehensive monitoring for all infrastructure components
- [ ] Backup and recovery procedures validated
## Testing Requirements
### Infrastructure Tests
- MinIO cluster failover scenarios
- PostgreSQL primary/replica failover
- Redis cluster node failure recovery
- Network partition handling
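Failover scenarios like these usually come down to removing a component and timing how long the service takes to respond again. The sketch below shows one way to do that with a generic polling helper; `wait_until` is a hypothetical name, and the commented `kubectl` lines assume the Redis StatefulSet defined earlier in this document.

```bash
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds or the timeout elapses.
# On success, prints the number of whole seconds waited and returns 0.
# On timeout, prints nothing and returns 1.
wait_until() {
  local timeout=$1
  shift
  local start now
  start=$(date +%s)
  until "$@"; do
    now=$(date +%s)
    if [ $(( now - start )) -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# Example (requires a running cluster): delete a Redis pod, then measure how
# long it takes until the recreated pod answers again.
#   kubectl delete pod redis-cluster-0 -n motovault
#   wait_until 120 kubectl exec redis-cluster-0 -n motovault -- redis-cli ping
```

Recording the printed duration for each scenario gives a simple baseline to compare against recovery-time targets.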
### Application Integration Tests
- File upload/download through abstraction layer
- Session persistence across application restarts
- Cache performance and invalidation
- Database connection pool behavior
### Performance Tests
- File storage throughput and latency
- Database query performance with connection pooling
- Cache hit/miss ratios and response times
## Deliverables
1. **Infrastructure Components**
   - MinIO HA cluster configuration
   - PostgreSQL HA cluster with operator
   - Redis cluster deployment
   - Monitoring and alerting setup
2. **Application Updates**
   - File storage abstraction implementation
   - Session management configuration
   - Distributed caching integration
   - Connection pooling optimization
3. **Migration Tools**
   - File migration utility
   - Database migration scripts
   - Configuration migration helpers
4. **Documentation**
   - Infrastructure architecture diagrams
   - Operational procedures
   - Monitoring and alerting guides
## Dependencies
- Kubernetes cluster with sufficient resources
- Storage classes for persistent volumes
- Prometheus and Grafana for monitoring
- Network connectivity between components
## Risks and Mitigations
### Risk: Data Corruption During File Migration
**Mitigation**: Checksum validation and parallel running of old/new systems
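Checksum validation can be as simple as comparing a SHA-256 digest of the original local file with the digest of a copy downloaded back from object storage. The following is a minimal sketch assuming GNU coreutils `sha256sum` is available; `verify_copy` is a hypothetical helper name, and how the second file is fetched from MinIO is left to whichever client the migration uses.

```bash
#!/usr/bin/env bash
# verify_copy ORIGINAL COPY
# Returns 0 if both files have the same SHA-256 digest, 1 if they differ,
# and 2 if either file cannot be read.
verify_copy() {
  local src=$1 dst=$2
  local src_sum dst_sum
  src_sum=$(sha256sum "$src") || return 2
  dst_sum=$(sha256sum "$dst") || return 2
  # sha256sum prints "<digest>  <path>"; keep only the digest.
  src_sum=${src_sum%% *}
  dst_sum=${dst_sum%% *}
  [ "$src_sum" = "$dst_sum" ]
}
```

Running this for every migrated file before the local copy is removed keeps the old system usable as the source of truth until the new one is verified.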
### Risk: Database Failover Issues
**Mitigation**: Extensive testing of failover scenarios and automated recovery
### Risk: Cache Inconsistency
**Mitigation**: Proper cache invalidation strategies and monitoring
---
**Previous Phase**: [Phase 1: Core Kubernetes Readiness](K8S-PHASE-1.md)
**Next Phase**: [Phase 3: Production Deployment](K8S-PHASE-3.md)