motovaultpro/docs/redesign/ROLLBACK-STRATEGY.md
Eric Gullickson 046c66fc7d Redesign
2025-11-01 21:27:42 -05:00


Rollback Strategy - Recovery Procedures

Overview

This document provides comprehensive rollback procedures for each phase of the simplification. Each phase can be rolled back independently, and full system rollback is available.

Pre-Execution Backup

Before Starting ANY Phase

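# 1. Create backup branch
# (capture the timestamp once so both commands use the same branch name;
#  git branch creates the branch without switching away from the current one)
BACKUP_BRANCH="backup-$(date +%Y%m%d-%H%M%S)"
git branch "$BACKUP_BRANCH"
git push origin "$BACKUP_BRANCH"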

# 2. Tag current state
git tag -a pre-simplification-$(date +%Y%m%d) \
  -m "State before architecture simplification"
git push origin --tags

# 3. Export docker volumes
docker run --rm -v admin_postgres_data:/data -v $(pwd):/backup \
  alpine tar czf /backup/postgres-backup-$(date +%Y%m%d).tar.gz /data

docker run --rm -v admin_redis_data:/data -v $(pwd):/backup \
  alpine tar czf /backup/redis-backup-$(date +%Y%m%d).tar.gz /data

# 4. Export MinIO data (if documents exist)
docker run --rm -v admin_minio_data:/data -v $(pwd):/backup \
  alpine tar czf /backup/minio-backup-$(date +%Y%m%d).tar.gz /data

# 5. Document current state
docker compose ps > container-state-$(date +%Y%m%d).txt
docker network ls > network-state-$(date +%Y%m%d).txt
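
Every restore later in this document depends on these archives, so it may be worth checking them right after they are written. The sketch below is one possible check, not part of the repository: `verify_backup` is a hypothetical helper, and it only confirms that an archive is non-empty and that tar can list its contents.

```shell
# Hypothetical helper: check that a backup archive exists, is non-empty,
# and can be listed by tar. Returns 0 when it looks usable, 1 otherwise.
verify_backup() {
  archive="$1"
  if [ ! -s "$archive" ]; then
    echo "MISSING or empty: $archive" >&2
    return 1
  fi
  if ! tar tzf "$archive" >/dev/null 2>&1; then
    echo "UNREADABLE: $archive" >&2
    return 1
  fi
  echo "OK: $archive"
}

# Example, using the archive names produced by the backup steps:
# for name in postgres redis minio; do
#   verify_backup "${name}-backup-$(date +%Y%m%d).tar.gz" || exit 1
# done
```

A listing check does not prove the data inside is consistent; it only catches truncated or missing files before a phase starts.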

Per-Phase Rollback Procedures

Phase 1: Docker Compose Rollback

Rollback Trigger:

  • docker-compose.yml validation fails
  • Containers fail to start
  • Network errors
  • Volume mount issues

Rollback Steps:

# 1. Stop current containers
docker compose down

# 2. Restore docker-compose.yml
git checkout HEAD~1 -- docker-compose.yml

# 3. Restart with original config
docker compose up -d

# 4. Verify original state
docker compose ps  # Should show 14 containers

Validation:

  • 14 containers running
  • All containers healthy
  • No errors in logs

Risk: Low - file-based rollback, no data loss
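
The container-count check above can be scripted. This is a minimal sketch, assuming Docker Compose v2 (for `docker compose ps --status running -q`); `check_container_count` is a hypothetical helper name, and it counts running containers only, not health status.

```shell
# Hypothetical helper: compare the number of running containers with an
# expected count. Reads one container ID per line on stdin.
check_container_count() {
  expected="$1"
  # grep -c exits non-zero when the count is 0, so ignore its status
  actual="$(grep -c . || true)"
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual containers running"
  else
    echo "MISMATCH: expected $expected, found $actual" >&2
    return 1
  fi
}

# Usage:
# docker compose ps --status running -q | check_container_count 14
```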


Phase 2: Remove Tenant Rollback

Rollback Trigger:

  • Build errors after tenant code removal
  • Application won't start
  • Tests failing
  • Missing functionality

Rollback Steps:

# 1. Restore deleted files
git checkout HEAD~1 -- backend/src/core/middleware/tenant.ts
git checkout HEAD~1 -- backend/src/core/config/tenant.ts
git checkout HEAD~1 -- backend/src/features/tenant-management/

# 2. Restore modified files
git checkout HEAD~1 -- backend/src/app.ts
git checkout HEAD~1 -- backend/src/core/plugins/auth.plugin.ts

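# 3. Rebuild backend
# (the subshell keeps the working directory at the repo root;
#  && stops at the first failing step)
(cd backend && npm install && npm run build)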

# 4. Restart backend container
docker compose restart mvp-backend  # or admin-backend if Phase 1 not done

Validation:

  • Backend builds successfully
  • Backend starts without errors
  • Tests pass
  • Tenant functionality restored

Risk: Low-Medium - code rollback, no data impact
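
The `git checkout HEAD~1 -- <path>` commands throughout this document assume that the phase's change is the most recent commit. If other commits have landed since, `HEAD~1` may already contain the change. The sketch below is one way around that, under the assumption that the last commit touching the path is the one to undo; `restore_before_last_change` is a hypothetical helper, not something defined in the repository.

```shell
# Hypothetical helper: restore a path to its state just before the last
# commit that touched it, regardless of how many commits came afterwards.
restore_before_last_change() {
  path="$1"
  # Most recent commit that modified (or deleted) the path
  commit="$(git log -1 --format=%H -- "$path")"
  if [ -z "$commit" ]; then
    echo "No commit touches $path" >&2
    return 1
  fi
  # Check the path out from that commit's parent
  git checkout "${commit}~1" -- "$path"
}

# Usage:
# restore_before_last_change backend/src/core/middleware/tenant.ts
```

As with the original commands, the restored files are staged but not committed, so review them with `git status` before rebuilding.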


Phase 3: Filesystem Storage Rollback

Rollback Trigger:

  • Document upload/download fails
  • File system errors
  • Permission issues
  • Data access errors

Rollback Steps:

# 1. Stop backend
docker compose stop mvp-backend

# 2. Restore storage adapter
git checkout HEAD~1 -- backend/src/core/storage/

# 3. Restore documents feature
git checkout HEAD~1 -- backend/src/features/documents/

# 4. Re-add MinIO to docker-compose.yml
# (-p restores hunks interactively: accept only the MinIO service,
#  keep other Phase 1 changes if applicable)
git checkout -p HEAD~1 -- docker-compose.yml

# 5. Restore MinIO data if backed up
docker volume create admin_minio_data
docker run --rm -v admin_minio_data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/minio-backup-YYYYMMDD.tar.gz -C /

# 6. Rebuild and restart
docker compose up -d admin-minio
docker compose restart mvp-backend

Validation:

  • MinIO container running
  • Document upload works
  • Document download works
  • Existing documents accessible

Risk: Medium - requires MinIO restore, potential data migration
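
The restore commands use a `YYYYMMDD` placeholder that has to be replaced with the date of a real archive. A small sketch for finding the newest one, assuming archives follow the `<name>-backup-YYYYMMDD.tar.gz` pattern from the backup steps; `latest_backup` is a hypothetical helper.

```shell
# Hypothetical helper: print the file name of the newest archive named
# <prefix>-backup-YYYYMMDD.tar.gz in a directory (default: current one).
latest_backup() {
  prefix="$1"
  dir="${2:-.}"
  latest=""
  # Glob results are sorted, and YYYYMMDD sorts chronologically,
  # so the last match is the newest archive.
  for f in "$dir"/"$prefix"-backup-*.tar.gz; do
    [ -e "$f" ] || continue
    latest="$f"
  done
  if [ -z "$latest" ]; then
    echo "No backup found for $prefix in $dir" >&2
    return 1
  fi
  basename "$latest"
}

# Usage:
# latest_backup minio
```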


Phase 4: Config Cleanup Rollback

Rollback Trigger:

  • Service connection failures
  • Authentication errors
  • Missing configuration
  • Environment variable errors

Rollback Steps:

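# 1. Restore config files
# (git can only restore tracked files; if the .env files are gitignored,
#  restore them from a separate backup instead)
git checkout HEAD~1 -- config/app/production.yml
git checkout HEAD~1 -- .env
git checkout HEAD~1 -- .env.development

# 2. Restore secrets (same caveat: only works if these paths are tracked)
git checkout HEAD~1 -- secrets/app/
git checkout HEAD~1 -- secrets/platform/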

# 3. Restart affected services
docker compose restart mvp-backend mvp-platform

Validation:

  • Backend connects to database
  • Backend connects to Redis
  • Platform service accessible
  • Auth0 integration works

Risk: Low - configuration rollback, no data loss


Phase 5: Network Simplification Rollback

Rollback Trigger:

  • Service discovery failures
  • Network isolation broken
  • Container communication errors
  • Traefik routing issues

Rollback Steps:

# 1. Stop all services
docker compose down

# 2. Remove simplified networks
docker network rm motovaultpro_frontend motovaultpro_backend motovaultpro_database

# 3. Restore network configuration
# (-p restores hunks interactively: accept only the networks section if possible)
git checkout -p HEAD~1 -- docker-compose.yml

# 4. Recreate networks and restart
docker compose up -d

# 5. Verify routing
curl http://localhost:8080/api/http/routers  # Traefik API: lists the configured HTTP routers

Validation:

  • All 5 networks exist
  • Services can communicate
  • Traefik routes correctly
  • No network errors

Risk: Medium - requires container restart, brief downtime
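
The "all 5 networks exist" check can be automated by comparing the output of `docker network ls` against the expected names. The sketch below assumes the caller supplies those names from the original docker-compose.yml, since they are not listed in this document; `missing_networks` is a hypothetical helper.

```shell
# Hypothetical helper: print every expected network name that is absent
# from the list on stdin (one name per line). Returns 1 if any is missing.
missing_networks() {
  actual="$(cat)"
  status=0
  for net in "$@"; do
    # -x: whole-line match, -F: treat the name as a literal string
    if ! printf '%s\n' "$actual" | grep -qxF -- "$net"; then
      echo "$net"
      status=1
    fi
  done
  return "$status"
}

# Usage (net-a and net-b are placeholders for the real network names):
# docker network ls --format '{{.Name}}' | missing_networks net-a net-b
```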


Phase 6: Backend Updates Rollback

Rollback Trigger:

  • Service reference errors
  • API connection failures
  • Database connection issues
  • Build failures

Rollback Steps:

# 1. Restore backend code
git checkout HEAD~1 -- backend/src/core/config/config-loader.ts
git checkout HEAD~1 -- backend/src/features/vehicles/external/

# 2. Rebuild backend
# (the subshell keeps the working directory at the repo root)
(cd backend && npm run build)

# 3. Restart backend
docker compose restart mvp-backend

Validation:

  • Backend starts successfully
  • Connects to database
  • Platform client works
  • Tests pass

Risk: Low - code rollback, no data impact


Phase 7: Database Updates Rollback

Rollback Trigger:

  • Database connection failures
  • Schema errors
  • Migration failures
  • Data access issues

Rollback Steps:

# 1. Restore database configuration
git checkout HEAD~1 -- backend/src/_system/migrations/
# (-p restores hunks interactively: accept only the database section)
git checkout -p HEAD~1 -- docker-compose.yml

# 2. Restore database volume if corrupted
docker compose down mvp-postgres
docker volume rm mvp_postgres_data
docker volume create admin_postgres_data
docker run --rm -v admin_postgres_data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/postgres-backup-YYYYMMDD.tar.gz -C /

# 3. Restart database
docker compose up -d mvp-postgres

# 4. Re-run migrations if needed
docker compose exec mvp-backend node dist/_system/migrations/run-all.js

Validation:

  • Database accessible
  • All tables exist
  • Data intact
  • Migrations current

Risk: High - potential data loss if volume restore needed


Phase 8: Platform Service Rollback

Rollback Trigger:

  • Platform API failures
  • Database connection errors
  • Service crashes
  • API endpoint errors

Rollback Steps:

# 1. Stop simplified platform service
docker compose down mvp-platform

# 2. Restore platform service files
git checkout HEAD~1 -- mvp-platform-services/vehicles/

# 3. Restore full platform architecture in docker-compose.yml
# (-p restores hunks interactively: accept only the platform services section)
git checkout -p HEAD~1 -- docker-compose.yml

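# 4. Restore platform database
# (needs a platform-db backup archive; the Pre-Execution Backup steps do not
#  create one, so take it before starting Phase 8)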
docker volume create mvp_platform_vehicles_db_data
docker run --rm -v mvp_platform_vehicles_db_data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/platform-db-backup-YYYYMMDD.tar.gz -C /

# 5. Restart all platform services
docker compose up -d mvp-platform-vehicles-api mvp-platform-vehicles-db \
  mvp-platform-vehicles-redis mvp-platform-vehicles-etl

Validation:

  • Platform service accessible
  • API endpoints work
  • VIN decode works
  • Hierarchical data loads

Risk: Medium-High - requires multi-container restore


Phase 9: Documentation Rollback

Rollback Trigger:

  • Incorrect documentation
  • Missing instructions
  • Broken links
  • Confusion among team

Rollback Steps:

# 1. Restore all documentation
git checkout HEAD~1 -- README.md CLAUDE.md AI-INDEX.md
git checkout HEAD~1 -- docs/
git checkout HEAD~1 -- .ai/context.json
git checkout HEAD~1 -- Makefile
git checkout HEAD~1 -- backend/src/features/*/README.md

Validation:

  • Documentation accurate
  • Examples work
  • Makefile commands work

Risk: None - documentation only, no functional impact


Phase 10: Frontend Rollback

Rollback Trigger:

  • Build errors
  • Runtime errors
  • UI broken
  • API calls failing

Rollback Steps:

# 1. Restore frontend code
git checkout HEAD~1 -- frontend/src/

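# 2. Rebuild frontend
# (the subshell keeps the working directory at the repo root;
#  && stops at the first failing step)
(cd frontend && npm install && npm run build)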

# 3. Restart frontend container
docker compose restart mvp-frontend

Validation:

  • Frontend builds successfully
  • UI loads without errors
  • Auth works
  • API calls work

Risk: Low - frontend rollback, no data impact


Phase 11: Testing Rollback

Note: The testing phase does not modify code; it only validates. If tests fail, roll back the appropriate phases based on failure analysis.


Full System Rollback

Complete Rollback to Pre-Simplification State

When to Use:

  • Multiple phases failing
  • Unrecoverable errors
  • Production blocker
  • Need to abort entire simplification

Rollback Steps:

# 1. Stop all services
docker compose down

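# 2. Restore entire codebase
# (use the dated tag created in Pre-Execution Backup;
#  checking out a tag leaves HEAD detached)
git checkout pre-simplification-YYYYMMDD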

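# 3. Restore volumes
# (docker volume create accepts one volume name per call)
docker volume rm mvp_postgres_data mvp_redis_data
docker volume create admin_postgres_data
docker volume create admin_redis_data
docker volume create admin_minio_data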

docker run --rm -v admin_postgres_data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/postgres-backup-YYYYMMDD.tar.gz -C /

docker run --rm -v admin_redis_data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/redis-backup-YYYYMMDD.tar.gz -C /

docker run --rm -v admin_minio_data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/minio-backup-YYYYMMDD.tar.gz -C /

# 4. Restart all services
docker compose up -d

# 5. Verify original state
docker compose ps  # Should show 14 containers
make test

Validation:

  • All 14 containers running
  • All tests passing
  • Application functional
  • Data intact

Duration: 15-30 minutes

Risk: Low if backups are good
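
Because the volume restore is destructive and repeats the same two commands per volume, it may help to wrap it in a function with a dry-run switch and review the printed commands first. A sketch under stated assumptions: `run`, `restore_volume`, and the `DRY_RUN` variable are hypothetical, while the volume and archive names follow the steps above.

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: recreate one admin_<name>_data volume and unpack
# its archive from the current directory into it.
restore_volume() {
  name="$1"
  stamp="$2"   # the YYYYMMDD date of the backup to restore
  run docker volume create "admin_${name}_data"
  run docker run --rm -v "admin_${name}_data:/data" -v "$(pwd):/backup" \
    alpine tar xzf "/backup/${name}-backup-${stamp}.tar.gz" -C /
}

# Usage: print the commands first, then run them for real.
# DRY_RUN=1
# for name in postgres redis minio; do restore_volume "$name" 20251101; done
```

The date `20251101` in the usage comment is only an example value; substitute the stamp of the archive actually being restored.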


Partial Rollback Scenarios

Scenario 1: Keep Infrastructure Changes, Rollback Backend

# Keep Phase 1 (Docker), rollback Phases 2-11
# (use the dated tag created in Pre-Execution Backup)
git checkout pre-simplification-YYYYMMDD -- backend/ frontend/
docker compose restart mvp-backend mvp-frontend

Scenario 2: Keep Config Cleanup, Rollback Code

# Keep Phase 4, rollback Phases 1-3, 5-11
# (use the dated tag created in Pre-Execution Backup)
git checkout pre-simplification-YYYYMMDD -- docker-compose.yml backend/src/ frontend/src/

Scenario 3: Rollback Only Storage

# Rollback Phase 3 only
git checkout HEAD~1 -- backend/src/core/storage/ backend/src/features/documents/
docker compose up -d admin-minio

Rollback Decision Matrix

| Failure Type | Rollback Scope | Risk | Duration |
|---|---|---|---|
| Container start fails | Phase 1 | Low | 5 min |
| Build errors | Specific phase | Low | 10 min |
| Test failures | Investigate, partial | Medium | 15-30 min |
| Data corruption | Full + restore | High | 30-60 min |
| Network issues | Phase 5 | Medium | 10 min |
| Platform API down | Phase 8 | Medium | 15 min |
| Critical production bug | Full system | Medium | 30 min |

Post-Rollback Actions

After any rollback:

  1. Document the Issue:

    # Create incident report
    echo "Rollback performed: $(date)" >> docs/redesign/ROLLBACK-LOG.md
    echo "Reason: [description]" >> docs/redesign/ROLLBACK-LOG.md
    echo "Phases rolled back: [list]" >> docs/redesign/ROLLBACK-LOG.md
    
  2. Analyze Root Cause:

    • Review logs
    • Identify failure point
    • Document lessons learned

  3. Plan Fix:

    • Address root cause
    • Update phase documentation
    • Add validation checks

  4. Retry (if appropriate):

    • Apply fix
    • Re-execute phase
    • Validate thoroughly
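
The incident-report step in item 1 uses bracketed placeholders that are easy to leave unfilled. One option is a small function that takes the reason and phases as arguments. This is a sketch only: `log_rollback` is a hypothetical helper, and the default log path is the one used in the echo commands above.

```shell
# Hypothetical helper: append one rollback entry to the log.
# Arguments: reason, phases rolled back, optional log file path.
log_rollback() {
  reason="$1"
  phases="$2"
  log="${3:-docs/redesign/ROLLBACK-LOG.md}"
  {
    printf 'Rollback performed: %s\n' "$(date)"
    printf 'Reason: %s\n' "$reason"
    printf 'Phases rolled back: %s\n' "$phases"
    printf '\n'   # blank line separates entries
  } >> "$log"
}

# Usage:
# log_rollback "MinIO restore failed validation" "3"
```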

Emergency Contacts

If rollback fails or assistance needed:

  • Technical Lead: [contact]
  • DevOps Lead: [contact]
  • Database Admin: [contact]
  • Emergency Hotline: [contact]

Rollback Testing

Before starting simplification, test rollback procedures:

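# Dry run rollback
git checkout -b rollback-test

# Commit a baseline, then a change to roll back
# (HEAD~1 must already contain test.txt, or the checkout below fails)
echo "original" > test.txt
git add test.txt
git commit -m "test: baseline"
echo "changed" > test.txt
git commit -am "test: change"

# Rollback test
git checkout HEAD~1 -- test.txt

# Verify rollback works
cat test.txt  # Should print "original"

# Clean up (reset discards the staged rollback so the branch switch succeeds)
git reset --hard
git checkout main
git branch -D rollback-test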