motovaultpro/docs/changes/database-20251111/database-migration.md
2025-11-11 10:29:02 -06:00

# Database Migration Guide - Agent 1
## Task: Replace vehicles.* schema with new ETL-generated database
**Status**: Ready for Implementation
**Dependencies**: None (can start immediately)
**Estimated Time**: 30 minutes
**Assigned To**: Agent 1 (Database)
---
## Overview
Replace the normalized vehicles.* schema with a denormalized vehicle_options table populated from ETL-generated data (1.1M+ records from 1980-2026).
---
## Prerequisites
### Required Files
All files are already present in the repository:
```
data/make-model-import/migrations/001_create_vehicle_database.sql
data/make-model-import/output/01_engines.sql
data/make-model-import/output/02_transmissions.sql
data/make-model-import/output/03_vehicle_options.sql
```
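A quick pre-flight check can confirm the four files are in place before any destructive step runs. The following is a minimal sketch that assumes it is run from the repository root; the function name `check_required_files` is illustrative and not part of the project.

```bash
# Hypothetical helper: verify that the import files listed above exist and are non-empty.
# Takes an optional base directory (defaults to the current directory).
check_required_files() {
  base="${1:-.}"
  missing=0
  for f in \
    data/make-model-import/migrations/001_create_vehicle_database.sql \
    data/make-model-import/output/01_engines.sql \
    data/make-model-import/output/02_transmissions.sql \
    data/make-model-import/output/03_vehicle_options.sql
  do
    # -s is true only for files that exist and have a size greater than zero
    if [ ! -s "$base/$f" ]; then
      echo "MISSING: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_required_files && echo "All import files present"` from the repository root should print the confirmation only when all four files are found.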
### Database Access
```bash
# Verify Docker container is running
docker ps | grep mvp-postgres
# Access PostgreSQL
docker exec -it mvp-postgres psql -U postgres -d motovaultpro
```
---
## Step 1: Backup Current Schema (Safety)
Before making any changes, back up the existing vehicles.* schema:
```bash
# Create backup directory
mkdir -p data/backups
# Dump vehicles schema only
docker exec mvp-postgres pg_dump -U postgres -d motovaultpro \
--schema=vehicles \
--format=plain \
--file=/tmp/vehicles_schema_backup.sql
# Copy backup to host
docker cp mvp-postgres:/tmp/vehicles_schema_backup.sql \
data/backups/vehicles_schema_backup_$(date +%Y%m%d_%H%M%S).sql
# Verify backup exists
ls -lh data/backups/
```
**Verification**: Backup file should be 100KB-1MB in size
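The size check can be scripted so that an empty or truncated dump is caught before Step 2. This is a sketch only: `check_backup_size` is a hypothetical helper, and its 100 KB default simply mirrors the lower bound quoted above.

```bash
# Hypothetical helper: succeed only if the backup file is at least min_bytes in size.
# The default of 102400 bytes (100 KB) is an assumption taken from the expected range above.
check_backup_size() {
  file="$1"
  min_bytes="${2:-102400}"
  [ -f "$file" ] || { echo "No such backup: $file" >&2; return 1; }
  # wc -c may pad its output with spaces on some platforms, so strip them
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt "$min_bytes" ]; then
    echo "Backup looks too small: $size bytes" >&2
    return 1
  fi
  echo "Backup OK: $size bytes"
}
```

For example, `check_backup_size data/backups/<your-backup-file>.sql` would fail loudly if the dump came out empty.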
---
## Step 2: Drop Existing vehicles.* Tables
Drop all normalized tables in the vehicles schema:
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
-- Drop tables in correct order (respect foreign keys)
DROP TABLE IF EXISTS vehicles.trim_transmission CASCADE;
DROP TABLE IF EXISTS vehicles.trim_engine CASCADE;
DROP TABLE IF EXISTS vehicles.transmission CASCADE;
DROP TABLE IF EXISTS vehicles.engine CASCADE;
DROP TABLE IF EXISTS vehicles.trim CASCADE;
DROP TABLE IF EXISTS vehicles.model_year CASCADE;
DROP TABLE IF EXISTS vehicles.model CASCADE;
DROP TABLE IF EXISTS vehicles.make CASCADE;
-- Drop views if they exist
DROP VIEW IF EXISTS vehicles.available_years CASCADE;
DROP VIEW IF EXISTS vehicles.makes_by_year CASCADE;
DROP VIEW IF EXISTS vehicles.models_by_year_make CASCADE;
-- Optionally drop the entire schema
-- DROP SCHEMA IF EXISTS vehicles CASCADE;
-- Verify all tables dropped
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'vehicles';
EOF
```
**Verification**: Query should return 0 rows (no tables left in vehicles schema)
**Note**: This is a destructive operation. Ensure backup completed successfully before proceeding.
---
## Step 3: Run New Migration
Execute the new schema migration that creates:
- `engines` table
- `transmissions` table
- `vehicle_options` table
- Database functions for cascade queries
- Composite indexes
```bash
# Run migration SQL
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
< data/make-model-import/migrations/001_create_vehicle_database.sql
```
**Verification**: Check for error messages. Successful output should include:
```
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE FUNCTION
...
```
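By default psql keeps going after a failed statement, so an error can scroll past unnoticed. One option is to set psql's `ON_ERROR_STOP` variable and pass `--single-transaction`, which stops at the first error and rolls back everything the file did. The wrapper below is a sketch; `run_sql_file` is a hypothetical name, and it assumes the SQL file contains no `BEGIN`/`COMMIT` of its own and no statements that cannot run inside a transaction.

```bash
# Hypothetical wrapper: feed a SQL file to psql inside the container,
# aborting on the first error and rolling back partial changes.
run_sql_file() {
  sql_file="$1"
  [ -r "$sql_file" ] || { echo "Cannot read $sql_file" >&2; return 1; }
  # -i attaches stdin so the redirected file reaches psql in the container
  docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
    -v ON_ERROR_STOP=1 --single-transaction < "$sql_file"
}
```

With this in place, `run_sql_file data/make-model-import/migrations/001_create_vehicle_database.sql` returns a non-zero exit status if any statement fails, which is easier to detect than reading the output by eye.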
---
## Step 4: Verify Schema Created
Check that all tables and functions were created successfully:
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
-- List all tables
\dt
-- Describe engines table
\d engines
-- Describe transmissions table
\d transmissions
-- Describe vehicle_options table
\d vehicle_options
-- List the composite indexes (the pattern matches index names, which start with idx_vehicle)
\di idx_vehicle*
-- List functions
\df get_makes_for_year
\df get_models_for_year_make
\df get_trims_for_year_make_model
\df get_options_for_vehicle
EOF
```
**Expected Output**:
- 3 tables: `engines`, `transmissions`, `vehicle_options`
- Indexes: `idx_vehicle_year_make`, `idx_vehicle_year_make_model`, `idx_vehicle_year_make_model_trim`
- 4 database functions
---
## Step 5: Import Engines Data
Import 30,066 engine records:
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
< data/make-model-import/output/01_engines.sql
```
**Verification**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(*) FROM engines;"
```
**Expected**: 30,066 rows
**Sample Data Check**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT id, name FROM engines LIMIT 10;"
```
**Expected Format**: Names like "V8 5.0L", "L4 2.0L Turbo", "V6 3.5L"
---
## Step 6: Import Transmissions Data
Import 828 transmission records:
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
< data/make-model-import/output/02_transmissions.sql
```
**Verification**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(*) FROM transmissions;"
```
**Expected**: 828 rows
**Sample Data Check**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT id, type FROM transmissions LIMIT 10;"
```
**Expected Format**: Types like "8-Speed Automatic", "6-Speed Manual", "CVT"
---
## Step 7: Import Vehicle Options Data
Import 1,122,644 vehicle option records (this may take 2-5 minutes):
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
< data/make-model-import/output/03_vehicle_options.sql
```
**Note**: This is the largest import (51MB SQL file). You should see periodic output as batches are inserted.
**Verification**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(*) FROM vehicle_options;"
```
**Expected**: 1,122,644 rows
**Sample Data Check**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT year, make, model, trim FROM vehicle_options LIMIT 10;"
```
**Expected**: Data like:
```
year | make | model | trim
------+---------+---------+---------------
2024 | Ford | F-150 | XLT SuperCrew
2024 | Honda | Civic | Sport Touring
2023 | Toyota | Camry | SE
```
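The three row-count checks from Steps 5-7 can be bundled into one pass/fail command. The sketch below uses the counts stated in this guide; `assert_row_count` and `verify_import_counts` are hypothetical helper names.

```bash
# Hypothetical helper: compare a table's row count with an expected value.
assert_row_count() {
  table="$1"
  expected="$2"
  # -tA makes psql print just the bare number
  actual=$(docker exec mvp-postgres psql -U postgres -d motovaultpro -tA \
    -c "SELECT COUNT(*) FROM $table;")
  if [ "$actual" = "$expected" ]; then
    echo "OK: $table has $actual rows"
  else
    echo "FAIL: $table has $actual rows, expected $expected" >&2
    return 1
  fi
}

# Expected counts from Steps 5-7; stops at the first mismatch.
verify_import_counts() {
  assert_row_count engines 30066 &&
  assert_row_count transmissions 828 &&
  assert_row_count vehicle_options 1122644
}
```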
---
## Step 8: Verify Data Quality
Run quality checks on imported data:
### Check Year Range
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT
MIN(year) as min_year,
MAX(year) as max_year,
COUNT(DISTINCT year) as total_years
FROM vehicle_options;
EOF
```
**Expected**: min_year=1980, max_year=2026, total_years=47
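The expected `total_years` follows from the inclusive range: 2026 - 1980 + 1 = 47. If the distinct-year count is lower than that span, some model years are missing from the import. A sketch of that comparison, assuming the query result is passed in psql's unaligned `min|max|count` form (`check_year_range` is a hypothetical helper):

```bash
# Hypothetical helper: given "min|max|count" (as printed by psql -tA),
# confirm that the distinct-year count equals the inclusive span.
check_year_range() {
  row="$1"
  min_year=${row%%|*}      # text before the first |
  rest=${row#*|}           # text after the first |
  max_year=${rest%%|*}
  total=${rest#*|}
  span=$((max_year - min_year + 1))
  if [ "$total" -eq "$span" ]; then
    echo "OK: $min_year-$max_year covers $total years"
  else
    echo "GAP: $min_year-$max_year spans $span years but only $total are present" >&2
    return 1
  fi
}

# Possible usage (not run here):
#   check_year_range "$(docker exec mvp-postgres psql -U postgres -d motovaultpro -tA \
#     -c "SELECT MIN(year), MAX(year), COUNT(DISTINCT year) FROM vehicle_options;")"
```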
### Check Make Count
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(DISTINCT make) FROM vehicle_options;"
```
**Expected**: 53 makes
### Check NULL Engine IDs (Electric Vehicles)
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT
COUNT(*) as total_records,
COUNT(*) FILTER (WHERE engine_id IS NULL) as null_engines,
ROUND(100.0 * COUNT(*) FILTER (WHERE engine_id IS NULL) / COUNT(*), 2) as null_percentage
FROM vehicle_options;
EOF
```
**Expected**: ~1.1% NULL engine_id (approximately 11,951 records; the query reports `null_percentage` as about 1.06)
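The percentage follows directly from the two counts: 11,951 / 1,122,644 is roughly 1.06%, which rounds to the ~1.1% quoted in this guide. The same arithmetic as a small sketch (`null_pct` is a hypothetical helper):

```bash
# Hypothetical helper: percentage of NULL rows, rounded to two decimals.
# Usage: null_pct <null_count> <total_count>
null_pct() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * n / t }'
}

null_pct 11951 1122644   # prints 1.06
```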
### Sample Electric Vehicle Data
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT year, make, model, trim, engine_id, transmission_id
FROM vehicle_options
WHERE engine_id IS NULL
LIMIT 10;
EOF
```
**Expected**: Should see Tesla, Lucid, Rivian, or other electric vehicles with NULL engine_id
---
## Step 9: Test Database Functions
Test the cascade query functions:
### Test 1: Get Makes for Year
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_makes_for_year(2024) LIMIT 10;
EOF
```
**Expected**: Returns a list of make names: "Ford", "Honda", "Toyota", etc.
### Test 2: Get Models for Year and Make
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_models_for_year_make(2024, 'Ford') LIMIT 10;
EOF
```
**Expected**: Returns Ford models: "F-150", "Mustang", "Explorer", etc.
### Test 3: Get Trims for Year, Make, Model
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_trims_for_year_make_model(2024, 'Ford', 'F-150') LIMIT 10;
EOF
```
**Expected**: Returns F-150 trims: "XLT", "Lariat", "King Ranch", etc.
### Test 4: Get Options for Vehicle
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT engine_name, transmission_type
FROM get_options_for_vehicle(2024, 'Ford', 'F-150', 'XLT')
LIMIT 10;
EOF
```
**Expected**: Returns engine/transmission combinations available for 2024 Ford F-150 XLT
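The four tests can be collapsed into one smoke test that only checks whether each function returns at least one row. This is a sketch using the same example arguments as Tests 1-4; `returns_rows` and `smoke_test_functions` are hypothetical helpers, and the check says nothing about whether the returned values are correct.

```bash
# Hypothetical helper: succeed if the given query returns at least one non-empty line.
returns_rows() {
  rows=$(docker exec mvp-postgres psql -U postgres -d motovaultpro -tA -c "$1" | grep -c .)
  [ "$rows" -gt 0 ]
}

# Run each cascade function once with the example arguments from Tests 1-4.
smoke_test_functions() {
  status=0
  for q in \
    "SELECT * FROM get_makes_for_year(2024)" \
    "SELECT * FROM get_models_for_year_make(2024, 'Ford')" \
    "SELECT * FROM get_trims_for_year_make_model(2024, 'Ford', 'F-150')" \
    "SELECT * FROM get_options_for_vehicle(2024, 'Ford', 'F-150', 'XLT')"
  do
    if returns_rows "$q"; then
      echo "OK: $q"
    else
      echo "EMPTY: $q" >&2
      status=1
    fi
  done
  return "$status"
}
```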
---
## Step 10: Performance Validation
Verify query performance is sub-50ms as claimed:
### Test Index Usage
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
EXPLAIN ANALYZE
SELECT DISTINCT make
FROM vehicle_options
WHERE year = 2024;
EOF
```
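**Expected**: Query plan should show index usage (depending on table statistics, the planner may report an Index Scan, Index Only Scan, or Bitmap Index Scan):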
```
Index Scan using idx_vehicle_year_make ...
Execution Time: < 50 ms
```
### Test Cascade Query Performance
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
EXPLAIN ANALYZE
SELECT DISTINCT model
FROM vehicle_options
WHERE year = 2024 AND make = 'Ford';
EOF
```
**Expected**: Should use one of the composite indexes (for example `idx_vehicle_year_make` or `idx_vehicle_year_make_model`), execution time < 50ms
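The 50 ms threshold can be checked mechanically by extracting the `Execution Time` line from the `EXPLAIN ANALYZE` output. The sketch below assumes a recent PostgreSQL version that prints that label (older releases used a different one); `exec_time_ms` and `under_threshold` are hypothetical helpers.

```bash
# Hypothetical helper: read EXPLAIN ANALYZE output on stdin and
# print the execution time in milliseconds (number only).
exec_time_ms() {
  sed -n 's/^ *Execution Time: \([0-9.]*\) ms.*$/\1/p'
}

# Succeed if the execution time found on stdin is below the limit (default 50 ms).
# Fails when no Execution Time line is present at all.
under_threshold() {
  limit="${1:-50}"
  exec_time_ms | awk -v limit="$limit" '
    NR == 1 { found = 1; ok = ($1 + 0 < limit + 0) }
    END { exit (found && ok) ? 0 : 1 }'
}

# Possible usage (not run here):
#   docker exec mvp-postgres psql -U postgres -d motovaultpro \
#     -c "EXPLAIN ANALYZE SELECT DISTINCT make FROM vehicle_options WHERE year = 2024;" \
#     | under_threshold 50 && echo "within budget"
```

A single run is a weak measurement; repeating the query a few times gives a fairer picture because the first execution may include cold-cache reads.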
---
## Completion Checklist
Before signaling completion, verify:
- [ ] Backup of old schema created successfully
- [ ] Old vehicles.* tables dropped
- [ ] New migration executed without errors
- [ ] Engines table has 30,066 records
- [ ] Transmissions table has 828 records
- [ ] Vehicle_options table has 1,122,644 records
- [ ] Year range is 1980-2026 (47 years)
- [ ] 53 distinct makes present
- [ ] ~1.1% of records have NULL engine_id
- [ ] All 4 database functions exist and return data
- [ ] Composite indexes created (3 indexes)
- [ ] Query performance is sub-50ms
- [ ] No error messages in PostgreSQL logs
---
## Troubleshooting
### Error: "relation already exists"
**Cause**: Tables from old migration still present
**Solution**:
```bash
# Drop tables explicitly
docker exec -it mvp-postgres psql -U postgres -d motovaultpro \
-c "DROP TABLE IF EXISTS vehicle_options CASCADE;"
# Then re-run migration
```
### Error: "duplicate key value violates unique constraint"
**Cause**: Data already imported, trying to import again
**Solution**:
```bash
# Truncate tables
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
TRUNCATE TABLE vehicle_options CASCADE;
TRUNCATE TABLE engines CASCADE;
TRUNCATE TABLE transmissions CASCADE;
EOF
# Then re-import data
```
### Import Takes Too Long
**Symptom**: Import hangs or takes > 10 minutes
**Solution**:
1. Check Docker resources (increase memory/CPU if needed)
2. Check disk space: `df -h`
3. Check PostgreSQL logs: `docker logs mvp-postgres`
4. Try importing in smaller batches (split SQL files if necessary)
### Performance Issues
**Symptom**: Queries take > 100ms
**Solution**:
```bash
# Verify indexes were created
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "\di idx_vehicle*"
# Analyze tables for query optimizer
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
ANALYZE engines;
ANALYZE transmissions;
ANALYZE vehicle_options;
EOF
```
---
## Rollback Procedure
If you need to roll back:
```bash
# Drop new tables
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
DROP TABLE IF EXISTS vehicle_options CASCADE;
DROP TABLE IF EXISTS transmissions CASCADE;
DROP TABLE IF EXISTS engines CASCADE;
DROP FUNCTION IF EXISTS get_makes_for_year;
DROP FUNCTION IF EXISTS get_models_for_year_make;
DROP FUNCTION IF EXISTS get_trims_for_year_make_model;
DROP FUNCTION IF EXISTS get_options_for_vehicle;
EOF
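# Restore from backup (replace <timestamp> with the suffix of your backup file)
docker cp "data/backups/vehicles_schema_backup_<timestamp>.sql" mvp-postgres:/tmp/
# The file now lives inside the container, so read it there with -f
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-f "/tmp/vehicles_schema_backup_<timestamp>.sql"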
```
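Rather than typing the timestamp by hand, the newest backup can be located automatically. This sketch relies on the `YYYYMMDD_HHMMSS` suffix from Step 1 sorting lexicographically in time order; `latest_backup` is a hypothetical helper.

```bash
# Hypothetical helper: print the path of the newest schema backup in a directory.
# Prints nothing if no backup is found.
latest_backup() {
  dir="${1:-data/backups}"
  ls -1 "$dir"/vehicles_schema_backup_*.sql 2>/dev/null | sort | tail -n 1
}

# Possible usage (not run here): stream the newest backup from the host into psql.
#   docker exec -i mvp-postgres psql -U postgres -d motovaultpro < "$(latest_backup)"
```

Check that the printed path is non-empty before restoring, since an empty result means no backup was found.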
---
## Handoff to Agent 2
Once complete, provide this information to Agent 2 (Platform Repository):
### Database Contract
**Tables Available:**
```sql
engines (id, name)
transmissions (id, type)
vehicle_options (id, year, make, model, trim, engine_id, transmission_id)
```
**Functions Available:**
```sql
get_makes_for_year(year INT) RETURNS TABLE(make VARCHAR)
get_models_for_year_make(year INT, make VARCHAR) RETURNS TABLE(model VARCHAR)
get_trims_for_year_make_model(year INT, make VARCHAR, model VARCHAR) RETURNS TABLE(trim_name VARCHAR)
get_options_for_vehicle(year INT, make VARCHAR, model VARCHAR, trim VARCHAR)
RETURNS TABLE(engine_name VARCHAR, transmission_type VARCHAR, ...)
```
**Data Quality Notes:**
- Makes are in Title Case: "Ford", not "FORD"
- 1.1% of records have NULL engine_id (electric vehicles)
- Year range: 1980-2026
- 53 makes, 1,741 models, 1,122,644 total configurations
**Performance:**
- Indexed queries are expected to run in under 50 ms
- Cascade queries optimized with composite indexes
### Verification Command
Agent 2 can verify database is ready:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(*) FROM vehicle_options;"
```
Should return: 1122644
---
## Completion Message Template
```
Agent 1 (Database Migration): COMPLETE
Changes Made:
- Dropped vehicles.* schema tables (backup created)
- Executed 001_create_vehicle_database.sql migration
- Imported 30,066 engines
- Imported 828 transmissions
- Imported 1,122,644 vehicle options
Verification:
✓ All tables created with correct record counts
✓ Database functions operational
✓ Composite indexes created
✓ Query performance sub-50ms
✓ Data quality checks passed
Database is ready for Agent 2 (Platform Repository) to begin implementation.
Files modified: None (database only)
New schema: public.engines, public.transmissions, public.vehicle_options
```
---
**Document Version**: 1.0
**Last Updated**: 2025-11-10
**Status**: Ready for Implementation