# Database Migration Guide - Agent 1

## Task: Replace vehicles.* schema with new ETL-generated database

**Status**: Ready for Implementation
**Dependencies**: None (can start immediately)
**Estimated Time**: 30 minutes
**Assigned To**: Agent 1 (Database)

---

## Overview

Replace the normalized vehicles.* schema with a denormalized vehicle_options table populated from ETL-generated data (1.1M+ records from 1980-2026).

---

## Prerequisites

### Required Files
All files are already present in the repository:

```
data/make-model-import/migrations/001_create_vehicle_database.sql
data/make-model-import/output/01_engines.sql
data/make-model-import/output/02_transmissions.sql
data/make-model-import/output/03_vehicle_options.sql
```

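Before touching the database, it can help to confirm that the four files really are in place. The helper below is a minimal sketch; `check_required_files` is a hypothetical name, not something that exists in the repository, and it assumes the commands are run from the repository root.

```bash
# Sketch: report which of the required files are missing or empty.
# check_required_files is a hypothetical helper, not part of the repository.
check_required_files() {
  local base="${1:-data/make-model-import}"
  local missing=0
  local f
  for f in \
    migrations/001_create_vehicle_database.sql \
    output/01_engines.sql \
    output/02_transmissions.sql \
    output/03_vehicle_options.sql
  do
    # -s is true only for files that exist and are non-empty
    if [ ! -s "$base/$f" ]; then
      echo "missing or empty: $base/$f" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of problem files (0 means all present)
  return "$missing"
}
```

Calling `check_required_files` with no argument checks the default `data/make-model-import` directory; a non-zero exit status means at least one file needs attention.
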
### Database Access
```bash
# Verify Docker container is running
docker ps | grep mvp-postgres

# Access PostgreSQL
docker exec -it mvp-postgres psql -U postgres -d motovaultpro
```

---

## Step 1: Backup Current Schema (Safety)

Before making any changes, backup the existing vehicles.* schema:

```bash
# Create backup directory
mkdir -p data/backups

# Dump vehicles schema only
docker exec mvp-postgres pg_dump -U postgres -d motovaultpro \
  --schema=vehicles \
  --format=plain \
  --file=/tmp/vehicles_schema_backup.sql

# Copy backup to host
docker cp mvp-postgres:/tmp/vehicles_schema_backup.sql \
  data/backups/vehicles_schema_backup_$(date +%Y%m%d_%H%M%S).sql

# Verify backup exists
ls -lh data/backups/
```

**Verification**: Backup file should be 100KB-1MB in size

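The size check can be scripted instead of read off the `ls` output. This is a sketch only: `backup_size_ok` is a hypothetical helper, and the 100 KB and 1 MB bounds are simply the range quoted in the verification note, treated here as hard limits.

```bash
# Sketch: succeed only if the backup file is within the expected size range.
# backup_size_ok is a hypothetical helper; the bounds mirror the
# 100KB-1MB range stated in this guide.
backup_size_ok() {
  local file="$1"
  local min_bytes=$((100 * 1024))
  local max_bytes=$((1024 * 1024))
  local size
  [ -f "$file" ] || return 2
  # Arithmetic expansion strips any padding that wc may print
  size=$(( $(wc -c < "$file") ))
  [ "$size" -ge "$min_bytes" ] && [ "$size" -le "$max_bytes" ]
}
```

A failing check does not necessarily mean the backup is bad (a sparsely populated schema could dump to less than 100 KB), but it is a signal to inspect the file before dropping anything.
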
---

## Step 2: Drop Existing vehicles.* Tables

Drop all normalized tables in the vehicles schema:

```bash
# Use -i without -t: the heredoc is piped on stdin, which is not a TTY
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
-- Drop tables in correct order (respect foreign keys)
DROP TABLE IF EXISTS vehicles.trim_transmission CASCADE;
DROP TABLE IF EXISTS vehicles.trim_engine CASCADE;
DROP TABLE IF EXISTS vehicles.transmission CASCADE;
DROP TABLE IF EXISTS vehicles.engine CASCADE;
DROP TABLE IF EXISTS vehicles.trim CASCADE;
DROP TABLE IF EXISTS vehicles.model_year CASCADE;
DROP TABLE IF EXISTS vehicles.model CASCADE;
DROP TABLE IF EXISTS vehicles.make CASCADE;

-- Drop views if they exist
DROP VIEW IF EXISTS vehicles.available_years CASCADE;
DROP VIEW IF EXISTS vehicles.makes_by_year CASCADE;
DROP VIEW IF EXISTS vehicles.models_by_year_make CASCADE;

-- Optionally drop the entire schema
-- DROP SCHEMA IF EXISTS vehicles CASCADE;

-- Verify all tables dropped
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'vehicles';
EOF
```

**Verification**: The final query should return 0 rows (no tables or views left in the vehicles schema)

**Note**: This is a destructive operation. Confirm that the Step 1 backup completed successfully before proceeding.

---

## Step 3: Run New Migration

Execute the new schema migration that creates:
- `engines` table
- `transmissions` table
- `vehicle_options` table
- Database functions for cascade queries
- Composite indexes

```bash
# Run migration SQL
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
  < data/make-model-import/migrations/001_create_vehicle_database.sql
```

**Verification**: Check for error messages. Successful output should include:
```
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE FUNCTION
...
```

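Scanning the output by eye is easy to get wrong once the log grows. One option is to capture psql's output in a file (for example by appending `2>&1 | tee migration.log` to the command) and count the error lines afterwards. The sketch below assumes that capture step; `count_psql_errors` and the `migration.log` file name are hypothetical.

```bash
# Sketch: count ERROR/FATAL lines in captured psql output.
# count_psql_errors is a hypothetical helper. psql prefixes messages with
# "psql:<file>:<line>: " when run with -f, so both forms are matched.
count_psql_errors() {
  # grep -c prints the count but exits 1 when it is 0, so mask that status
  grep -c -E '^(psql:[^ ]*: )?(ERROR|FATAL):' "$1" || true
}
```

A result of `0` from `count_psql_errors migration.log` means no error lines were captured. psql also accepts `-v ON_ERROR_STOP=1`, which makes it stop at the first failing statement instead of continuing through the rest of the file.
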
---

## Step 4: Verify Schema Created

Check that all tables and functions were created successfully:

```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
-- List all tables
\dt

-- Describe engines table
\d engines

-- Describe transmissions table
\d transmissions

-- Describe vehicle_options table
\d vehicle_options

-- List the composite indexes (\di matches index names, not table names)
\di idx_vehicle*

-- List functions
\df get_makes_for_year
\df get_models_for_year_make
\df get_trims_for_year_make_model
\df get_options_for_vehicle
EOF
```

**Expected Output**:
- 3 tables: `engines`, `transmissions`, `vehicle_options`
- Indexes: `idx_vehicle_year_make`, `idx_vehicle_year_make_model`, `idx_vehicle_year_make_model_trim`
- 4 database functions

---

## Step 5: Import Engines Data

Import 30,066 engine records:

```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
  < data/make-model-import/output/01_engines.sql
```

**Verification**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(*) FROM engines;"
```

**Expected**: 30,066 rows

**Sample Data Check**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT id, name FROM engines LIMIT 10;"
```

**Expected Format**: Names like "V8 5.0L", "L4 2.0L Turbo", "V6 3.5L"

---

## Step 6: Import Transmissions Data

Import 828 transmission records:

```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
  < data/make-model-import/output/02_transmissions.sql
```

**Verification**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(*) FROM transmissions;"
```

**Expected**: 828 rows

**Sample Data Check**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT id, type FROM transmissions LIMIT 10;"
```

**Expected Format**: Types like "8-Speed Automatic", "6-Speed Manual", "CVT"

---

## Step 7: Import Vehicle Options Data

Import 1,122,644 vehicle option records (this may take 2-5 minutes):

```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
  < data/make-model-import/output/03_vehicle_options.sql
```

**Note**: This is the largest import (51MB SQL file). You should see periodic output as batches are inserted.

**Verification**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(*) FROM vehicle_options;"
```

**Expected**: 1,122,644 rows

**Sample Data Check**:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT year, make, model, trim FROM vehicle_options LIMIT 10;"
```

**Expected**: Data like:
```
 year | make    | model   | trim
------+---------+---------+---------------
 2024 | Ford    | F-150   | XLT SuperCrew
 2024 | Honda   | Civic   | Sport Touring
 2023 | Toyota  | Camry   | SE
```

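The three row-count checks from Steps 5-7 can be compared against their expected values mechanically. The following is a sketch under stated assumptions: `expected_count`, `check_count`, and `row_count` are hypothetical helpers, and the expected numbers are the ones given in this guide. `row_count` uses psql's `-A` (unaligned) and `-t` (tuples only) flags so that only the bare number is printed.

```bash
# Sketch: compare actual row counts with the counts stated in Steps 5-7.
# All three functions are hypothetical helpers, not part of the repository.
expected_count() {
  case "$1" in
    engines)         echo 30066 ;;
    transmissions)   echo 828 ;;
    vehicle_options) echo 1122644 ;;
    *)               return 1 ;;
  esac
}

# Print OK/FAIL for one table and return non-zero on a mismatch.
check_count() {
  local table="$1"
  local actual="$2"
  local expected
  expected=$(expected_count "$table") || { echo "unknown table: $table" >&2; return 2; }
  if [ "$actual" -eq "$expected" ]; then
    echo "OK   $table: $actual"
  else
    echo "FAIL $table: got $actual, expected $expected"
    return 1
  fi
}

# Fetch a bare row count from the container (needs the running database).
row_count() {
  docker exec mvp-postgres psql -U postgres -d motovaultpro \
    -At -c "SELECT COUNT(*) FROM $1;"
}
```

With the container running, `check_count engines "$(row_count engines)"` checks one table; repeating the call for `transmissions` and `vehicle_options` covers all three imports.
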
---

## Step 8: Verify Data Quality

Run quality checks on imported data:

### Check Year Range
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT
  MIN(year) as min_year,
  MAX(year) as max_year,
  COUNT(DISTINCT year) as total_years
FROM vehicle_options;
EOF
```

**Expected**: min_year=1980, max_year=2026, total_years=47

### Check Make Count
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(DISTINCT make) FROM vehicle_options;"
```

**Expected**: 53 makes

### Check NULL Engine IDs (Electric Vehicles)
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT
  COUNT(*) as total_records,
  COUNT(*) FILTER (WHERE engine_id IS NULL) as null_engines,
  ROUND(100.0 * COUNT(*) FILTER (WHERE engine_id IS NULL) / COUNT(*), 2) as null_percentage
FROM vehicle_options;
EOF
```

**Expected**: null_percentage of about 1.06, i.e. ~1.1% NULL engine_id (approximately 11,951 records)

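The expected values in this step follow from simple arithmetic on the figures quoted in this guide, which makes them easy to re-derive if the source data changes. The helpers below are an illustrative sketch (`year_count` and `percentage` are hypothetical names): the year count is inclusive of both endpoints, and the percentage is formatted to two decimals in the same way as the `ROUND(..., 2)` in the query.

```bash
# Sketch: re-derive the expected values from the figures in this guide.
# year_count and percentage are hypothetical helpers.

# Inclusive year count: both endpoints are counted.
year_count() {
  echo $(( $2 - $1 + 1 ))
}

# Percentage with two decimals, mirroring ROUND(100.0 * part / total, 2).
percentage() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}
```

`year_count 1980 2026` prints `47`, and `percentage 11951 1122644` prints `1.06`, which is the value the guide rounds to ~1.1%.
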
### Sample Electric Vehicle Data
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT year, make, model, trim, engine_id, transmission_id
FROM vehicle_options
WHERE engine_id IS NULL
LIMIT 10;
EOF
```

**Expected**: Should see Tesla, Lucid, Rivian, or other electric vehicles with NULL engine_id

---

## Step 9: Test Database Functions

Test the cascade query functions:

### Test 1: Get Makes for Year
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_makes_for_year(2024) LIMIT 10;
EOF
```

**Expected**: Returns one make per row: "Ford", "Honda", "Toyota", etc.

### Test 2: Get Models for Year and Make
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_models_for_year_make(2024, 'Ford') LIMIT 10;
EOF
```

**Expected**: Returns Ford models: "F-150", "Mustang", "Explorer", etc.

### Test 3: Get Trims for Year, Make, Model
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_trims_for_year_make_model(2024, 'Ford', 'F-150') LIMIT 10;
EOF
```

**Expected**: Returns F-150 trims: "XLT", "Lariat", "King Ranch", etc.

### Test 4: Get Options for Vehicle
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT engine_name, transmission_type
FROM get_options_for_vehicle(2024, 'Ford', 'F-150', 'XLT')
LIMIT 10;
EOF
```

**Expected**: Returns engine/transmission combinations available for 2024 Ford F-150 XLT

---

## Step 10: Performance Validation

Verify that the cascade queries execute in under 50 ms:

### Test Index Usage
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
EXPLAIN ANALYZE
SELECT DISTINCT make
FROM vehicle_options
WHERE year = 2024;
EOF
```

**Expected**: Query plan should show one of the `idx_vehicle_*` indexes in use. Depending on the planner's estimates this may appear as an Index Scan, Index Only Scan, or Bitmap Index Scan, for example:
```
Index Scan using idx_vehicle_year_make ...
Execution Time: < 50 ms
```

### Test Cascade Query Performance
```bash
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
EXPLAIN ANALYZE
SELECT DISTINCT model
FROM vehicle_options
WHERE year = 2024 AND make = 'Ford';
EOF
```

**Expected**: Should use a composite index such as `idx_vehicle_year_make`, execution time < 50ms

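The 50 ms target can be checked automatically by reading the `Execution Time` line that `EXPLAIN ANALYZE` prints at the end of its output. The filter below is a sketch; `under_limit_ms` is a hypothetical helper that reads the plan on stdin and takes the limit in milliseconds as its argument.

```bash
# Sketch: succeed if the "Execution Time: N ms" line on stdin is below a limit.
# under_limit_ms is a hypothetical helper; the default limit of 50 ms is
# the target named in this guide.
under_limit_ms() {
  local limit="${1:-50}"
  awk -v limit="$limit" '
    # The value is the second-to-last field: "Execution Time: 12.345 ms"
    /Execution Time:/ { found = 1; ms = $(NF - 1) }
    # Fail when no timing line was seen or the time is at or above the limit
    END { exit !(found && ms + 0 < limit + 0) }
  '
}
```

Piping either `EXPLAIN ANALYZE` command above into `under_limit_ms 50` yields exit status 0 only when a timing line is present and below the limit. Timings from a single run vary with caching, so repeating the query a few times gives a fairer picture.
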
---

## Completion Checklist

Before signaling completion, verify:

- [ ] Backup of old schema created successfully
- [ ] Old vehicles.* tables dropped
- [ ] New migration executed without errors
- [ ] Engines table has 30,066 records
- [ ] Transmissions table has 828 records
- [ ] Vehicle_options table has 1,122,644 records
- [ ] Year range is 1980-2026 (47 years)
- [ ] 53 distinct makes present
- [ ] ~1.1% of records have NULL engine_id
- [ ] All 4 database functions exist and return data
- [ ] Composite indexes created (3 indexes)
- [ ] Query performance is sub-50ms
- [ ] No error messages in PostgreSQL logs

---

## Troubleshooting

### Error: "relation already exists"
**Cause**: Tables from old migration still present

**Solution**:
```bash
# Drop tables explicitly
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "DROP TABLE IF EXISTS vehicle_options CASCADE;"
# Then re-run migration
```

### Error: "duplicate key value violates unique constraint"
**Cause**: Data already imported, trying to import again

**Solution**:
```bash
# Truncate tables
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
TRUNCATE TABLE vehicle_options CASCADE;
TRUNCATE TABLE engines CASCADE;
TRUNCATE TABLE transmissions CASCADE;
EOF
# Then re-import data
```

### Import Takes Too Long
**Symptom**: Import hangs or takes > 10 minutes

**Solution**:
1. Check Docker resources (increase memory/CPU if needed)
2. Check disk space: `df -h`
3. Check PostgreSQL logs: `docker logs mvp-postgres`
4. Try importing in smaller batches (split SQL files if necessary)

### Performance Issues
**Symptom**: Queries take > 100ms

**Solution**:
```bash
# Verify indexes were created (\di matches index names)
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "\di idx_vehicle*"

# Analyze tables for query optimizer
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
ANALYZE engines;
ANALYZE transmissions;
ANALYZE vehicle_options;
EOF
```

---

## Rollback Procedure

If you need to rollback:

```bash
# Drop new tables
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
DROP TABLE IF EXISTS vehicle_options CASCADE;
DROP TABLE IF EXISTS transmissions CASCADE;
DROP TABLE IF EXISTS engines CASCADE;
DROP FUNCTION IF EXISTS get_makes_for_year;
DROP FUNCTION IF EXISTS get_models_for_year_make;
DROP FUNCTION IF EXISTS get_trims_for_year_make_model;
DROP FUNCTION IF EXISTS get_options_for_vehicle;
EOF

# Restore from backup (replace <timestamp> with the suffix of your backup file)
docker cp data/backups/vehicles_schema_backup_<timestamp>.sql mvp-postgres:/tmp/
# -f reads the file inside the container, where docker cp placed it
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -f /tmp/vehicles_schema_backup_<timestamp>.sql
```

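The `<timestamp>` placeholder has to be filled in by hand, which is error-prone under pressure. Because the Step 1 file names end in `YYYYMMDD_HHMMSS`, the newest backup is also the last one in sorted order. The sketch below relies on that naming; `latest_backup` is a hypothetical helper.

```bash
# Sketch: print the path of the newest schema backup.
# latest_backup is a hypothetical helper. Shell globs expand in sorted
# order, and the YYYYMMDD_HHMMSS suffix sorts chronologically.
latest_backup() {
  local dir="${1:-data/backups}"
  local f
  local latest=""
  for f in "$dir"/vehicles_schema_backup_*.sql; do
    # Skip the unexpanded pattern when no backup exists
    [ -e "$f" ] && latest="$f"
  done
  [ -n "$latest" ] && echo "$latest"
}
```

One way to use it is to stream the host file straight into psql, which avoids the copy into the container: `docker exec -i mvp-postgres psql -U postgres -d motovaultpro < "$(latest_backup)"`.
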
---

## Handoff to Agent 2

Once complete, provide this information to Agent 2 (Platform Repository):

### Database Contract

**Tables Available:**
```sql
engines (id, name)
transmissions (id, type)
vehicle_options (id, year, make, model, trim, engine_id, transmission_id)
```

**Functions Available:**
```sql
get_makes_for_year(year INT) → TABLE(make VARCHAR)
get_models_for_year_make(year INT, make VARCHAR) → TABLE(model VARCHAR)
get_trims_for_year_make_model(year INT, make VARCHAR, model VARCHAR) → TABLE(trim_name VARCHAR)
get_options_for_vehicle(year INT, make VARCHAR, model VARCHAR, trim VARCHAR)
  → TABLE(engine_name VARCHAR, transmission_type VARCHAR, ...)
```

**Data Quality Notes:**
- Makes are in Title Case: "Ford", not "FORD"
- 1.1% of records have NULL engine_id (electric vehicles)
- Year range: 1980-2026
- 53 makes, 1,741 models, 1,122,644 total configurations

**Performance:**
- All queries using indexes perform sub-50ms
- Cascade queries optimized with composite indexes

### Verification Command
Agent 2 can verify database is ready:
```bash
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(*) FROM vehicle_options;"
```
Should return: 1122644

---

## Completion Message Template

```
Agent 1 (Database Migration): COMPLETE

Changes Made:
- Dropped vehicles.* schema tables (backup created)
- Executed 001_create_vehicle_database.sql migration
- Imported 30,066 engines
- Imported 828 transmissions
- Imported 1,122,644 vehicle options

Verification:
✓ All tables created with correct record counts
✓ Database functions operational
✓ Composite indexes created
✓ Query performance sub-50ms
✓ Data quality checks passed

Database is ready for Agent 2 (Platform Repository) to begin implementation.

Files modified: None (database only)
New schema: public.engines, public.transmissions, public.vehicle_options
```

---

**Document Version**: 1.0
**Last Updated**: 2025-11-10
**Status**: Ready for Implementation