Database Migration Guide - Agent 1
Task: Replace vehicles.* schema with new ETL-generated database
Status: Ready for Implementation
Dependencies: None (can start immediately)
Estimated Time: 30 minutes
Assigned To: Agent 1 (Database)
Overview
Replace the normalized vehicles.* schema with a denormalized vehicle_options table populated from ETL-generated data (1.1M+ records from 1980-2026).
Prerequisites
Required Files
All files are already present in the repository:
data/make-model-import/migrations/001_create_vehicle_database.sql
data/make-model-import/output/01_engines.sql
data/make-model-import/output/02_transmissions.sql
data/make-model-import/output/03_vehicle_options.sql
Database Access
# Verify Docker container is running
docker ps | grep mvp-postgres
# Access PostgreSQL
docker exec -it mvp-postgres psql -U postgres -d motovaultpro
Step 1: Backup Current Schema (Safety)
Before making any changes, back up the existing vehicles.* schema:
# Create backup directory
mkdir -p data/backups
# Dump vehicles schema only
docker exec mvp-postgres pg_dump -U postgres -d motovaultpro \
--schema=vehicles \
--format=plain \
--file=/tmp/vehicles_schema_backup.sql
# Copy backup to host
docker cp mvp-postgres:/tmp/vehicles_schema_backup.sql \
data/backups/vehicles_schema_backup_$(date +%Y%m%d_%H%M%S).sql
# Verify backup exists
ls -lh data/backups/
Verification: Backup file should be 100KB-1MB in size
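The size check above can also be scripted so that a missing or truncated dump is caught before Step 2. The following is a minimal sketch; `check_backup_size` is a hypothetical helper name, and the 100 KB default is simply the lower bound quoted above.

```shell
#!/usr/bin/env bash
# Sketch: fail early if a backup file is missing or suspiciously small.
# check_backup_size is a hypothetical helper, not part of the repository.

check_backup_size() {
  local file="$1"
  local min_bytes="${2:-102400}"   # 100 KB lower bound from this guide
  local size

  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi

  # wc -c prints the byte count; tr strips padding some platforms add.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt "$min_bytes" ]; then
    echo "too small: $file ($size bytes)" >&2
    return 1
  fi
  echo "ok: $file ($size bytes)"
}
```

One possible invocation, assuming the backup naming used above: `check_backup_size "$(ls -t data/backups/vehicles_schema_backup_*.sql | head -n 1)"`.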
Step 2: Drop Existing vehicles.* Tables
Drop all normalized tables in the vehicles schema:
# Use -i without -t: a heredoc is not a TTY, so -it fails with "the input device is not a TTY"
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
-- Drop tables in correct order (respect foreign keys)
DROP TABLE IF EXISTS vehicles.trim_transmission CASCADE;
DROP TABLE IF EXISTS vehicles.trim_engine CASCADE;
DROP TABLE IF EXISTS vehicles.transmission CASCADE;
DROP TABLE IF EXISTS vehicles.engine CASCADE;
DROP TABLE IF EXISTS vehicles.trim CASCADE;
DROP TABLE IF EXISTS vehicles.model_year CASCADE;
DROP TABLE IF EXISTS vehicles.model CASCADE;
DROP TABLE IF EXISTS vehicles.make CASCADE;
-- Drop views if they exist
DROP VIEW IF EXISTS vehicles.available_years CASCADE;
DROP VIEW IF EXISTS vehicles.makes_by_year CASCADE;
DROP VIEW IF EXISTS vehicles.models_by_year_make CASCADE;
-- Optionally drop the entire schema
-- DROP SCHEMA IF EXISTS vehicles CASCADE;
-- Verify all tables dropped
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'vehicles';
EOF
Verification: Query should return 0 rows (no tables left in vehicles schema)
Note: This is a destructive operation. Ensure backup completed successfully before proceeding.
Step 3: Run New Migration
Execute the new schema migration that creates:
- engines table
- transmissions table
- vehicle_options table
- Database functions for cascade queries
- Composite indexes
# Run migration SQL
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
< data/make-model-import/migrations/001_create_vehicle_database.sql
Verification: Check for error messages. Successful output should include:
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE FUNCTION
...
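By default psql keeps executing a script after a statement fails, so an error can scroll past unnoticed among the CREATE lines. A minimal sketch of a wrapper that stops at the first error instead is shown below; `run_sql_file` is a hypothetical helper name, while `ON_ERROR_STOP` is a standard psql variable.

```shell
#!/usr/bin/env bash
# Sketch: apply a SQL file and stop at the first error.
# run_sql_file is a hypothetical helper name; the container, user, and
# database names are the ones used elsewhere in this guide.

run_sql_file() {
  local sql_file="$1"

  if [ ! -r "$sql_file" ]; then
    echo "cannot read: $sql_file" >&2
    return 1
  fi

  # ON_ERROR_STOP=1 makes psql abort on the first failing statement and
  # return a non-zero exit status instead of continuing with the script.
  docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
    -v ON_ERROR_STOP=1 < "$sql_file"
}
```

For example, `run_sql_file data/make-model-import/migrations/001_create_vehicle_database.sql && echo "migration applied"` would print the confirmation only if every statement succeeded. The same wrapper could be reused for the three import files in Steps 5 to 7.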
Step 4: Verify Schema Created
Check that all tables and functions were created successfully:
# Use -i without -t: a heredoc is not a TTY, so -it fails with "the input device is not a TTY"
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
-- List all tables
\dt
-- Describe engines table
\d engines
-- Describe transmissions table
\d transmissions
-- Describe vehicle_options table
\d vehicle_options
-- List indexes on vehicle_options
\di vehicle_options*
-- List functions
\df get_makes_for_year
\df get_models_for_year_make
\df get_trims_for_year_make_model
\df get_options_for_vehicle
EOF
Expected Output:
- 3 tables: engines, transmissions, vehicle_options
- Indexes: idx_vehicle_year_make, idx_vehicle_year_make_model, idx_vehicle_year_make_model_trim
- 4 database functions
Step 5: Import Engines Data
Import 30,066 engine records:
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
< data/make-model-import/output/01_engines.sql
Verification:
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(*) FROM engines;"
Expected: 30,066 rows
Sample Data Check:
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT id, name FROM engines LIMIT 10;"
Expected Format: Names like "V8 5.0L", "L4 2.0L Turbo", "V6 3.5L"
Step 6: Import Transmissions Data
Import 828 transmission records:
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
< data/make-model-import/output/02_transmissions.sql
Verification:
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(*) FROM transmissions;"
Expected: 828 rows
Sample Data Check:
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT id, type FROM transmissions LIMIT 10;"
Expected Format: Types like "8-Speed Automatic", "6-Speed Manual", "CVT"
Step 7: Import Vehicle Options Data
Import 1,122,644 vehicle option records (this may take 2-5 minutes):
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
< data/make-model-import/output/03_vehicle_options.sql
Note: This is the largest import (51MB SQL file). You should see periodic output as batches are inserted.
Verification:
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(*) FROM vehicle_options;"
Expected: 1,122,644 rows
Sample Data Check:
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT year, make, model, trim FROM vehicle_options LIMIT 10;"
Expected: Data like:
year | make | model | trim
------+---------+---------+---------------
2024 | Ford | F-150 | XLT SuperCrew
2024 | Honda | Civic | Sport Touring
2023 | Toyota | Camry | SE
Step 8: Verify Data Quality
Run quality checks on imported data:
Check Year Range
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT
MIN(year) as min_year,
MAX(year) as max_year,
COUNT(DISTINCT year) as total_years
FROM vehicle_options;
EOF
Expected: min_year=1980, max_year=2026, total_years=47
Check Make Count
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(DISTINCT make) FROM vehicle_options;"
Expected: 53 makes
Check NULL Engine IDs (Electric Vehicles)
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT
COUNT(*) as total_records,
COUNT(*) FILTER (WHERE engine_id IS NULL) as null_engines,
ROUND(100.0 * COUNT(*) FILTER (WHERE engine_id IS NULL) / COUNT(*), 2) as null_percentage
FROM vehicle_options;
EOF
Expected: ~1.1% NULL engine_id (approximately 11,951 records)
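As a quick sanity check on the expected figure, the percentage can be reproduced from the two counts quoted in this guide. The sketch below uses a hypothetical helper, `null_pct`, and assumes the 11,951 figure is exact; under that assumption the two-decimal rounding in the query would report 1.06, which this guide rounds to ~1.1%.

```shell
#!/usr/bin/env bash
# Worked arithmetic: share of rows with NULL engine_id.
# null_pct is a hypothetical helper; arguments are <null rows> <total rows>.

null_pct() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * n / t }'
}

# Figures quoted in this guide: 11,951 NULL rows out of 1,122,644.
null_pct 11951 1122644   # prints 1.06
```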
Sample Electric Vehicle Data
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT year, make, model, trim, engine_id, transmission_id
FROM vehicle_options
WHERE engine_id IS NULL
LIMIT 10;
EOF
Expected: Should see Tesla, Lucid, Rivian, or other electric vehicles with NULL engine_id
Step 9: Test Database Functions
Test the cascade query functions:
Test 1: Get Makes for Year
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_makes_for_year(2024) LIMIT 10;
EOF
Expected: Returns string list of makes: "Ford", "Honda", "Toyota", etc.
Test 2: Get Models for Year and Make
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_models_for_year_make(2024, 'Ford') LIMIT 10;
EOF
Expected: Returns Ford models: "F-150", "Mustang", "Explorer", etc.
Test 3: Get Trims for Year, Make, Model
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_trims_for_year_make_model(2024, 'Ford', 'F-150') LIMIT 10;
EOF
Expected: Returns F-150 trims: "XLT", "Lariat", "King Ranch", etc.
Test 4: Get Options for Vehicle
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT engine_name, transmission_type
FROM get_options_for_vehicle(2024, 'Ford', 'F-150', 'XLT')
LIMIT 10;
EOF
Expected: Returns engine/transmission combinations available for 2024 Ford F-150 XLT
Step 10: Performance Validation
Verify query performance is sub-50ms as claimed:
Test Index Usage
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
EXPLAIN ANALYZE
SELECT DISTINCT make
FROM vehicle_options
WHERE year = 2024;
EOF
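Expected: Query plan should reference idx_vehicle_year_make. Depending on table statistics, PostgreSQL may report an Index Scan, an Index Only Scan, or a Bitmap Index Scan on it, for example:
Index Scan using idx_vehicle_year_make ...
Execution Time: < 50 ms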
Test Cascade Query Performance
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
EXPLAIN ANALYZE
SELECT DISTINCT model
FROM vehicle_options
WHERE year = 2024 AND make = 'Ford';
EOF
Expected: Should use composite index idx_vehicle_year_make, execution time < 50ms
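Reading the timing off the plan by eye works, but the check can also be automated. The sketch below parses the "Execution Time" line that EXPLAIN ANALYZE prints and compares it with a limit in milliseconds; `within_budget` is a hypothetical helper name, and a first run against a cold cache may legitimately be slower than later runs.

```shell
#!/usr/bin/env bash
# Sketch: read EXPLAIN ANALYZE output on stdin and check the reported
# execution time against a limit in milliseconds (default 50).
# within_budget is a hypothetical helper name.

within_budget() {
  local limit_ms="${1:-50}"
  # EXPLAIN ANALYZE ends with a line such as "Execution Time: 0.412 ms";
  # the number is the second-to-last field on that line.
  awk -v limit="$limit_ms" '
    /Execution Time:/ { found = 1; ms = $(NF - 1) }
    END {
      if (!found) { print "no Execution Time line found"; exit 2 }
      printf "%s ms (limit %s ms)\n", ms, limit
      rc = (ms + 0 < limit + 0) ? 0 : 1
      exit rc
    }'
}
```

A possible use is to pipe one of the plans above into it: `docker exec mvp-postgres psql -U postgres -d motovaultpro -c "EXPLAIN ANALYZE SELECT DISTINCT make FROM vehicle_options WHERE year = 2024;" | within_budget 50`. The function returns 0 when the time is under the limit, 1 when it is not, and 2 when no timing line was found.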
Completion Checklist
Before signaling completion, verify:
- Backup of old schema created successfully
- Old vehicles.* tables dropped
- New migration executed without errors
- Engines table has 30,066 records
- Transmissions table has 828 records
- Vehicle_options table has 1,122,644 records
- Year range is 1980-2026 (47 years)
- 53 distinct makes present
- ~1.1% of records have NULL engine_id
- All 4 database functions exist and return data
- Composite indexes created (3 indexes)
- Query performance is sub-50ms
- No error messages in PostgreSQL logs
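The three record-count items in this checklist can be verified in one pass. The following is a minimal sketch, not a definitive script: `row_count`, `check_count`, and `verify_counts` are hypothetical helper names, and the expected totals are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch: compare live row counts with the totals listed in the checklist.
# row_count, check_count, and verify_counts are hypothetical helper names.

row_count() {
  # -A (unaligned) and -t (tuples only) make psql print just the number.
  docker exec mvp-postgres psql -U postgres -d motovaultpro -At \
    -c "SELECT COUNT(*) FROM $1;"
}

check_count() {
  local table="$1" expected="$2" actual
  actual=$(row_count "$table") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "ok   $table: $actual"
  else
    echo "FAIL $table: expected $expected, got $actual"
    return 1
  fi
}

verify_counts() {
  local failed=0
  check_count engines 30066 || failed=1
  check_count transmissions 828 || failed=1
  check_count vehicle_options 1122644 || failed=1
  return "$failed"
}
```

Calling `verify_counts` prints one line per table and returns non-zero if any count differs, so it can gate the completion message.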
Troubleshooting
Error: "relation already exists"
Cause: Tables from old migration still present
Solution:
# Drop tables explicitly
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "DROP TABLE IF EXISTS vehicle_options CASCADE;"
# Then re-run migration
Error: "duplicate key value violates unique constraint"
Cause: Data already imported, trying to import again
Solution:
# Truncate tables
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
TRUNCATE TABLE vehicle_options CASCADE;
TRUNCATE TABLE engines CASCADE;
TRUNCATE TABLE transmissions CASCADE;
EOF
# Then re-import data
Import Takes Too Long
Symptom: Import hangs or takes > 10 minutes
Solution:
- Check Docker resources (increase memory/CPU if needed)
- Check disk space:
df -h
- Check PostgreSQL logs:
docker logs mvp-postgres
- Try importing in smaller batches (split SQL files if necessary)
Performance Issues
Symptom: Queries take > 100ms
Solution:
# Verify indexes were created
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "\di vehicle_options*"
# Analyze tables for query optimizer
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
ANALYZE engines;
ANALYZE transmissions;
ANALYZE vehicle_options;
EOF
Rollback Procedure
If you need to roll back:
# Drop new tables
# -i is required so docker exec forwards the heredoc to psql on stdin
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
DROP TABLE IF EXISTS vehicle_options CASCADE;
DROP TABLE IF EXISTS transmissions CASCADE;
DROP TABLE IF EXISTS engines CASCADE;
DROP FUNCTION IF EXISTS get_makes_for_year;
DROP FUNCTION IF EXISTS get_models_for_year_make;
DROP FUNCTION IF EXISTS get_trims_for_year_make_model;
DROP FUNCTION IF EXISTS get_options_for_vehicle;
EOF
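# Restore from backup (replace <timestamp> with the timestamp of the backup file)
docker cp data/backups/vehicles_schema_backup_<timestamp>.sql mvp-postgres:/tmp/
# The file now lives inside the container, so read it with -f rather than a host-side redirect
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-f /tmp/vehicles_schema_backup_<timestamp>.sql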
Handoff to Agent 2
Once complete, provide this information to Agent 2 (Platform Repository):
Database Contract
Tables Available:
engines (id, name)
transmissions (id, type)
vehicle_options (id, year, make, model, trim, engine_id, transmission_id)
Functions Available:
get_makes_for_year(year INT) → TABLE(make VARCHAR)
get_models_for_year_make(year INT, make VARCHAR) → TABLE(model VARCHAR)
get_trims_for_year_make_model(year INT, make VARCHAR, model VARCHAR) → TABLE(trim_name VARCHAR)
get_options_for_vehicle(year INT, make VARCHAR, model VARCHAR, trim VARCHAR)
→ TABLE(engine_name VARCHAR, transmission_type VARCHAR, ...)
Data Quality Notes:
- Makes are in Title Case: "Ford", not "FORD"
- 1.1% of records have NULL engine_id (electric vehicles)
- Year range: 1980-2026
- 53 makes, 1,741 models, 1,122,644 total configurations
Performance:
- All queries using indexes perform sub-50ms
- Cascade queries optimized with composite indexes
Verification Command
Agent 2 can verify database is ready:
docker exec mvp-postgres psql -U postgres -d motovaultpro \
-c "SELECT COUNT(*) FROM vehicle_options;"
Should return: 1122644
Completion Message Template
Agent 1 (Database Migration): COMPLETE
Changes Made:
- Dropped vehicles.* schema tables (backup created)
- Executed 001_create_vehicle_database.sql migration
- Imported 30,066 engines
- Imported 828 transmissions
- Imported 1,122,644 vehicle options
Verification:
✓ All tables created with correct record counts
✓ Database functions operational
✓ Composite indexes created
✓ Query performance sub-50ms
✓ Data quality checks passed
Database is ready for Agent 2 (Platform Repository) to begin implementation.
Files modified: None (database only)
New schema: public.engines, public.transmissions, public.vehicle_options
Document Version: 1.0
Last Updated: 2025-11-10
Status: Ready for Implementation