Database Migration Guide - Agent 1

Task: Replace vehicles.* schema with new ETL-generated database

Status: Ready for Implementation
Dependencies: None (can start immediately)
Estimated Time: 30 minutes
Assigned To: Agent 1 (Database)


Overview

Replace the normalized vehicles.* schema with a denormalized vehicle_options table populated from ETL-generated data (1.1M+ records from 1980-2026).


Prerequisites

Required Files

All files are already present in the repository:

data/make-model-import/migrations/001_create_vehicle_database.sql
data/make-model-import/output/01_engines.sql
data/make-model-import/output/02_transmissions.sql
data/make-model-import/output/03_vehicle_options.sql

Database Access

# Verify Docker container is running
docker ps | grep mvp-postgres

# Access PostgreSQL
docker exec -it mvp-postgres psql -U postgres -d motovaultpro

Step 1: Backup Current Schema (Safety)

Before making any changes, backup the existing vehicles.* schema:

# Create backup directory
mkdir -p data/backups

# Dump vehicles schema only
docker exec mvp-postgres pg_dump -U postgres -d motovaultpro \
  --schema=vehicles \
  --format=plain \
  --file=/tmp/vehicles_schema_backup.sql

# Copy backup to host
docker cp mvp-postgres:/tmp/vehicles_schema_backup.sql \
  data/backups/vehicles_schema_backup_$(date +%Y%m%d_%H%M%S).sql

# Verify backup exists
ls -lh data/backups/

Verification: Backup file should be 100KB-1MB in size
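The size check can be scripted instead of read off the ls output. A minimal sketch, assuming the 100KB-1MB range stated above; the function name backup_size_ok is hypothetical and not part of the repository:

```shell
# Hypothetical helper: succeeds only if the file exists and its size
# is between 100 KB and 1 MB (the range this guide expects).
backup_size_ok() {
  file=$1
  [ -f "$file" ] || return 1
  # Arithmetic expansion strips any padding that wc adds on some systems.
  bytes=$(( $(wc -c < "$file") ))
  [ "$bytes" -ge $((100 * 1024)) ] && [ "$bytes" -le $((1024 * 1024)) ]
}

# Example (replace the argument with the actual backup file name):
# backup_size_ok data/backups/your_backup_file.sql && echo "backup size OK"
```

A file outside the range is not necessarily a bad backup, but it is worth inspecting before running the destructive step that follows.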


Step 2: Drop Existing vehicles.* Tables

Drop all normalized tables in the vehicles schema:

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
-- Drop tables in correct order (respect foreign keys)
DROP TABLE IF EXISTS vehicles.trim_transmission CASCADE;
DROP TABLE IF EXISTS vehicles.trim_engine CASCADE;
DROP TABLE IF EXISTS vehicles.transmission CASCADE;
DROP TABLE IF EXISTS vehicles.engine CASCADE;
DROP TABLE IF EXISTS vehicles.trim CASCADE;
DROP TABLE IF EXISTS vehicles.model_year CASCADE;
DROP TABLE IF EXISTS vehicles.model CASCADE;
DROP TABLE IF EXISTS vehicles.make CASCADE;

-- Drop views if they exist
DROP VIEW IF EXISTS vehicles.available_years CASCADE;
DROP VIEW IF EXISTS vehicles.makes_by_year CASCADE;
DROP VIEW IF EXISTS vehicles.models_by_year_make CASCADE;

-- Optionally drop the entire schema
-- DROP SCHEMA IF EXISTS vehicles CASCADE;

-- Verify all tables dropped
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'vehicles';
EOF

Verification: Query should return 0 rows (no tables left in vehicles schema)

Note: This is a destructive operation. Ensure backup completed successfully before proceeding.


Step 3: Run New Migration

Execute the new schema migration that creates:

  • engines table
  • transmissions table
  • vehicle_options table
  • Database functions for cascade queries
  • Composite indexes

# Run migration SQL
docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
  < data/make-model-import/migrations/001_create_vehicle_database.sql

Verification: Check for error messages. Successful output should include:

CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE FUNCTION
...
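Scanning the output by eye is easy to get wrong on a long migration. A minimal sketch that counts error lines in captured psql output; the helper name count_psql_errors is hypothetical. psql also accepts -v ON_ERROR_STOP=1, which stops at the first failing statement instead of continuing.

```shell
# Hypothetical helper: reads psql output on stdin and prints the number
# of lines that contain "ERROR:". Prints 0 when there are none.
count_psql_errors() {
  # grep -c exits non-zero when the count is 0, so mask the exit status.
  grep -c 'ERROR:' || true
}

# Example (assumes the container and file paths used in this guide):
# docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
#   < data/make-model-import/migrations/001_create_vehicle_database.sql 2>&1 \
#   | count_psql_errors
```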

Step 4: Verify Schema Created

Check that all tables and functions were created successfully:

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
-- List all tables
\dt

-- Describe engines table
\d engines

-- Describe transmissions table
\d transmissions

-- Describe vehicle_options table
\d vehicle_options

-- List indexes on vehicle_options
\di vehicle_options*

-- List functions
\df get_makes_for_year
\df get_models_for_year_make
\df get_trims_for_year_make_model
\df get_options_for_vehicle
EOF

Expected Output:

  • 3 tables: engines, transmissions, vehicle_options
  • Indexes: idx_vehicle_year_make, idx_vehicle_year_make_model, idx_vehicle_year_make_model_trim
  • 4 database functions
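
The index check can be automated by scanning the \di listing for the three names above. A minimal sketch; the helper name missing_indexes is hypothetical, and it relies on grep -w, which GNU and BSD grep both provide:

```shell
# Hypothetical helper: reads an index listing on stdin and prints every
# expected index name that does not appear in it (one per line).
missing_indexes() {
  listing=$(cat)
  for name in idx_vehicle_year_make \
              idx_vehicle_year_make_model \
              idx_vehicle_year_make_model_trim; do
    # -w matches whole words only, so a longer index name does not
    # count as a match for a shorter one that is its prefix.
    printf '%s\n' "$listing" | grep -qw -- "$name" || echo "$name"
  done
}

# Example (assumes the container used in this guide):
# docker exec mvp-postgres psql -U postgres -d motovaultpro \
#   -c "\di vehicle_options*" | missing_indexes
```

Empty output means all three indexes were found.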

Step 5: Import Engines Data

Import 30,066 engine records:

docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
  < data/make-model-import/output/01_engines.sql

Verification:

docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(*) FROM engines;"

Expected: 30,066 rows
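
The count comparison can be scripted so that the comma in the documented figure does not get in the way. A minimal sketch; the helper name count_matches is hypothetical. The same check applies to the transmissions and vehicle_options counts in the later steps.

```shell
# Hypothetical helper: compares a documented count (may contain commas)
# with a count returned by psql (may contain surrounding whitespace).
count_matches() {
  expected=$(printf '%s' "$1" | tr -d ', \t\n')
  actual=$(printf '%s' "$2" | tr -d ', \t\n')
  # An empty result (for example, a failed query) never matches.
  [ -n "$actual" ] && [ "$expected" = "$actual" ]
}

# Example (psql -tA prints the bare value without headers or alignment):
# actual=$(docker exec mvp-postgres psql -U postgres -d motovaultpro \
#   -tA -c "SELECT COUNT(*) FROM engines;")
# count_matches "30,066" "$actual" && echo "engines count OK"
```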

Sample Data Check:

docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT id, name FROM engines LIMIT 10;"

Expected Format: Names like "V8 5.0L", "L4 2.0L Turbo", "V6 3.5L"


Step 6: Import Transmissions Data

Import 828 transmission records:

docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
  < data/make-model-import/output/02_transmissions.sql

Verification:

docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(*) FROM transmissions;"

Expected: 828 rows

Sample Data Check:

docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT id, type FROM transmissions LIMIT 10;"

Expected Format: Types like "8-Speed Automatic", "6-Speed Manual", "CVT"


Step 7: Import Vehicle Options Data

Import 1,122,644 vehicle option records (this may take 2-5 minutes):

docker exec -i mvp-postgres psql -U postgres -d motovaultpro \
  < data/make-model-import/output/03_vehicle_options.sql

Note: This is the largest import (51MB SQL file). You should see periodic output as batches are inserted.

Verification:

docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(*) FROM vehicle_options;"

Expected: 1,122,644 rows

Sample Data Check:

docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT year, make, model, trim FROM vehicle_options LIMIT 10;"

Expected: Data like:

 year |  make   |  model  |     trim
------+---------+---------+---------------
 2024 | Ford    | F-150   | XLT SuperCrew
 2024 | Honda   | Civic   | Sport Touring
 2023 | Toyota  | Camry   | SE

Step 8: Verify Data Quality

Run quality checks on imported data:

Check Year Range

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT
  MIN(year) as min_year,
  MAX(year) as max_year,
  COUNT(DISTINCT year) as total_years
FROM vehicle_options;
EOF

Expected: min_year=1980, max_year=2026, total_years=47
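
The total_years figure follows from an inclusive range: both endpoints count, so it is max minus min plus one. A minimal sketch of that arithmetic; the function name expected_year_count is hypothetical:

```shell
# Hypothetical helper: number of years in an inclusive range.
expected_year_count() {
  echo $(( $2 - $1 + 1 ))
}

expected_year_count 1980 2026   # prints 47
```

If COUNT(DISTINCT year) comes back lower than this, at least one year in the range has no rows.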

Check Make Count

docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(DISTINCT make) FROM vehicle_options;"

Expected: 53 makes

Check NULL Engine IDs (Electric Vehicles)

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT
  COUNT(*) as total_records,
  COUNT(*) FILTER (WHERE engine_id IS NULL) as null_engines,
  ROUND(100.0 * COUNT(*) FILTER (WHERE engine_id IS NULL) / COUNT(*), 2) as null_percentage
FROM vehicle_options;
EOF

Expected: ~1.1% NULL engine_id (approximately 11,951 records)
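
The query rounds to two decimals, so it is worth knowing the exact figure to compare against. A minimal sketch that recomputes the percentage from the two counts quoted in this guide; the function name null_percentage is hypothetical:

```shell
# Hypothetical helper: percentage of NULL rows, rounded to two decimals
# the same way the ROUND(..., 2) in the query above does.
null_percentage() {
  awk -v n="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * n / total }'
}

null_percentage 11951 1122644   # prints 1.06
```

So a null_percentage of about 1.06 from the query is consistent with the ~1.1% stated here.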

Sample Electric Vehicle Data

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT year, make, model, trim, engine_id, transmission_id
FROM vehicle_options
WHERE engine_id IS NULL
LIMIT 10;
EOF

Expected: Should see Tesla, Lucid, Rivian, or other electric vehicles with NULL engine_id


Step 9: Test Database Functions

Test the cascade query functions:

Test 1: Get Makes for Year

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_makes_for_year(2024) LIMIT 10;
EOF

Expected: Returns a list of make names: "Ford", "Honda", "Toyota", etc.

Test 2: Get Models for Year and Make

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_models_for_year_make(2024, 'Ford') LIMIT 10;
EOF

Expected: Returns Ford models: "F-150", "Mustang", "Explorer", etc.

Test 3: Get Trims for Year, Make, Model

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT * FROM get_trims_for_year_make_model(2024, 'Ford', 'F-150') LIMIT 10;
EOF

Expected: Returns F-150 trims: "XLT", "Lariat", "King Ranch", etc.

Test 4: Get Options for Vehicle

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
SELECT engine_name, transmission_type
FROM get_options_for_vehicle(2024, 'Ford', 'F-150', 'XLT')
LIMIT 10;
EOF

Expected: Returns engine/transmission combinations available for 2024 Ford F-150 XLT


Step 10: Performance Validation

Verify that query execution time stays under the 50 ms target:

Test Index Usage

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
EXPLAIN ANALYZE
SELECT DISTINCT make
FROM vehicle_options
WHERE year = 2024;
EOF

Expected: Query plan should show index usage:

Index Scan using idx_vehicle_year_make ...
Execution Time: < 50 ms

Test Cascade Query Performance

docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
EXPLAIN ANALYZE
SELECT DISTINCT model
FROM vehicle_options
WHERE year = 2024 AND make = 'Ford';
EOF

Expected: Should use composite index idx_vehicle_year_make, execution time < 50ms
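
The timing check can be scripted by pulling the Execution Time line out of the EXPLAIN ANALYZE output. A minimal sketch; the helper names execution_time_ms and under_threshold are hypothetical, and the 50 ms default is the target stated in this guide:

```shell
# Hypothetical helper: reads EXPLAIN ANALYZE output on stdin and prints
# the number from the "Execution Time: N ms" line.
execution_time_ms() {
  awk '/Execution Time:/ { print $(NF-1) }'
}

# Hypothetical helper: succeeds if the given time (ms) is below the
# limit (default 50). An empty time is treated as a failure.
under_threshold() {
  [ -n "$1" ] || return 1
  awk -v t="$1" -v limit="${2:-50}" 'BEGIN { exit !(t + 0 < limit + 0) }'
}

# Example (assumes the container used in this guide):
# t=$(docker exec mvp-postgres psql -U postgres -d motovaultpro \
#   -c "EXPLAIN ANALYZE SELECT DISTINCT make FROM vehicle_options WHERE year = 2024;" \
#   | execution_time_ms)
# under_threshold "$t" && echo "within target: $t ms"
```

Timings vary between runs, so repeat the query a few times before treating a single slow result as a failure.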


Completion Checklist

Before signaling completion, verify:

  • Backup of old schema created successfully
  • Old vehicles.* tables dropped
  • New migration executed without errors
  • Engines table has 30,066 records
  • Transmissions table has 828 records
  • Vehicle_options table has 1,122,644 records
  • Year range is 1980-2026 (47 years)
  • 53 distinct makes present
  • ~1.1% of records have NULL engine_id
  • All 4 database functions exist and return data
  • Composite indexes created (3 indexes)
  • Query performance is sub-50ms
  • No error messages in PostgreSQL logs

Troubleshooting

Error: "relation already exists"

Cause: Tables from old migration still present

Solution:

# Drop tables explicitly
docker exec -it mvp-postgres psql -U postgres -d motovaultpro \
  -c "DROP TABLE IF EXISTS vehicle_options CASCADE;"
# Then re-run migration

Error: "duplicate key value violates unique constraint"

Cause: Data already imported, trying to import again

Solution:

# Truncate tables
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
TRUNCATE TABLE vehicle_options CASCADE;
TRUNCATE TABLE engines CASCADE;
TRUNCATE TABLE transmissions CASCADE;
EOF
# Then re-import data

Import Takes Too Long

Symptom: Import hangs or takes > 10 minutes

Solution:

  1. Check Docker resources (increase memory/CPU if needed)
  2. Check disk space: df -h
  3. Check PostgreSQL logs: docker logs mvp-postgres
  4. Try importing in smaller batches (split SQL files if necessary)

Performance Issues

Symptom: Queries take > 100ms

Solution:

# Verify indexes were created
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "\di vehicle_options*"

# Analyze tables for query optimizer
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
ANALYZE engines;
ANALYZE transmissions;
ANALYZE vehicle_options;
EOF

Rollback Procedure

If you need to roll back:

# Drop new tables
docker exec -i mvp-postgres psql -U postgres -d motovaultpro <<EOF
DROP TABLE IF EXISTS vehicle_options CASCADE;
DROP TABLE IF EXISTS transmissions CASCADE;
DROP TABLE IF EXISTS engines CASCADE;
DROP FUNCTION IF EXISTS get_makes_for_year;
DROP FUNCTION IF EXISTS get_models_for_year_make;
DROP FUNCTION IF EXISTS get_trims_for_year_make_model;
DROP FUNCTION IF EXISTS get_options_for_vehicle;
EOF

# Restore from backup (copy the file into the container, then run it there with -f)
docker cp data/backups/vehicles_schema_backup_<timestamp>.sql mvp-postgres:/tmp/
docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -f /tmp/vehicles_schema_backup_<timestamp>.sql

Handoff to Agent 2

Once complete, provide this information to Agent 2 (Platform Repository):

Database Contract

Tables Available:

engines (id, name)
transmissions (id, type)
vehicle_options (id, year, make, model, trim, engine_id, transmission_id)

Functions Available:

get_makes_for_year(year INT) → TABLE(make VARCHAR)
get_models_for_year_make(year INT, make VARCHAR) → TABLE(model VARCHAR)
get_trims_for_year_make_model(year INT, make VARCHAR, model VARCHAR) → TABLE(trim_name VARCHAR)
get_options_for_vehicle(year INT, make VARCHAR, model VARCHAR, trim VARCHAR)
  → TABLE(engine_name VARCHAR, transmission_type VARCHAR, ...)

Data Quality Notes:

  • Makes are in Title Case: "Ford", not "FORD"
  • 1.1% of records have NULL engine_id (electric vehicles)
  • Year range: 1980-2026
  • 53 makes, 1,741 models, 1,122,644 total configurations

Performance:

  • All index-backed queries complete in under 50 ms
  • Cascade queries optimized with composite indexes

Verification Command

Agent 2 can verify that the database is ready:

docker exec mvp-postgres psql -U postgres -d motovaultpro \
  -c "SELECT COUNT(*) FROM vehicle_options;"

Should return: 1122644


Completion Message Template

Agent 1 (Database Migration): COMPLETE

Changes Made:
- Dropped vehicles.* schema tables (backup created)
- Executed 001_create_vehicle_database.sql migration
- Imported 30,066 engines
- Imported 828 transmissions
- Imported 1,122,644 vehicle options

Verification:
✓ All tables created with correct record counts
✓ Database functions operational
✓ Composite indexes created
✓ Query performance sub-50ms
✓ Data quality checks passed

Database is ready for Agent 2 (Platform Repository) to begin implementation.

Files modified: None (database only)
New schema: public.engines, public.transmissions, public.vehicle_options

Document Version: 1.0
Last Updated: 2025-11-10
Status: Ready for Implementation