Updates to database and API for dropdowns.
@@ -9,18 +9,16 @@ This ETL pipeline creates a PostgreSQL database optimized for cascading dropdown

### Tables

1. **engines** - Detailed engine specifications
   - Displacement, configuration, horsepower, torque
   - Fuel type, fuel system, aspiration
   - Full specs stored as JSONB
1. **engines** - Simplified engine specifications
   - id (Primary Key)
   - name (Display format: "V8 3.5L", "L4 2.0L Turbo", "V6 6.2L Supercharged")

2. **transmissions** - Transmission specifications
   - Type (Manual, Automatic, CVT, etc.)
   - Number of speeds
   - Drive type (FWD, RWD, AWD, 4WD)
2. **transmissions** - Simplified transmission specifications
   - id (Primary Key)
   - type (Display format: "8-Speed Automatic", "6-Speed Manual", "CVT")

3. **vehicle_options** - Denormalized vehicle configurations
   - Year, Make, Model, Trim
   - Year, Make (Title Case: "Ford", "Acura", "Land Rover"), Model, Trim
   - Foreign keys to engines and transmissions
   - Optimized indexes for dropdown queries

@@ -63,57 +61,72 @@ This ETL pipeline creates a PostgreSQL database optimized for cascading dropdown

## ETL Process

### Step 1: Import Engine & Transmission Specs
- Parse all records from `engines.json`
- Extract detailed specifications
- Create engines and transmissions tables
- Build in-memory caches for fast lookups
### Step 1: Load Source Data
- Load `engines.json` (30,066 records)
- Load `brands.json` (124 brands)
- Load `automobiles.json` (7,207 models)
- Load all `makes-filter/*.json` files (55 files)

### Step 2: Process Makes-Filter Data
- Read all 57 JSON files from `makes-filter/`
### Step 2: Transform Brand Names
- Convert ALL CAPS brand names to Title Case ("FORD" → "Ford")
- Preserve acronyms (BMW, GMC, KIA remain uppercase)
- Handle special cases (DeLorean, McLaren)

### Step 3: Process Engine Specifications
- Extract engine specs from engines.json
- Create simplified display names (e.g., "V8 3.5L Turbo")
- Normalize displacement (Cm3 → Liters) for matching
- Build engine cache with (displacement, configuration) keys
- Generate engines SQL with only id and name columns

### Step 4: Process Transmission Specifications
- Extract transmission specs from engines.json
- Create simplified display names (e.g., "8-Speed Automatic")
- Parse speed count and transmission type
- Build transmission cache for linking
- Generate transmissions SQL with only id and type columns

### Step 5: Process Makes-Filter Data
- Read all JSON files from `makes-filter/`
- Extract year/make/model/trim/engine combinations
- Match engine strings to detailed specs using displacement + configuration
- Link transmissions to vehicle records (98.9% success rate)
- Apply year filter (1980 and newer only)
- Build vehicle_options records

### Step 3: Hybrid Backfill
### Step 6: Hybrid Backfill
- Check `automobiles.json` for recent years (2023-2025)
- Add any missing year/make/model combinations
- Only backfill for the 57 filtered makes
- Only backfill for filtered makes
- Link transmissions for backfilled records
- Limit to 3 engines per backfilled model

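The backfill rules listed for this step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ETL's actual code: the record shape (`year`, `make`, `model`, `engines` keys), the function name `backfill_missing`, and the two constants are hypothetical and chosen only to mirror the bullets above.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed constants, taken from the bullet points above.
BACKFILL_YEARS = range(2023, 2026)  # 2023, 2024, 2025
MAX_ENGINES_PER_MODEL = 3

VehicleKey = Tuple[int, str, str]  # (year, make, model)


def backfill_missing(
    existing: Set[VehicleKey],
    candidates: Iterable[Dict],
    filtered_makes: Set[str],
) -> List[Dict]:
    """Return new records for recent year/make/model combinations not already present."""
    added: List[Dict] = []
    seen = set(existing)
    for record in candidates:
        key = (record["year"], record["make"], record["model"])
        if record["year"] not in BACKFILL_YEARS:
            continue  # only recent years are backfilled
        if record["make"] not in filtered_makes or key in seen:
            continue  # skip unfiltered makes and combinations we already have
        seen.add(key)
        # Cap the number of engines attached to each backfilled model.
        for engine in record["engines"][:MAX_ENGINES_PER_MODEL]:
            added.append(
                {"year": key[0], "make": key[1], "model": key[2], "engine": engine}
            )
    return added
```

Tracking `seen` separately from `existing` keeps the caller's set untouched while still preventing the same combination from being added twice within one run.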
### Step 4: Insert Vehicle Options
- Batch insert all vehicle_options records
- Create indexes for optimal query performance
- Generate views and functions

### Step 5: Validation
- Count records in each table
- Test dropdown cascade queries
- Display sample data
### Step 7: Generate SQL Output
- Write SQL files with proper escaping (newlines, quotes, special characters)
- Convert empty strings to NULL for data integrity
- Use batched inserts (1000 records per batch)
- Output to `output/` directory

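As a rough illustration of the SQL output step, the sketch below turns Python values into PostgreSQL literals and groups rows into batches of 1000. The helper names (`sql_literal`, `insert_statements`) are hypothetical, and the escaping strategy (doubling single quotes, and using PostgreSQL `E'...'` escape strings when backslashes or line breaks are present) is one reasonable choice, not necessarily what the script does.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 1000  # matches the batch size stated above


def sql_literal(value) -> str:
    """Render a Python value as a PostgreSQL literal; empty strings become NULL."""
    if value is None or value == "":
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    text = text.replace("\\", "\\\\")   # escape backslashes first
    text = text.replace("'", "''")      # double single quotes
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    # Use an escape string only when backslash sequences are present.
    return f"E'{text}'" if "\\" in text else f"'{text}'"


def insert_statements(
    table: str, columns: Sequence[str], rows: List[Sequence]
) -> Iterator[str]:
    """Yield one multi-row INSERT statement per batch of BATCH_SIZE rows."""
    column_list = ", ".join(columns)
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} ({column_list}) VALUES\n{values};"
```

For example, 2,500 rows would produce three statements (1000 + 1000 + 500), which keeps each statement small enough to import without building one enormous `INSERT`.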
## Running the ETL

### Prerequisites
- Docker container `mvp-postgres` running
- Python 3 with psycopg2
- Python 3 (no additional dependencies required)
- JSON source files in project root

### Quick Start
```bash
./run_migration.sh
# Step 1: Generate SQL files from JSON data
python3 etl_generate_sql.py

# Step 2: Import SQL files into database
./import_data.sh
```

### Manual Steps
```bash
# 1. Run migration
docker compose exec mvp-postgres psql -U postgres -d motovaultpro < migrations/001_create_vehicle_database.sql

# 2. Install Python dependencies
pip3 install psycopg2-binary

# 3. Run ETL script
python3 etl_vehicle_data.py
```
### What Gets Generated
- `output/01_engines.sql` (~632KB, 30,066 records)
- `output/02_transmissions.sql` (~21KB, 828 records)
- `output/03_vehicle_options.sql` (~51MB, 1,122,644 records)

## Query Examples

@@ -127,26 +140,26 @@ SELECT * FROM available_years;
SELECT * FROM get_makes_for_year(2024);
```

### Get models for 2024 Ford
### Get models for 2025 Ford
```sql
SELECT * FROM get_models_for_year_make(2024, 'Ford');
SELECT * FROM get_models_for_year_make(2025, 'Ford');
```

### Get trims for 2024 Ford F-150
### Get trims for 2025 Ford F-150
```sql
SELECT * FROM get_trims_for_year_make_model(2024, 'Ford', 'F-150');
SELECT * FROM get_trims_for_year_make_model(2025, 'Ford', 'f-150');
```

### Get engine/transmission options for specific vehicle
```sql
SELECT * FROM get_options_for_vehicle(2024, 'Ford', 'F-150', 'XLT');
SELECT * FROM get_options_for_vehicle(2025, 'Ford', 'f-150', 'XLT');
```

### Complete vehicle configurations
```sql
SELECT * FROM complete_vehicle_configs
WHERE year = 2024 AND make = 'Tesla'
ORDER BY model, trim;
WHERE year = 2025 AND make = 'Ford' AND model = 'f-150'
LIMIT 10;
```

## Performance Optimization
@@ -164,35 +177,50 @@ Dropdown queries are optimized to return results in < 50ms for typical datasets.

## Data Matching Logic

### Brand Name Transformation
- Source data (brands.json) stores names in ALL CAPS: "FORD", "ACURA", "ALFA ROMEO"
- ETL converts to Title Case: "Ford", "Acura", "Alfa Romeo"
- Preserves acronyms: BMW, GMC, KIA, MINI, FIAT, RAM
- Special cases: DeLorean, McLaren

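The transformation rules above could be expressed along these lines. This is a sketch only: `format_brand_name` is a hypothetical name, and the acronym and special-case tables simply copy the lists given in this section rather than whatever the ETL script actually contains.

```python
# Hypothetical lookup tables mirroring the lists in this section.
ACRONYMS = {"BMW", "GMC", "KIA", "MINI", "FIAT", "RAM"}
SPECIAL_CASES = {"DELOREAN": "DeLorean", "MCLAREN": "McLaren"}


def format_brand_name(raw: str) -> str:
    """Convert an ALL CAPS brand name from brands.json to its display form."""
    name = raw.strip().upper()
    if name in ACRONYMS:
        return name  # acronyms stay uppercase
    if name in SPECIAL_CASES:
        return SPECIAL_CASES[name]  # mixed-case names need an explicit mapping
    # Capitalize each space- or hyphen-separated part: "ALFA ROMEO" -> "Alfa Romeo"
    return " ".join(
        "-".join(part.capitalize() for part in word.split("-"))
        for word in name.split()
    )
```

Checking the acronym and special-case tables before the generic rule matters because plain capitalization would turn "BMW" into "Bmw" and "MCLAREN" into "Mclaren".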
### Engine Matching
The ETL uses intelligent pattern matching to link simple engine strings from makes-filter to detailed specs:

1. **Parse engine string**: Extract displacement (e.g., "2.0L") and configuration (e.g., "I4")
2. **Normalize**: Convert to uppercase, standardize format
2. **Normalize displacement**: Convert Cm3 to Liters ("3506 Cm3" → "3.5L")
3. **Match to cache**: Look up in engine cache by (displacement, configuration)
4. **Handle variations**: Account for I4/L4, V6/V-6, etc.
4. **Create display name**: Format as "V8 3.5L", "L4 2.0L Turbo", etc.

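To make the displacement normalization and cache lookup concrete, here is a small sketch. The function names (`cm3_to_liters`, `make_display_name`, `register_engine`) and the sample engines are assumptions for illustration; the only details taken from the text are the "3506 Cm3" → "3.5L" conversion, the `(displacement, configuration)` cache key, and the display format.

```python
import re
from typing import Dict, Optional, Tuple


def cm3_to_liters(text: str) -> Optional[str]:
    """Convert a string such as "3506 Cm3" to "3.5L"; return None if no value is found."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*cm3", text, flags=re.IGNORECASE)
    if match is None:
        return None
    liters = float(match.group(1)) / 1000.0
    return f"{liters:.1f}L"


def make_display_name(configuration: str, displacement: str, aspiration: str = "") -> str:
    """Build a name such as "V8 3.5L" or "L4 2.0L Turbo"."""
    parts = [configuration, displacement]
    if aspiration:
        parts.append(aspiration)
    return " ".join(parts)


# Cache keyed by (displacement, configuration), as described in step 3.
engine_cache: Dict[Tuple[str, str], str] = {}


def register_engine(raw_displacement: str, configuration: str, aspiration: str = "") -> None:
    """Normalize one engine record and store its display name in the cache."""
    displacement = cm3_to_liters(raw_displacement)
    if displacement is None:
        return  # nothing to match on without a displacement
    engine_cache[(displacement, configuration)] = make_display_name(
        configuration, displacement, aspiration
    )


# Illustrative data only.
register_engine("3506 Cm3", "V8")
register_engine("1998 Cm3", "L4", "Turbo")
```

A lookup such as `engine_cache.get(("2.0L", "L4"))` then resolves a parsed makes-filter string to the stored display name, or to `None` when no engine matches.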
### Transmission Linking
- Transmission data is embedded in engines.json under "Transmission Specs"
- Each engine record includes gearbox type (e.g., "6-Speed Manual")
- ETL links transmissions to vehicle records based on engine match
- Success rate: 98.9% (1,109,510 of 1,122,644 records)
- Unlinked records: primarily electric vehicles without traditional transmissions

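Gearbox strings like the one quoted above might be split into a speed count and a type roughly as shown here. The exact format of the source strings is not documented in this README, so the regular expression and the names `parse_transmission` and `transmission_display` are assumptions for the sake of the example.

```python
import re
from typing import Optional, Tuple


def parse_transmission(text: str) -> Tuple[Optional[int], str]:
    """Split "6-Speed Manual" into (6, "Manual"); strings without a speed count pass through."""
    match = re.match(r"\s*(\d+)[-\s]*speed\s+(.+)", text, flags=re.IGNORECASE)
    if match:
        return int(match.group(1)), match.group(2).strip()
    return None, text.strip()


def transmission_display(speeds: Optional[int], kind: str) -> str:
    """Rebuild the dropdown label, e.g. "8-Speed Automatic" or "CVT"."""
    return f"{speeds}-Speed {kind}" if speeds else kind
```

Types such as "CVT" carry no gear count, so the parser returns `None` for the speed and the display helper falls back to the bare type.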
### Configuration Equivalents
- `I4` = `L4` = `INLINE-4`
- `I4` = `L4` = `INLINE-4` = `4 Inline`
- `V6` = `V-6`
- `V8` = `V-8`

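One way to collapse these equivalents to a single key before the cache lookup is sketched below. The function name `normalize_configuration` is hypothetical, and picking `L4` (rather than `I4`) as the canonical inline form is an assumption based on the display names used elsewhere in this README.

```python
import re


def normalize_configuration(text: str) -> str:
    """Map I4, L4, INLINE-4, "4 Inline", V-6, etc. to a canonical form such as "L4" or "V6"."""
    cleaned = text.strip().upper()
    # Count-first spelling, e.g. "4 INLINE".
    match = re.fullmatch(r"(\d{1,2})\s*(INLINE|V)", cleaned)
    if match:
        count, layout = match.group(1), match.group(2)
    else:
        # Layout-first spelling, e.g. "I4", "L4", "INLINE-4", "V-6".
        match = re.fullmatch(r"(INLINE|[ILV])[-\s]?(\d{1,2})", cleaned)
        if match is None:
            return cleaned  # unrecognized values are returned uppercased, unchanged
        layout, count = match.group(1), match.group(2)
    prefix = "V" if layout == "V" else "L"
    return f"{prefix}{count}"
```

Normalizing both the cache keys and the parsed makes-filter strings through the same function means the lookup itself can stay a plain dictionary access.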
## Filtered Makes (57 Total)
## Filtered Makes (53 Total)

All brand names are stored in Title Case format for user-friendly display.

### American Brands (12)
Acura, Buick, Cadillac, Chevrolet, Chrysler, Dodge, Ford, GMC, Hummer, Jeep, Lincoln, Ram
Acura, Buick, Cadillac, Chevrolet, Chrysler, Dodge, Ford, GMC, Hummer, Jeep, Lincoln, RAM

### Luxury/Performance (13)
Aston Martin, Bentley, Ferrari, Lamborghini, Maserati, McLaren, Porsche, Rolls-Royce, Tesla, Jaguar, Audi, BMW, Land Rover
Aston Martin, Bentley, Ferrari, Lamborghini, Maserati, McLaren, Porsche, Rolls Royce, Tesla, Jaguar, Audi, BMW, Land Rover

### Japanese (7)
### Japanese (8)
Honda, Infiniti, Lexus, Mazda, Mitsubishi, Nissan, Subaru, Toyota

### European (13)
Alfa Romeo, Fiat, Mini, Saab, Saturn, Scion, Smart, Volkswagen, Volvo
### European (9)
Alfa Romeo, FIAT, MINI, Saab, Saturn, Scion, Smart, Volkswagen, Volvo

### Other (12)
Genesis, Geo, Hyundai, Kia, Lucid, Polestar, Rivian, Lotus, Mercury, Oldsmobile, Plymouth, Pontiac
### Other (11)
Genesis, Geo, Hyundai, KIA, Lucid, Polestar, Rivian, Lotus, Mercury, Oldsmobile, Plymouth, Pontiac

## Troubleshooting

@@ -229,12 +257,14 @@ pip3 install psycopg2-binary
## Expected Results

After successful ETL:
- **Engines**: ~30,000 records
- **Transmissions**: ~500-1000 unique combinations
- **Vehicle Options**: ~50,000-100,000 configurations
- **Years**: 10-15 distinct years
- **Makes**: 57 manufacturers
- **Models**: 1,000-2,000 unique models
- **Engines**: 30,066 records
- **Transmissions**: 828 records
- **Vehicle Options**: 1,122,644 configurations
- **Years**: 47 years (1980-2026)
- **Makes**: 53 manufacturers
- **Models**: 1,741 unique models
- **Transmission Linking**: 98.9% success rate
- **Output Files**: ~52MB total (632KB engines + 21KB transmissions + 51MB vehicles)

## Next Steps