Initial Commit

2025-09-17 16:09:15 -05:00
parent 0cdb9803de
commit a052040e3a
373 changed files with 437090 additions and 6773 deletions
--- a/docs/changes/vehicles-dropdown-v2/01-analysis-findings.md
+++ b/docs/changes/vehicles-dropdown-v2/01-analysis-findings.md
@@ -0,0 +1,203 @@
+# Analysis Findings - JSON Vehicle Data
+
+## Data Source Overview
+- **Location**: `mvp-platform-services/vehicles/etl/sources/makes/`
+- **File Count**: 55 JSON files
+- **File Naming**: Lowercase with underscores (e.g., `alfa_romeo.json`, `land_rover.json`)
+- **Data Structure**: Hierarchical vehicle data by make
+
+## JSON File Structure Analysis
+
+### Standard Structure
+```json
+{
+  "[make_name]": [
+    {
+      "year": "2024",
+      "models": [
+        {
+          "name": "model_name",
+          "engines": [
+            "2.0L I4",
+            "3.5L V6 TURBO"
+          ],
+          "submodels": [
+            "Base",
+            "Premium",
+            "Limited"
+          ]
+        }
+      ]
+    }
+  ]
+}
+```
+
+### Key Data Points
+1. **Make Level**: Root key matches the filename without the `.json` extension (lowercase)
+2. **Year Level**: Array of yearly data
+3. **Model Level**: Array of models per year
+4. **Engines**: Array of engine specifications
+5. **Submodels**: Array of trim levels
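
The five levels above can be walked with a small generator. This is a minimal sketch, not the project's loader: `iter_model_years` is a hypothetical name, and the sample document simply mirrors the "Standard Structure" template with placeholder values.

```python
from __future__ import annotations

import json
from typing import Iterator


def iter_model_years(make_doc: dict) -> Iterator[tuple[str, int, str, list[str], list[str]]]:
    """Yield (make_key, year, model_name, engines, submodels) per model-year."""
    for make_key, year_entries in make_doc.items():      # make level (root key)
        for year_entry in year_entries:                  # year level
            year = int(year_entry["year"])               # years are strings in the JSON
            for model in year_entry.get("models", []):   # model level
                yield (
                    make_key,
                    year,
                    model["name"],
                    list(model.get("engines", [])),      # may be empty (see EV handling)
                    list(model.get("submodels", [])),
                )


# Sample mirrors the "Standard Structure" template; all values are placeholders.
sample = json.loads("""
{
  "example_make": [
    {"year": "2024", "models": [
      {"name": "model_name",
       "engines": ["2.0L I4", "3.5L V6 TURBO"],
       "submodels": ["Base", "Premium", "Limited"]}
    ]}
  ]
}
""")

rows = list(iter_model_years(sample))
```

Flattening to one tuple per model-year keeps the later loading steps independent of the nesting in the source files.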
+
+## Make Name Analysis
+
+### File Naming vs Display Name Issues
+| Filename | Required Display Name | Issue |
+|----------|---------------------|--------|
+| `alfa_romeo.json` | "Alfa Romeo" | Underscore → space, title case |
+| `land_rover.json` | "Land Rover" | Underscore → space, title case |
+| `rolls_royce.json` | "Rolls Royce" | Underscore → space, title case |
+| `chevrolet.json` | "Chevrolet" | Direct match |
+| `bmw.json` | "BMW" | Uppercase required |
+
+### Make Name Normalization Rules
+1. **Replace underscores** with spaces
+2. **Title case** each word
+3. **Special cases**: BMW, GMC (all caps)
+4. **Validation**: Cross-reference with `sources/makes.json`
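
Rules 1 to 3 could be sketched as below. The function name and the `SPECIAL_CASE_MAKES` table are assumptions for illustration; only BMW and GMC are listed because those are the only special cases named above. Rule 4 (cross-referencing `sources/makes.json`) is not covered here.

```python
from pathlib import Path

# Hypothetical lookup of makes whose display name is not plain title case.
SPECIAL_CASE_MAKES = {"bmw": "BMW", "gmc": "GMC"}


def display_name_from_filename(filename: str) -> str:
    """Derive a display name from a make filename such as 'alfa_romeo.json'."""
    stem = Path(filename).stem.lower()          # 'alfa_romeo.json' -> 'alfa_romeo'
    if stem in SPECIAL_CASE_MAKES:              # rule 3: all-caps special cases
        return SPECIAL_CASE_MAKES[stem]
    words = stem.split("_")                     # rule 1: underscores become spaces
    return " ".join(word.capitalize() for word in words)  # rule 2: title case
```

For example, `display_name_from_filename("land_rover.json")` returns `"Land Rover"` and `display_name_from_filename("bmw.json")` returns `"BMW"`.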
+
+## Engine Specification Analysis
+
+### Discovered Engine Patterns
+Based on analysis of the Nissan, Toyota, Ford, Subaru, and Porsche files:
+
+#### Standard Format: `{displacement}L {config}{cylinders}`
+- `"2.0L I4"` - 2.0 liter, Inline 4-cylinder
+- `"3.5L V6"` - 3.5 liter, V6 configuration  
+- `"2.4L H4"` - 2.4 liter, Horizontal (Boxer) 4-cylinder
+
+#### Configuration Types Found
+- **I** = Inline (most common)
+- **V** = V-configuration
+- **H** = Horizontal/Boxer (Subaru, Porsche)
+- **L** = Alternate notation for inline; **MUST BE NORMALIZED TO I** (e.g., L3 → I3)
+
+### Engine Modifier Patterns
+
+#### Hybrid Classifications
+- `"PLUG-IN HYBRID EV- (PHEV)"` - Plug-in hybrid electric vehicle
+- `"FULL HYBRID EV- (FHEV)"` - Full hybrid electric vehicle
+- `"HYBRID"` - General hybrid designation
+
+#### Fuel Type Modifiers
+- `"FLEX"` - Flex-fuel capability (e.g., `"5.6L V8 FLEX"`)
+- `"ELECTRIC"` - Electric motor; also appears after a displacement on hybrid engines (e.g., `"1.8L I4 ELECTRIC"`)
+- `"TURBO"` - Turbocharged (less common in current data)
+
+#### Example Engine Strings
+```
+"2.5L I4 FULL HYBRID EV- (FHEV)"
+"1.5L L3 PLUG-IN HYBRID EV- (PHEV)"  // L3 → I3
+"5.6L V8 FLEX"
+"2.4L H4"  // Subaru Boxer
+"1.8L I4 ELECTRIC"
+```
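
One way to parse the `{displacement}L {config}{cylinders}` format, including the L → I normalization, is a single regular expression. This is a sketch under assumptions: `ENGINE_RE`, `ParsedEngine`, and `parse_engine` are hypothetical names, and everything after the cylinder count is kept as a raw modifier string rather than interpreted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# displacement + "L", whitespace, config letter, cylinder count, optional modifiers
ENGINE_RE = re.compile(
    r"^(?P<displacement>\d+(?:\.\d+)?)L\s+"
    r"(?P<config>[IVHL])(?P<cylinders>\d+)"
    r"(?:\s+(?P<modifiers>.+))?$"
)


@dataclass
class ParsedEngine:
    displacement_l: float
    config: str        # "I", "V", or "H" after normalization
    cylinders: int
    modifiers: str     # raw trailing text, e.g. "FLEX" or "FULL HYBRID EV- (FHEV)"


def parse_engine(raw: str) -> Optional[ParsedEngine]:
    """Parse an engine string; return None when it does not fit the standard format."""
    match = ENGINE_RE.match(raw.strip())
    if match is None:
        return None
    config = match["config"]
    if config == "L":          # L is treated as inline (L3 -> I3)
        config = "I"
    return ParsedEngine(
        displacement_l=float(match["displacement"]),
        config=config,
        cylinders=int(match["cylinders"]),
        modifiers=match["modifiers"] or "",
    )
```

Returning `None` instead of raising lets the caller log unparseable strings and continue with the rest of the file.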
+
+## Special Cases Analysis
+
+### Electric Vehicle Handling
+**Tesla Example** (`tesla.json`):
+```json
+{
+  "name": "3",
+  "engines": [],
+  "submodels": ["Long Range AWD", "Performance"]
+}
+```
+
+**Lucid Example** (`lucid.json`):
+```json
+{
+  "name": "air",
+  "engines": [],
+  "submodels": []
+}
+```
+
+#### Electric Vehicle Requirements
+- **Empty engines arrays** are common for pure electric vehicles
+- **Must create default engine**: `"Electric Motor"` with appropriate specs
+- **Fuel type**: `"Electric"`
+- **Configuration**: `null` or `"Electric"`
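
A minimal sketch of the default-engine rule follows. It assumes that every empty `engines` array indicates an electric vehicle, which the findings suggest but do not guarantee. The record shape and the `is_default` flag are hypothetical.

```python
# Default record for models with no engine data; field names are assumptions.
DEFAULT_ELECTRIC_ENGINE = {
    "name": "Electric Motor",
    "fuel_type": "Electric",
    "configuration": None,
}


def engine_records(model: dict) -> list:
    """Return engine records for a model, substituting the default when none exist."""
    raw_engines = model.get("engines") or []
    if raw_engines:
        # Engines present in the JSON still need parsing downstream.
        return [{"name": name, "is_default": False} for name in raw_engines]
    # Empty array: create the default electric engine and flag it as synthesized.
    return [{**DEFAULT_ELECTRIC_ENGINE, "is_default": True}]
```

Flagging synthesized records makes it possible to report how many engines came from the source data and how many were filled in.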
+
+### Hybrid Vehicle Patterns
+In the Toyota data, hybrid designations appear at both the engine and submodel levels:
+- **Engine level**: `"1.8L I4 ELECTRIC"`
+- **Submodel level**: `"Hybrid LE"`, `"Hybrid XSE"`
+
+## Data Quality Issues Found
+
+### Missing Engine Data
+- **Tesla models**: Consistently empty engines arrays
+- **Lucid models**: Empty engines arrays  
+- **Some Nissan models**: Empty engines for electric variants
+
+### Inconsistent Submodel Data
+- **Mix of trim levels and descriptors**
+- **Some technical specifications** in submodel names
+- **Inconsistent naming patterns** across makes
+
+### Engine Specification Inconsistencies
+- **L-configuration usage**: Should be normalized to I (Inline)
+- **Mixed hybrid notation**: Sometimes in engine string, sometimes separate
+- **Abbreviation variations**: `EV-` vs `EV`, `FHEV` vs `FULL HYBRID`
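
The hybrid notation variants could be collapsed to one label with simple substring checks. This is an illustrative sketch: `hybrid_class` and its return labels are assumptions, and it only looks at the engine string, not at submodel names such as `"Hybrid LE"`.

```python
from typing import Optional


def hybrid_class(engine_string: str) -> Optional[str]:
    """Map the hybrid notation variants to 'PHEV', 'FHEV', 'HYBRID', or None."""
    text = engine_string.upper()
    # Check the most specific designations first so "HYBRID" does not shadow them.
    if "PHEV" in text or "PLUG-IN HYBRID" in text:
        return "PHEV"
    if "FHEV" in text or "FULL HYBRID" in text:
        return "FHEV"
    if "HYBRID" in text:
        return "HYBRID"
    return None
```

Because the check is substring-based, `"FULL HYBRID EV- (FHEV)"` and `"FULL HYBRID EV"` both resolve to `"FHEV"`.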
+
+## Database Mapping Strategy
+
+### Make Mapping
+```
+Filename: "alfa_romeo.json" → Database: "Alfa Romeo"
+```
+
+### Model Mapping  
+```
+JSON models.name → vehicles.model.name
+```
+
+### Engine Mapping
+```
+JSON engines[] → vehicles.engine.name (with parsing)
+Engine parsing → displacement_l, cylinders, fuel_type, aspiration
+```
+
+### Trim Mapping
+```
+JSON submodels[] → vehicles.trim.name
+```
+
+## Data Volume Estimates
+
+### File Size Analysis
+- **Largest files**: `toyota.json` (~748KB), `volkswagen.json` (~738KB)
+- **Smallest files**: `lucid.json` (~176B), `rivian.json` (~177B)
+- **Average file size**: ~150KB
+
+### Record Estimates (Based on Sample Analysis)
+- **Makes**: 55 (one per file)
+- **Models per make**: 5-50 (highly variable)
+- **Years per model**: 10-15 on average
+- **Trims per model-year**: 3-10 on average
+- **Engines**: 500-1000 unique engines total
+
+## Processing Recommendations
+
+### Order of Operations
+1. **Load makes** - Create make records with normalized names
+2. **Load models** - Associate with correct make_id
+3. **Load model_years** - Create year availability
+4. **Parse and load engines** - Handle L→I normalization
+5. **Load trims** - Associate with model_year_id
+6. **Create trim_engine relationships**
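
The six steps can be sketched end to end against an in-memory SQLite database. Everything about the schema here is an assumption made for illustration (table names, columns, and unique constraints), not the project's actual `vehicles` schema. The JSON does not say which engine belongs to which trim, so this sketch links every trim of a model-year to every engine of that model-year.

```python
import re
import sqlite3

# Hypothetical schema; unique constraints make repeated loads idempotent.
SCHEMA = """
CREATE TABLE makes (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE models (id INTEGER PRIMARY KEY, make_id INTEGER, name TEXT,
                     UNIQUE (make_id, name));
CREATE TABLE model_years (id INTEGER PRIMARY KEY, model_id INTEGER, year INTEGER,
                          UNIQUE (model_id, year));
CREATE TABLE engines (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE trims (id INTEGER PRIMARY KEY, model_year_id INTEGER, name TEXT,
                    UNIQUE (model_year_id, name));
CREATE TABLE trim_engines (trim_id INTEGER, engine_id INTEGER,
                           PRIMARY KEY (trim_id, engine_id));
"""


def normalize_engine_name(raw: str) -> str:
    """Rewrite the L configuration as I, e.g. '1.5L L3' -> '1.5L I3'."""
    return re.sub(r"(?<=L )L(?=\d)", "I", raw)


def get_or_create(conn, table, **cols):
    """Insert the row if missing, then return its id (simple upsert)."""
    names = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    values = tuple(cols.values())
    conn.execute(f"INSERT OR IGNORE INTO {table} ({names}) VALUES ({marks})", values)
    where = " AND ".join(f"{col} = ?" for col in cols)
    return conn.execute(f"SELECT id FROM {table} WHERE {where}", values).fetchone()[0]


def load_make(conn, make_name, year_entries):
    make_id = get_or_create(conn, "makes", name=make_name)                    # step 1
    for entry in year_entries:
        for model in entry.get("models", []):
            model_id = get_or_create(conn, "models",
                                     make_id=make_id, name=model["name"])     # step 2
            model_year_id = get_or_create(conn, "model_years", model_id=model_id,
                                          year=int(entry["year"]))            # step 3
            engine_ids = [
                get_or_create(conn, "engines", name=normalize_engine_name(raw))
                for raw in model.get("engines", [])
            ]                                                                 # step 4
            for submodel in model.get("submodels", []):
                trim_id = get_or_create(conn, "trims",
                                        model_year_id=model_year_id,
                                        name=submodel)                        # step 5
                for engine_id in engine_ids:                                  # step 6
                    conn.execute(
                        "INSERT OR IGNORE INTO trim_engines VALUES (?, ?)",
                        (trim_id, engine_id),
                    )
```

Parents are created before children so that each foreign key value exists by the time it is referenced, which is the reason for the ordering in the list above.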
+
+### Error Handling Requirements
+- **Handle empty engines arrays** (electric vehicles)
+- **Validate engine parsing** (log unparseable engines)  
+- **Handle duplicate records** (upsert strategy)
+- **Report data quality issues** (missing data, parsing failures)
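
The logging and reporting requirements could be met with a per-model audit that tallies issues in a counter. This is a sketch only: the logger name, the issue keys, and `audit_model` are hypothetical, and the pattern only recognizes the standard `{displacement}L {config}{cylinders}` format with optional trailing modifiers.

```python
import logging
import re
from collections import Counter

logger = logging.getLogger("vehicles_etl")  # hypothetical logger name

# Standard engine format with optional trailing modifiers.
ENGINE_PATTERN = re.compile(r"^\d+(?:\.\d+)?L\s+[IVHL]\d+(?:\s+.+)?$")


def audit_model(make: str, year: str, model: dict, issues: Counter) -> None:
    """Record data quality issues for one model entry without stopping the load."""
    engines = model.get("engines") or []
    if not engines:
        issues["empty_engines"] += 1            # expected for electric vehicles
    for raw in engines:
        if not ENGINE_PATTERN.match(raw):
            issues["unparseable_engine"] += 1
            logger.warning("Unparseable engine %r for %s %s %s",
                           raw, make, year, model.get("name"))
    if not model.get("submodels"):
        issues["empty_submodels"] += 1
```

The accumulated `Counter` can be printed once per make or once per run as the data quality report.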
+
+## Validation Strategy
+- **Cross-reference makes** with existing `sources/makes.json`
+- **Validate engine parsing** with regex patterns
+- **Check referential integrity** during loading
+- **Report statistics** per make (models, engines, trims loaded)
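
The makes cross-reference reduces to a set comparison. The layout of `sources/makes.json` is not described in these findings, so this sketch assumes the caller has already extracted a flat list of display names from it. The function and key names are hypothetical.

```python
def cross_reference_makes(derived_names, reference_names):
    """Compare display names derived from filenames against the reference list."""
    derived = set(derived_names)
    reference = set(reference_names)
    return {
        # Derived from a file, but absent from the reference list.
        "not_in_reference": sorted(derived - reference),
        # Listed in the reference, but no matching source file was found.
        "no_source_file": sorted(reference - derived),
    }
```

Two empty lists in the result indicate that the normalized filenames and the reference list agree.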