motovaultpro/docs/changes/vehicles-dropdown-v2/01-analysis-findings.md
Eric Gullickson a052040e3a Initial Commit
2025-09-17 16:09:15 -05:00

Analysis Findings - JSON Vehicle Data

Data Source Overview

  • Location: mvp-platform-services/vehicles/etl/sources/makes/
  • File Count: 55 JSON files
  • File Naming: Lowercase with underscores (e.g., alfa_romeo.json, land_rover.json)
  • Data Structure: Hierarchical vehicle data by make

JSON File Structure Analysis

Standard Structure

{
  "[make_name]": [
    {
      "year": "2024",
      "models": [
        {
          "name": "model_name",
          "engines": [
            "2.0L I4",
            "3.5L V6 TURBO"
          ],
          "submodels": [
            "Base",
            "Premium",
            "Limited"
          ]
        }
      ]
    }
  ]
}

Key Data Points

  1. Make Level: Root key matches filename (lowercase)
  2. Year Level: Array of yearly data
  3. Model Level: Array of models per year
  4. Engines: Array of engine specifications
  5. Submodels: Array of trim levels
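
A minimal sketch of walking this hierarchy in Python follows. The sample document (make, model, engine, and trim values) and the helper name `iter_models` are illustrative assumptions, not part of the existing ETL code.

```python
import json
from typing import Iterator, Tuple

# Made-up sample that follows the standard structure shown above.
SAMPLE = """
{
  "toyota": [
    {
      "year": "2024",
      "models": [
        {
          "name": "camry",
          "engines": ["2.5L I4"],
          "submodels": ["LE", "XSE"]
        }
      ]
    }
  ]
}
"""


def iter_models(doc: dict) -> Iterator[Tuple[str, int, dict]]:
    """Yield (make_key, year, model_dict) for every model in one make file."""
    for make_key, year_entries in doc.items():
        for entry in year_entries:
            year = int(entry["year"])  # years are stored as strings in the source
            for model in entry.get("models", []):
                yield make_key, year, model


rows = list(iter_models(json.loads(SAMPLE)))
for make_key, year, model in rows:
    print(make_key, year, model["name"], model["engines"], model["submodels"])
```

Flattening the nesting into (make, year, model) tuples keeps later loading steps independent of the file layout.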

Make Name Analysis

File Naming vs Display Name Issues

| Filename | Required Display Name | Issue |
|---|---|---|
| alfa_romeo.json | "Alfa Romeo" | Underscore → space, title case |
| land_rover.json | "Land Rover" | Underscore → space, title case |
| rolls_royce.json | "Rolls Royce" | Underscore → space, title case |
| chevrolet.json | "Chevrolet" | Direct match |
| bmw.json | "BMW" | Uppercase required |

Make Name Normalization Rules

  1. Replace underscores with spaces
  2. Title case each word
  3. Special cases: BMW, GMC (all caps)
  4. Validation: Cross-reference with sources/makes.json
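
Rules 1 through 3 could be sketched as below. The function name `display_name_from_filename` and the `ALL_CAPS_MAKES` set are hypothetical, and the set contains only the two special cases named above. Rule 4 (cross-referencing sources/makes.json) is not covered here because that file's layout is not described in this document.

```python
from pathlib import Path

# Makes whose display name is fully upper case (only the two listed above).
ALL_CAPS_MAKES = {"bmw", "gmc"}


def display_name_from_filename(filename: str) -> str:
    """Turn a source filename such as 'alfa_romeo.json' into a display name."""
    stem = Path(filename).stem.lower()
    if stem in ALL_CAPS_MAKES:
        return stem.upper()
    # Replace underscores with spaces and title-case each word.
    return " ".join(word.capitalize() for word in stem.split("_"))


for name in ("alfa_romeo.json", "land_rover.json", "chevrolet.json", "bmw.json"):
    print(name, "->", display_name_from_filename(name))
```

Any make that needs different casing or punctuation would have to be added to the special-case handling explicitly.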

Engine Specification Analysis

Discovered Engine Patterns

From analysis of Nissan, Toyota, Ford, Subaru, and Porsche files:

Standard Format: {displacement}L {config}{cylinders}

  • "2.0L I4" - 2.0 liter, Inline 4-cylinder
  • "3.5L V6" - 3.5 liter, V6 configuration
  • "2.4L H4" - 2.4 liter, Horizontal (Boxer) 4-cylinder

Configuration Types Found

  • I = Inline (most common)
  • V = V-configuration
  • H = Horizontal/Boxer (Subaru, Porsche)
  • L = Alternate notation for Inline; must be normalized to I (e.g., L3 → I3)
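
The standard format and configuration letters above could be parsed with a regular expression along these lines. This is a sketch under assumptions: the `ParsedEngine` fields, the `parse_engine` name, and the choice to keep any trailing text as an unparsed `modifiers` string are illustrative, not the project's actual parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# {displacement}L {config}{cylinders}, optionally followed by modifier text.
ENGINE_RE = re.compile(
    r"^(?P<disp>\d+(?:\.\d+)?)L\s+(?P<config>[IVHL])(?P<cyl>\d+)(?:\s+(?P<rest>.+))?$"
)


@dataclass
class ParsedEngine:
    displacement_l: float
    configuration: str  # "I", "V", or "H" after normalization
    cylinders: int
    modifiers: str      # trailing text such as "FLEX", left unparsed here


def parse_engine(raw: str) -> Optional[ParsedEngine]:
    """Parse an engine string; return None when it does not fit the format."""
    match = ENGINE_RE.match(raw.strip().upper())
    if match is None:
        return None
    config = match.group("config")
    if config == "L":  # L is treated as Inline (L3 -> I3)
        config = "I"
    return ParsedEngine(
        displacement_l=float(match.group("disp")),
        configuration=config,
        cylinders=int(match.group("cyl")),
        modifiers=match.group("rest") or "",
    )


print(parse_engine("2.4L H4"))
print(parse_engine("1.5L L3 PLUG-IN HYBRID EV- (PHEV)"))
```

Returning `None` instead of raising lets the caller decide whether to log the unparseable string or skip it.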

Engine Modifier Patterns

Hybrid Classifications

  • "PLUG-IN HYBRID EV- (PHEV)" - Plug-in hybrid electric vehicle
  • "FULL HYBRID EV- (FHEV)" - Full hybrid electric vehicle
  • "HYBRID" - General hybrid designation

Fuel Type Modifiers

  • "FLEX" - Flex-fuel capability (e.g., "5.6L V8 FLEX")
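  • "ELECTRIC" - Electric drive; on its own it denotes a pure electric motor, while appended to a combustion spec (e.g., "1.8L I4 ELECTRIC") it marks a hybrid
  • "TURBO" - Turbocharged; an aspiration modifier rather than a fuel type (less common in current data)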

Example Engine Strings

"2.5L I4 FULL HYBRID EV- (FHEV)"
"1.5L L3 PLUG-IN HYBRID EV- (PHEV)"  // L3 → I3
"5.6L V8 FLEX"
"2.4L H4"  // Subaru Boxer
"1.8L I4 ELECTRIC"

Special Cases Analysis

Electric Vehicle Handling

Tesla Example (tesla.json):

{
  "name": "3",
  "engines": [],  // Empty array
  "submodels": ["Long Range AWD", "Performance"]
}

Lucid Example (lucid.json):

{
  "name": "air",
  "engines": [],  // Empty array
  "submodels": []
}

Electric Vehicle Requirements

  • Empty engines arrays are common for pure electric vehicles
  • Must create default engine: "Electric Motor" with appropriate specs
  • Fuel type: "Electric"
  • Configuration: null or "Electric"
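
One way to apply these requirements is to substitute a placeholder engine whenever the array is empty. The record field names (`name`, `fuel_type`, `configuration`) and the function name are assumptions made for illustration; the real schema may differ.

```python
from typing import Dict, List, Optional

# Placeholder used when a model has no engines listed.
DEFAULT_ELECTRIC_ENGINE: Dict[str, Optional[str]] = {
    "name": "Electric Motor",
    "fuel_type": "Electric",
    "configuration": None,
}


def engines_or_default(model: dict) -> List[Dict[str, Optional[str]]]:
    """Return one record per engine string, or the electric default if none."""
    names = model.get("engines") or []
    if not names:
        # Copy so callers cannot mutate the shared default.
        return [dict(DEFAULT_ELECTRIC_ENGINE)]
    # fuel_type and configuration are left for the engine parser to fill in.
    return [{"name": n, "fuel_type": None, "configuration": None} for n in names]


tesla_model_3 = {"name": "3", "engines": [], "submodels": ["Long Range AWD", "Performance"]}
print(engines_or_default(tesla_model_3))
```

This treats every empty array as an electric vehicle, so it is worth reporting each substitution in case an empty array actually reflects missing data for a combustion model.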

Hybrid Vehicle Patterns

In the Toyota data, hybrid designations appear at both the engine and submodel levels:

  • Engine level: "1.8L I4 ELECTRIC"
  • Submodel level: "Hybrid LE", "Hybrid XSE"

Data Quality Issues Found

Missing Engine Data

  • Tesla models: Consistently empty engines arrays
  • Lucid models: Empty engines arrays
  • Some Nissan models: Empty engines for electric variants

Inconsistent Submodel Data

  • Mix of trim levels and descriptors
  • Some technical specifications in submodel names
  • Inconsistent naming patterns across makes

Engine Specification Inconsistencies

  • L-configuration usage: Should be normalized to I (Inline)
  • Mixed hybrid notation: Sometimes in engine string, sometimes separate
  • Abbreviation variations: EV- vs EV, FHEV vs FULL HYBRID
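
The notation variants above could be collapsed into a small set of canonical tags before any fuel-type decision is made. The tag names and the `modifier_tags` helper are hypothetical; the matching rules cover only the variants listed in this document.

```python
import re
from typing import Set


def modifier_tags(text: str) -> Set[str]:
    """Map an engine string (or its modifier part) to canonical tags."""
    upper = text.upper()
    tags: Set[str] = set()
    # Hybrid notation: accept both the abbreviation and the spelled-out form.
    if "PHEV" in upper or "PLUG-IN HYBRID" in upper:
        tags.add("PHEV")
    elif "FHEV" in upper or "FULL HYBRID" in upper:
        tags.add("FHEV")
    elif "HYBRID" in upper:
        tags.add("HYBRID")
    # Remaining modifiers are matched as whole words.
    for token in ("FLEX", "ELECTRIC", "TURBO"):
        if re.search(rf"\b{token}\b", upper):
            tags.add(token)
    return tags


for sample in ("2.5L I4 FULL HYBRID EV- (FHEV)", "5.6L V8 FLEX", "1.8L I4 ELECTRIC"):
    print(sample, "->", sorted(modifier_tags(sample)))
```

Because "EV-" and "EV" are never matched directly, both spellings produce the same tags.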

Database Mapping Strategy

Make Mapping

Filename: "alfa_romeo.json" → Database: "Alfa Romeo"

Model Mapping

JSON models[].name → vehicles.model.name

Engine Mapping

JSON engines[] (each entry) → vehicles.engine.name (with parsing)
Engine parsing → displacement_l, cylinders, fuel_type, aspiration

Trim Mapping

JSON submodels[] (each entry) → vehicles.trim.name

Data Volume Estimates

File Size Analysis

  • Largest files: toyota.json (~748KB), volkswagen.json (~738KB)
  • Smallest files: lucid.json (~176B), rivian.json (~177B)
  • Average file size: ~150KB

Record Estimates (Based on Sample Analysis)

  • Makes: 55 (one per file)
  • Models per make: 5-50 (highly variable)
  • Years per model: 10-15 on average
  • Trims per model-year: 3-10 on average
  • Engines: 500-1000 unique engines total
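
As a rough cross-check, the figures above can be multiplied out. The calculation pairs the low ends together and the high ends together, so it gives outer bounds rather than an expected row count; the variable names are illustrative.

```python
MAKES = 55
AVG_FILE_KB = 150

MODELS_PER_MAKE = (5, 50)
YEARS_PER_MODEL = (10, 15)
TRIMS_PER_MODEL_YEAR = (3, 10)

# Total source size from the average file size.
total_kb = MAKES * AVG_FILE_KB

# (low, high) bounds for model-year rows, then trim rows.
model_years = tuple(MAKES * m * y for m, y in zip(MODELS_PER_MAKE, YEARS_PER_MODEL))
trims = tuple(my * t for my, t in zip(model_years, TRIMS_PER_MODEL_YEAR))

print(f"total source size: ~{total_kb / 1000:.2f} MB")
print(f"model-year rows: {model_years[0]:,} to {model_years[1]:,}")
print(f"trim rows: {trims[0]:,} to {trims[1]:,}")
```

Under these assumptions the source data totals about 8.25 MB, with between 2,750 and 41,250 model-year rows and between 8,250 and 412,500 trim rows.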

Processing Recommendations

Order of Operations

  1. Load makes - Create make records with normalized names
  2. Load models - Associate with correct make_id
  3. Load model_years - Create year availability
  4. Parse and load engines - Handle L→I normalization
  5. Load trims - Associate with model_year_id
  6. Create trim_engine relationships
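
The dependency order above can be sketched with an in-memory stand-in for the database. Everything here is an assumption made for illustration: the `Tables` class, the natural keys, and in particular linking every trim to every engine of its model-year, since the JSON does not say which trim uses which engine. The sketch loads one make file at a time; parent rows are still created before their children, as in the ordered list.

```python
import re
from collections import defaultdict
from typing import Dict, Hashable, List


class Tables:
    """In-memory stand-in for the database: table name -> {natural key: id}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[Hashable, int]] = defaultdict(dict)

    def get_or_create(self, table: str, key: Hashable) -> int:
        """Return the id for key, inserting it first if it is new."""
        table_rows = self.rows[table]
        if key not in table_rows:
            table_rows[key] = len(table_rows) + 1
        return table_rows[key]


def normalize_engine(name: str) -> str:
    """Rewrite an L-configuration (e.g. 'L3') as inline ('I3')."""
    return re.sub(r"^(\d+(?:\.\d+)?L\s+)L(\d)", r"\1I\2", name.strip())


def load_make(tables: Tables, make_name: str, year_entries: List[dict]) -> None:
    make_id = tables.get_or_create("make", make_name)  # step 1
    for entry in year_entries:
        year = int(entry["year"])
        for model in entry.get("models", []):
            model_id = tables.get_or_create("model", (make_id, model["name"]))  # step 2
            my_id = tables.get_or_create("model_year", (model_id, year))  # step 3
            engine_ids = [
                tables.get_or_create("engine", normalize_engine(e))  # step 4
                for e in model.get("engines", [])
            ]
            for trim in model.get("submodels", []):
                trim_id = tables.get_or_create("trim", (my_id, trim))  # step 5
                for engine_id in engine_ids:
                    tables.get_or_create("trim_engine", (trim_id, engine_id))  # step 6
```

Because every insert goes through `get_or_create`, loading the same file twice leaves the row counts unchanged, which mirrors the upsert behavior a real loader would need.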

Error Handling Requirements

  • Handle empty engines arrays (electric vehicles)
  • Validate engine parsing (log unparseable engines)
  • Handle duplicate records (upsert strategy)
  • Report data quality issues (missing data, parsing failures)

Validation Strategy

  • Cross-reference makes with existing sources/makes.json
  • Validate engine parsing with regex patterns
  • Check referential integrity during loading
  • Report statistics per make (models, engines, trims loaded)
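
Per-make statistics and the make cross-reference could be gathered as sketched below. The function names are hypothetical, and the set of known makes is passed in directly because the layout of sources/makes.json is not described here. Note that the engine count is the number of engine entries, not unique engines.

```python
from typing import Dict, Iterable, List, Set, Tuple


def summarize_make(make_key: str, year_entries: List[dict]) -> Tuple[Dict[str, int], List[str]]:
    """Count model-years, engine entries, and trims; collect quality issues."""
    stats = {"model_years": 0, "engines": 0, "trims": 0}
    issues: List[str] = []
    for entry in year_entries:
        for model in entry.get("models", []):
            engines = model.get("engines", [])
            stats["model_years"] += 1
            stats["engines"] += len(engines)
            stats["trims"] += len(model.get("submodels", []))
            if not engines:
                issues.append(f"{make_key} {entry['year']} {model['name']}: empty engines array")
    return stats, issues


def unknown_makes(display_names: Iterable[str], known: Set[str]) -> List[str]:
    """Return display names that are missing from the known-makes set."""
    return sorted(name for name in display_names if name not in known)


lucid = [{"year": "2024", "models": [{"name": "air", "engines": [], "submodels": []}]}]
print(summarize_make("lucid", lucid))
print(unknown_makes(["Lucid", "Alfa Romeo"], {"Alfa Romeo"}))
```

Collecting issues in a list instead of raising lets a full run finish and report every problem at once.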