egullickson/motovaultpro

Eric Gullickson a052040e3a Initial Commit

2025-09-17 16:09:15 -05:00

Analysis Findings - JSON Vehicle Data

Data Source Overview

Location: mvp-platform-services/vehicles/etl/sources/makes/
File Count: 55 JSON files
File Naming: Lowercase with underscores (e.g., alfa_romeo.json, land_rover.json)
Data Structure: Hierarchical vehicle data by make

JSON File Structure Analysis

Standard Structure

{
  "[make_name]": [
    {
      "year": "2024",
      "models": [
        {
          "name": "model_name",
          "engines": [
            "2.0L I4",
            "3.5L V6 TURBO"
          ],
          "submodels": [
            "Base",
            "Premium",
            "Limited"
          ]
        }
      ]
    }
  ]
}

Key Data Points

Make Level: Root key matches filename (lowercase)
Year Level: Array of yearly data
Model Level: Array of models per year
Engines: Array of engine specifications
Submodels: Array of trim levels
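The nesting described above can be walked with a small generator. This is a minimal sketch, assuming each file parses to a single-key object as shown in the standard structure; the function name `iter_model_records` and the `example_make` sample are hypothetical and not part of the repository.

```python
import json
from typing import Iterator


def iter_model_records(doc: dict) -> Iterator[dict]:
    """Yield one flat record per (make, year, model) in a parsed make file."""
    for make_key, year_entries in doc.items():
        for year_entry in year_entries:
            year = int(year_entry["year"])  # years are stored as strings
            for model in year_entry.get("models", []):
                yield {
                    "make_key": make_key,
                    "year": year,
                    "model": model["name"],
                    "engines": list(model.get("engines", [])),
                    "submodels": list(model.get("submodels", [])),
                }


# Sample mirrors the standard structure; "example_make" is a placeholder key.
SAMPLE = json.loads("""
{
  "example_make": [
    {
      "year": "2024",
      "models": [
        {
          "name": "model_name",
          "engines": ["2.0L I4", "3.5L V6 TURBO"],
          "submodels": ["Base", "Premium", "Limited"]
        }
      ]
    }
  ]
}
""")

records = list(iter_model_records(SAMPLE))
```

Flattening first keeps the later loading steps independent of the file layout.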

Make Name Analysis

File Naming vs Display Name Issues

Filename	Required Display Name	Issue
`alfa_romeo.json`	"Alfa Romeo"	Underscore → space, title case
`land_rover.json`	"Land Rover"	Underscore → space, title case
`rolls_royce.json`	"Rolls Royce"	Underscore → space, title case
`chevrolet.json`	"Chevrolet"	Direct match
`bmw.json`	"BMW"	Uppercase required

Make Name Normalization Rules

Replace underscores with spaces
Title case each word
Special cases: BMW, GMC (all caps)
Validation: Cross-reference with sources/makes.json
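The first three rules can be sketched as a single function. The name `display_name_from_filename` and the `UPPERCASE_MAKES` set are illustrative assumptions; only BMW and GMC are listed as special cases in this analysis, and the cross-reference against sources/makes.json is left out because its format is not described here.

```python
from pathlib import Path

# Makes that must be fully upper-cased; this analysis lists BMW and GMC.
UPPERCASE_MAKES = {"bmw", "gmc"}


def display_name_from_filename(filename: str) -> str:
    """Derive a display name from a make filename such as 'alfa_romeo.json'."""
    stem = Path(filename).stem.lower()
    if stem in UPPERCASE_MAKES:
        return stem.upper()
    # Underscores become spaces; each word is title-cased.
    return " ".join(word.capitalize() for word in stem.split("_"))
```

For example, `display_name_from_filename("land_rover.json")` gives `"Land Rover"` and `display_name_from_filename("bmw.json")` gives `"BMW"`.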

Engine Specification Analysis

Discovered Engine Patterns

From analysis of Nissan, Toyota, Ford, Subaru, and Porsche files:

Standard Format: `{displacement}L {config}{cylinders}`

"2.0L I4" - 2.0 liter, Inline 4-cylinder
"3.5L V6" - 3.5 liter, V6 configuration
"2.4L H4" - 2.4 liter, Horizontal (Boxer) 4-cylinder

Configuration Types Found

I = Inline (most common)
V = V-configuration
H = Horizontal/Boxer (Subaru, Porsche)
L = Alternate notation for Inline; must be normalized to I (L3 → I3)
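The standard format and the L → I rule can be expressed as one regular expression. This is a sketch rather than the project's parser: `ENGINE_RE` and `parse_engine_base` are hypothetical names, and the output keys are borrowed from the column names used in the mapping section.

```python
import re
from typing import Optional

# {displacement}L {config}{cylinders}, e.g. "2.0L I4"; trailing modifiers are ignored.
ENGINE_RE = re.compile(
    r"^\s*(?P<disp>\d+(?:\.\d+)?)L\s+(?P<config>[IVHL])(?P<cyl>\d+)\b"
)


def parse_engine_base(engine: str) -> Optional[dict]:
    """Parse displacement, configuration, and cylinder count; None if no match."""
    match = ENGINE_RE.match(engine)
    if match is None:
        return None
    config = match.group("config")
    if config == "L":  # L is treated as Inline
        config = "I"
    return {
        "displacement_l": float(match.group("disp")),
        "configuration": config,
        "cylinders": int(match.group("cyl")),
    }
```

Returning `None` instead of raising lets the caller decide whether an unparseable string is logged, skipped, or replaced by a default.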

Engine Modifier Patterns

Hybrid Classifications

"PLUG-IN HYBRID EV- (PHEV)" - Plug-in hybrid electric vehicle
"FULL HYBRID EV- (FHEV)" - Full hybrid electric vehicle
"HYBRID" - General hybrid designation

Fuel Type Modifiers

"FLEX" - Flex-fuel capability (e.g., "5.6L V8 FLEX")
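"ELECTRIC" - Electric drive; when it follows a displacement (e.g., "1.8L I4 ELECTRIC") it marks a hybrid powertrain rather than a pure electric vehicle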
"TURBO" - Turbocharged (less common in current data)

Example Engine Strings

"2.5L I4 FULL HYBRID EV- (FHEV)"
"1.5L L3 PLUG-IN HYBRID EV- (PHEV)"  // L3 → I3
"5.6L V8 FLEX"
"2.4L H4"  // Subaru Boxer
"1.8L I4 ELECTRIC"
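One way to turn these modifiers into `fuel_type` and `aspiration` values is a keyword check in priority order. The label strings ("Plug-in Hybrid", "Flex Fuel", "Gasoline", "Naturally Aspirated", and so on) are assumptions for illustration, as is the default of gasoline when no modifier is present.

```python
import re

DISPLACEMENT_RE = re.compile(r"\d+(?:\.\d+)?L\b")


def classify_modifiers(engine: str) -> dict:
    """Derive fuel type and aspiration from the modifier part of an engine string."""
    text = engine.upper()
    if "PHEV" in text or "PLUG-IN HYBRID" in text:
        fuel_type = "Plug-in Hybrid"
    elif "FHEV" in text or "HYBRID" in text:
        fuel_type = "Hybrid"
    elif "FLEX" in text:
        fuel_type = "Flex Fuel"
    elif "ELECTRIC" in text:
        # ELECTRIC alongside a displacement indicates a hybrid, not a pure EV.
        fuel_type = "Hybrid" if DISPLACEMENT_RE.search(text) else "Electric"
    else:
        fuel_type = "Gasoline"  # assumed default when no modifier is present
    aspiration = "Turbocharged" if "TURBO" in text else "Naturally Aspirated"
    return {"fuel_type": fuel_type, "aspiration": aspiration}
```

The plug-in check runs before the general hybrid check because "PLUG-IN HYBRID" also contains "HYBRID".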

Special Cases Analysis

Electric Vehicle Handling

Tesla Example (tesla.json):

{
  "name": "3",
  "engines": [],  // Empty array
  "submodels": ["Long Range AWD", "Performance"]
}

Lucid Example (lucid.json):

{
  "name": "air",
  "engines": [],  // Empty array
  "submodels": []
}

Electric Vehicle Requirements

Empty engines arrays are common for pure electric vehicles
Must create default engine: "Electric Motor" with appropriate specs
Fuel type: "Electric"
Configuration: null or "Electric"
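A minimal sketch of the default-engine rule follows. It assumes every empty engines array belongs to a pure electric vehicle, which matches the Tesla and Lucid files but is not guaranteed for all makes; `DEFAULT_ELECTRIC_ENGINE` and `engines_or_default` are hypothetical names.

```python
# Default record used when a model lists no engines (assumed to be a pure EV).
DEFAULT_ELECTRIC_ENGINE = {
    "name": "Electric Motor",
    "fuel_type": "Electric",
    "configuration": None,
    "displacement_l": None,
    "cylinders": None,
}


def engines_or_default(model: dict) -> list:
    """Return the model's engine names as records, or the electric default."""
    names = model.get("engines") or []
    if names:
        return [{"name": name} for name in names]
    # Copy so callers cannot mutate the shared default.
    return [dict(DEFAULT_ELECTRIC_ENGINE)]


tesla_model_3 = {
    "name": "3",
    "engines": [],
    "submodels": ["Long Range AWD", "Performance"],
}
default_engines = engines_or_default(tesla_model_3)
```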

Hybrid Vehicle Patterns

From Toyota analysis - hybrid appears in both engines and submodels:

Engine level: "1.8L I4 ELECTRIC"
Submodel level: "Hybrid LE", "Hybrid XSE"

Data Quality Issues Found

Missing Engine Data

Tesla models: Consistently empty engines arrays
Lucid models: Empty engines arrays
Some Nissan models: Empty engines for electric variants

Inconsistent Submodel Data

Mix of trim levels and descriptors
Some technical specifications in submodel names
Inconsistent naming patterns across makes

Engine Specification Inconsistencies

L-configuration usage: Should be normalized to I (Inline)
Mixed hybrid notation: Sometimes in engine string, sometimes separate
Abbreviation variations: EV- vs EV, FHEV vs FULL HYBRID

Database Mapping Strategy

Make Mapping

Filename: "alfa_romeo.json" → Database: "Alfa Romeo"

Model Mapping

JSON models[].name → vehicles.model.name

Engine Mapping

JSON engines[] (each entry) → vehicles.engine.name (with parsing)
Engine parsing → displacement_l, cylinders, fuel_type, aspiration

Trim Mapping

JSON submodels[] (each entry) → vehicles.trim.name

Data Volume Estimates

File Size Analysis

Largest files: toyota.json (~748KB), volkswagen.json (~738KB)
Smallest files: lucid.json (~176B), rivian.json (~177B)
Average file size: ~150KB

Record Estimates (Based on Sample Analysis)

Makes: 55 (one per file)
Models per make: 5-50 (highly variable)
Years per model: 10-15 years average
Trims per model-year: 3-10 average
Engines: 500-1000 unique engines total

Processing Recommendations

Order of Operations

1. Load makes - Create make records with normalized names
2. Load models - Associate with correct make_id
3. Load model_years - Create year availability
4. Parse and load engines - Handle L→I normalization
5. Load trims - Associate with model_year_id
6. Create trim_engine relationships
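The dependency order above can be illustrated with in-memory tables, where each parent ID is resolved before its children are created. This is a sketch under stated assumptions: the table keys, the helper names, and the choice to link every trim to every engine of its model-year are all hypothetical, since the source files do not say which trim uses which engine.

```python
import re


def get_or_create(table: dict, key) -> int:
    """Return the ID for key, assigning the next sequential ID if it is new."""
    if key not in table:
        table[key] = len(table) + 1
    return table[key]


def normalize_engine_name(name: str) -> str:
    """Rewrite an L-configuration token as Inline, e.g. 'L3' becomes 'I3'."""
    return re.sub(r"\bL(\d+)\b", r"I\1", name)


def load_records(records):
    """Populate tables parent-first: make, model, model_year, engine, trim."""
    tables = {name: {} for name in ("make", "model", "model_year", "engine", "trim")}
    trim_engine = set()
    for rec in records:
        make_id = get_or_create(tables["make"], rec["make"])
        model_id = get_or_create(tables["model"], (make_id, rec["model"]))
        model_year_id = get_or_create(tables["model_year"], (model_id, rec["year"]))
        engine_ids = [
            get_or_create(tables["engine"], normalize_engine_name(engine))
            for engine in rec["engines"]
        ]
        for trim in rec["submodels"]:
            trim_id = get_or_create(tables["trim"], (model_year_id, trim))
            # Assumption: every trim is linked to every engine of its model-year.
            trim_engine.update((trim_id, engine_id) for engine_id in engine_ids)
    return tables, trim_engine
```

Because `get_or_create` is idempotent, running the same record twice leaves the tables unchanged.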

Error Handling Requirements

Handle empty engines arrays (electric vehicles)
Validate engine parsing (log unparseable engines)
Handle duplicate records (upsert strategy)
Report data quality issues (missing data, parsing failures)
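Issue reporting can be sketched with a counter plus log messages, so that a run ends with totals for each problem class. The logger name `vehicle_etl`, the issue keys, and the function `collect_issues` are hypothetical; the regex only checks the standard `{displacement}L {config}{cylinders}` prefix.

```python
import logging
import re
from collections import Counter

logger = logging.getLogger("vehicle_etl")  # hypothetical logger name

ENGINE_RE = re.compile(r"^\d+(?:\.\d+)?L\s+[IVHL]\d+\b")


def collect_issues(make: str, year: int, model: dict, issues: Counter) -> None:
    """Count and log data quality issues for one model entry."""
    name = model.get("name")
    engines = model.get("engines") or []
    if not engines:
        issues["empty_engines"] += 1
        logger.info("No engines for %s %s %s", year, make, name)
    for engine in engines:
        if not ENGINE_RE.match(engine):
            issues["unparseable_engine"] += 1
            logger.warning("Unparseable engine %r for %s %s %s", engine, year, make, name)
    if not model.get("submodels"):
        issues["empty_submodels"] += 1
        logger.info("No submodels for %s %s %s", year, make, name)
```

Passing a shared `Counter` into each call keeps the totals in one place for the final report.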

Validation Strategy

Cross-reference makes with existing sources/makes.json
Validate engine parsing with regex patterns
Check referential integrity during loading
Report statistics per make (models, engines, trims loaded)
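Per-make statistics and the make cross-reference can be sketched with sets. The record keys and function names are assumptions, and `known` stands in for whatever list of names sources/makes.json provides, since its format is not described in this analysis.

```python
from collections import defaultdict


def stats_per_make(records) -> dict:
    """Count distinct models, engines, and trims seen for each make."""
    acc = defaultdict(lambda: {"models": set(), "engines": set(), "trims": set()})
    for rec in records:
        entry = acc[rec["make"]]
        entry["models"].add(rec["model"])
        entry["engines"].update(rec["engines"])
        entry["trims"].update(rec["submodels"])
    return {
        make: {kind: len(values) for kind, values in entry.items()}
        for make, entry in acc.items()
    }


def unknown_makes(derived, known) -> list:
    """Return derived make names that are absent from the reference list."""
    return sorted(set(derived) - set(known))
```

An empty result from `unknown_makes` means every normalized filename matched a reference name.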
