motovaultpro/docs/changes/vehicles-dropdown-v2/03-engine-spec-parsing.md
Eric Gullickson a052040e3a Initial Commit
2025-09-17 16:09:15 -05:00

Engine Specification Parsing Rules

Overview

Rules for parsing engine specification strings from the JSON source files into the structure of the PostgreSQL engine table.

Standard Engine Format

Pattern: {displacement}L {configuration}{cylinders} {modifiers}

Examples:

  • "2.0L I4" → 2.0L, Inline, 4-cylinder
  • "3.5L V6 TURBO" → 3.5L, V, 6-cylinder, Turbocharged
  • "1.5L L3 PLUG-IN HYBRID EV- (PHEV)" → 1.5L, Inline (L→I), 3-cylinder, Plug-in Hybrid

Configuration Normalization Rules

CRITICAL: L-Configuration Handling

L-configurations MUST be treated as Inline (I)

| Input | Normalized | Reasoning |
|---|---|---|
| "1.5L L3" | "1.5L I3" | L3 is alternate notation for Inline 3-cylinder |
| "2.0L L4" | "2.0L I4" | L4 is alternate notation for Inline 4-cylinder |
| "1.2L L3 FULL HYBRID EV- (FHEV)" | "1.2L I3" + Hybrid | L→I normalization + hybrid flag |

Configuration Types

  • I = Inline (most common)
  • V = V-configuration
  • H = Horizontal/Boxer (Subaru, Porsche)
  • L = Convert to I (alternate Inline notation)
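
The configuration letters above can be mapped to display names with a small lookup. This is a minimal sketch: `CONFIGURATION_NAMES` and `describe_configuration` are hypothetical names, not part of the original parser, and returning "Unknown" for unlisted letters is an assumption.

```python
# Hypothetical lookup table built from the configuration list above.
CONFIGURATION_NAMES = {
    "I": "Inline",
    "V": "V-configuration",
    "H": "Horizontal/Boxer",
}


def describe_configuration(letter: str) -> str:
    """Return a display name for a configuration letter, treating L as I."""
    normalized = "I" if letter == "L" else letter
    # Assumption: letters outside the list are reported as "Unknown".
    return CONFIGURATION_NAMES.get(normalized, "Unknown")


print(describe_configuration("L"))  # Inline
print(describe_configuration("H"))  # Horizontal/Boxer
```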

Engine Parsing Implementation

Regex Patterns

# Primary engine pattern
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'

# Modifier patterns
HYBRID_PATTERNS = [
    r'PLUG-IN HYBRID EV-?\s*\(PHEV\)',
    r'FULL HYBRID EV-?\s*\(FHEV\)',
    r'HYBRID'
]

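FUEL_PATTERNS = [
    r'FLEX',
    r'ELECTRIC'
]

# Aspiration modifiers, kept separate from fuel types
ASPIRATION_PATTERNS = [
    r'TURBO',
    r'SUPERCHARGED',
    r'\bSC\b'  # short form of SUPERCHARGED
]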
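
To illustrate how the primary pattern splits a string, the sketch below applies `ENGINE_PATTERN` and returns whatever follows the match as the modifier text. `split_engine_string` is a hypothetical helper introduced only for this example.

```python
import re

ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


def split_engine_string(engine_str: str):
    """Return (displacement, config letter, cylinders, modifiers), or None if no match."""
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        return None
    # Everything after the matched base spec is treated as modifier text.
    modifiers = engine_str[match.end():].strip()
    return float(match.group(1)), match.group(2), int(match.group(3)), modifiers


print(split_engine_string("3.5L V6 TURBO"))   # (3.5, 'V', 6, 'TURBO')
print(split_engine_string("Electric Motor"))  # None
```

Note that the raw configuration letter is returned here; L→I normalization is a separate step.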

Parsing Algorithm

import re

def parse_engine_string(engine_str: str) -> EngineSpec:
    # 1. Extract base components (displacement, config, cylinders)
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        # No match: return a fallback instead of raising (see Error Handling)
        return create_fallback_engine(engine_str)
    displacement = float(match.group(1))
    config = normalize_configuration(match.group(2))  # L→I here
    cylinders = int(match.group(3))

    # 2. Detect fuel type and aspiration from modifiers
    fuel_type = extract_fuel_type(engine_str)
    aspiration = extract_aspiration(engine_str)

    return EngineSpec(
        displacement_l=displacement,
        configuration=config,
        cylinders=cylinders,
        fuel_type=fuel_type,
        aspiration=aspiration,
        raw_string=engine_str
    )

def normalize_configuration(config: str) -> str:
    """CRITICAL: Convert L to I"""
    return 'I' if config == 'L' else config

Fuel Type Detection

Hybrid Classifications

| Pattern | Database Value | Description |
|---|---|---|
| "PLUG-IN HYBRID EV- (PHEV)" | "Plug-in Hybrid" | Plug-in hybrid electric |
| "FULL HYBRID EV- (FHEV)" | "Full Hybrid" | Full hybrid electric |
| "HYBRID" | "Hybrid" | General hybrid |

Other Fuel Types

| Pattern | Database Value | Description |
|---|---|---|
| "FLEX" | "Flex Fuel" | Flex-fuel capability |
| "ELECTRIC" | "Electric" | Pure electric |
| No modifier | "Gasoline" | Default assumption |
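
One possible shape for `extract_fuel_type`, which the parsing algorithm calls but does not define, is an ordered rule list. The sketch below is an assumption about that function, not the project's actual code. `FUEL_TYPE_RULES` is a hypothetical name, and the ordering (most specific pattern first) is what keeps a PHEV string from being classified as a general "Hybrid".

```python
import re

# Ordered most-specific first so "PLUG-IN HYBRID ..." wins over bare "HYBRID".
FUEL_TYPE_RULES = [
    (r'PLUG-IN HYBRID EV-?\s*\(PHEV\)', "Plug-in Hybrid"),
    (r'FULL HYBRID EV-?\s*\(FHEV\)', "Full Hybrid"),
    (r'HYBRID', "Hybrid"),
    (r'FLEX', "Flex Fuel"),
    (r'ELECTRIC', "Electric"),
]


def extract_fuel_type(engine_str: str) -> str:
    """Return the database value for the first matching rule, else "Gasoline"."""
    for pattern, value in FUEL_TYPE_RULES:
        if re.search(pattern, engine_str):
            return value
    return "Gasoline"  # default assumption from the table above


print(extract_fuel_type("1.5L L3 PLUG-IN HYBRID EV- (PHEV)"))  # Plug-in Hybrid
print(extract_fuel_type("2.0L I4"))                            # Gasoline
```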

Aspiration Detection

Forced Induction

| Pattern | Database Value | Description |
|---|---|---|
| "TURBO" | "Turbocharged" | Turbocharged engine |
| "SUPERCHARGED" | "Supercharged" | Supercharged engine |
| "SC" | "Supercharged" | Supercharged (short form) |
| No modifier | "Natural" | Naturally aspirated |
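
`extract_aspiration` could follow the same rule-list approach. In this sketch, `ASPIRATION_RULES` is a hypothetical name, and matching "SC" only as a whole word (`\bSC\b`) is an assumption made to avoid false positives inside longer tokens. Checking "TURBO" before the supercharged patterns is likewise an arbitrary choice.

```python
import re

# Assumption: "SC" must be a standalone token to count as the short form.
ASPIRATION_RULES = [
    (r'TURBO', "Turbocharged"),
    (r'SUPERCHARGED|\bSC\b', "Supercharged"),
]


def extract_aspiration(engine_str: str) -> str:
    """Return the database value for the first matching rule, else "Natural"."""
    for pattern, value in ASPIRATION_RULES:
        if re.search(pattern, engine_str):
            return value
    return "Natural"  # naturally aspirated by default


print(extract_aspiration("3.5L V6 TURBO"))  # Turbocharged
print(extract_aspiration("5.0L V8 SC"))     # Supercharged
print(extract_aspiration("2.0L I4"))        # Natural
```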

Real-World Examples

Standard Engines

Input: "2.0L I4"
Output: EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4"
)

L→I Normalization Example

Input: "1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
Output: EngineSpec(
    displacement_l=1.5,
    configuration="I",     # L normalized to I
    cylinders=3,
    fuel_type="Plug-in Hybrid",
    aspiration="Natural",
    raw_string="1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
)

Subaru Boxer Engine

Input: "2.4L H4"
Output: EngineSpec(
    displacement_l=2.4,
    configuration="H",     # Horizontal/Boxer
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.4L H4"
)

Flex Fuel Engine

Input: "5.6L V8 FLEX"
Output: EngineSpec(
    displacement_l=5.6,
    configuration="V",
    cylinders=8,
    fuel_type="Flex Fuel",
    aspiration="Natural",
    raw_string="5.6L V8 FLEX"
)

Electric Vehicle Handling

Empty Engines Arrays

When a model has an empty engines array (engines: []), as is common for Tesla and Lucid, create a default electric engine:

def create_default_electric_engine() -> EngineSpec:
    return EngineSpec(
        displacement_l=None,      # N/A for electric
        configuration="Electric", # Special designation
        cylinders=None,          # N/A for electric
        fuel_type="Electric",
        aspiration=None,         # N/A for electric
        raw_string="Electric Motor"
    )

Electric Motor Naming

Default name: "Electric Motor"
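
As a rough illustration of where the default engine fits, the sketch below substitutes it whenever a model's engine list is empty. The `EngineSpec` dataclass mirrors the field names used in this document, but its definition is an assumption, and `engines_for_model` is a hypothetical helper that takes the string parser as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EngineSpec:
    # Field names follow this document; the dataclass itself is assumed.
    displacement_l: Optional[float]
    configuration: str
    cylinders: Optional[int]
    fuel_type: str
    aspiration: Optional[str]
    raw_string: str


def create_default_electric_engine() -> EngineSpec:
    return EngineSpec(
        displacement_l=None, configuration="Electric", cylinders=None,
        fuel_type="Electric", aspiration=None, raw_string="Electric Motor",
    )


def engines_for_model(engine_strings: List[str],
                      parse: Callable[[str], EngineSpec]) -> List[EngineSpec]:
    """Hypothetical helper: an empty list yields one default electric engine."""
    if not engine_strings:
        return [create_default_electric_engine()]
    return [parse(s) for s in engine_strings]
```

With this shape, a model that lists no engines still produces exactly one engine record named "Electric Motor".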

Error Handling

Unparseable Engines

For engines that don't match standard patterns:

  1. Log warning with original string
  2. Create fallback engine with raw_string preserved
  3. Continue processing (don't fail entire make)

def create_fallback_engine(raw_string: str) -> EngineSpec:
    return EngineSpec(
        displacement_l=None,
        configuration="Unknown",
        cylinders=None,
        fuel_type="Unknown",
        aspiration="Natural",
        raw_string=raw_string
    )
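
The three steps could be wired together roughly as follows. This is a sketch under assumptions: `parse_all` is a hypothetical helper, `parse_strict` stands in for any parser that returns None when a string does not match, and the `EngineSpec` dataclass is assumed from the field names in this document.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class EngineSpec:
    displacement_l: Optional[float]
    configuration: str
    cylinders: Optional[int]
    fuel_type: str
    aspiration: Optional[str]
    raw_string: str


def create_fallback_engine(raw_string: str) -> EngineSpec:
    return EngineSpec(None, "Unknown", None, "Unknown", "Natural", raw_string)


def parse_all(engine_strings: List[str],
              parse_strict: Callable[[str], Optional[EngineSpec]]) -> List[EngineSpec]:
    """Parse every string; unparseable ones become fallback engines."""
    specs = []
    for raw in engine_strings:
        spec = parse_strict(raw)
        if spec is None:
            logger.warning("Unparseable engine string: %r", raw)  # step 1
            spec = create_fallback_engine(raw)                    # step 2
        specs.append(spec)                                        # step 3: keep going
    return specs
```

Because a failed parse never raises, one malformed entry cannot abort processing for the rest of the make.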

Validation Rules

  1. Displacement: Must be positive number if present
  2. Configuration: Must be I, V, H, or Electric
  3. Cylinders: Must be positive integer if present
  4. Required: At least raw_string must be preserved
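
A literal reading of these rules can be expressed as a function that returns a list of violations. `validate_engine` and `VALID_CONFIGURATIONS` are hypothetical names. Note that the sketch follows rule 2 as written, so a fallback engine with configuration "Unknown" would be reported. Whether fallbacks should be exempt is left open here.

```python
from typing import List, Optional

VALID_CONFIGURATIONS = {"I", "V", "H", "Electric"}


def validate_engine(displacement_l: Optional[float], configuration: str,
                    cylinders: Optional[int], raw_string: str) -> List[str]:
    """Return a list of rule violations; an empty list means the spec is valid."""
    errors = []
    if displacement_l is not None and displacement_l <= 0:
        errors.append("displacement_l must be positive if present")
    if configuration not in VALID_CONFIGURATIONS:
        errors.append(f"configuration {configuration!r} is not I, V, H, or Electric")
    if cylinders is not None and (not isinstance(cylinders, int) or cylinders <= 0):
        errors.append("cylinders must be a positive integer if present")
    if not raw_string:
        errors.append("raw_string must be preserved")
    return errors


print(validate_engine(2.0, "I", 4, "2.0L I4"))  # []
```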

Database Storage

Engine Table Mapping

INSERT INTO vehicles.engine (
    name,           -- Original string or "Electric Motor"
    code,           -- NULL (not available in JSON)
    displacement_l, -- Parsed displacement
    cylinders,      -- Parsed cylinder count
    fuel_type,      -- Parsed or "Gasoline" default
    aspiration      -- Parsed or "Natural" default
)

Example Database Records

-- Standard engine
('2.0L I4', NULL, 2.0, 4, 'Gasoline', 'Natural')

-- L→I normalized
('1.5L I3', NULL, 1.5, 3, 'Plug-in Hybrid', 'Natural')

-- Electric vehicle
('Electric Motor', NULL, NULL, NULL, 'Electric', NULL)

-- Subaru Boxer
('2.4L H4', NULL, 2.4, 4, 'Gasoline', 'Natural')
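
For illustration, the column mapping can be written as a parameterized statement plus a function that builds the value tuple. This is a sketch, not the project's loader: the `%s` placeholder style assumes a psycopg-like driver, which this document does not specify, and `to_engine_row` is a hypothetical helper. It uses the raw string as `name`, following the column comment in the mapping.

```python
from typing import Optional, Tuple

# Assumption: a driver that uses %s placeholders (for example psycopg).
INSERT_ENGINE_SQL = (
    "INSERT INTO vehicles.engine "
    "(name, code, displacement_l, cylinders, fuel_type, aspiration) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def to_engine_row(raw_string: str, displacement_l: Optional[float],
                  cylinders: Optional[int], fuel_type: str,
                  aspiration: Optional[str]) -> Tuple:
    """Build the parameter tuple in the same order as the INSERT columns."""
    code = None  # not available in the JSON source
    return (raw_string, code, displacement_l, cylinders, fuel_type, aspiration)


print(to_engine_row("2.0L I4", 2.0, 4, "Gasoline", "Natural"))
# ('2.0L I4', None, 2.0, 4, 'Gasoline', 'Natural')
```

Passing values as parameters rather than formatting them into the SQL string lets the driver handle quoting and NULL conversion.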

Testing Requirements

Unit Test Cases

  1. L→I normalization: "1.5L L3" → configuration="I"
  2. Hybrid detection: All PHEV, FHEV, HYBRID patterns
  3. Configuration types: I, V, H preservation
  4. Electric vehicles: Empty array handling
  5. Error cases: Unparseable strings
  6. Edge cases: Missing displacement, unusual formats
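
A few of these cases might look like the following with the standard `unittest` module. `parse_base` is a minimal stand-in for the real parser, written only so the tests are self-contained. It covers the base pattern and L→I normalization, not modifiers.

```python
import re
import unittest

ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


def parse_base(engine_str):
    """Stand-in parser: (displacement, config, cylinders) or None if no match."""
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        return None
    config = 'I' if match.group(2) == 'L' else match.group(2)
    return float(match.group(1)), config, int(match.group(3))


class EngineParsingTests(unittest.TestCase):
    def test_l_is_normalized_to_i(self):
        self.assertEqual(parse_base("1.5L L3"), (1.5, "I", 3))

    def test_configurations_are_preserved(self):
        for raw, config in [("2.0L I4", "I"), ("3.5L V6", "V"), ("2.4L H4", "H")]:
            with self.subTest(raw=raw):
                self.assertEqual(parse_base(raw)[1], config)

    def test_unparseable_string(self):
        self.assertIsNone(parse_base("Electric Motor"))

    def test_missing_displacement(self):
        self.assertIsNone(parse_base("V8 TURBO"))
```

Saved as a module, these can be run with `python -m unittest`.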

Integration Test Cases

  1. Real JSON data: Process actual make files
  2. Database storage: Verify correct database records
  3. API compatibility: Ensure dropdown endpoints work
  4. Performance: Parse 1000+ engines efficiently

Future Considerations

Potential Enhancements

  1. Turbo detection: More sophisticated forced induction parsing
  2. Engine codes: Extract manufacturer engine codes where available
  3. Performance specs: Parse horsepower/torque if present in future data
  4. Validation: Cross-reference with automotive databases

Backwards Compatibility

  • MSSQL pipeline: Must continue working unchanged
  • API responses: Same format regardless of data source
  • Database schema: No breaking changes required