# Engine Specification Parsing Rules

## Overview

Comprehensive rules for parsing engine specifications from JSON files into the PostgreSQL engine table structure.

## Standard Engine Format

### Pattern: `{displacement}L {configuration}{cylinders} {modifiers}`

Examples:
- `"2.0L I4"` → 2.0L, Inline, 4-cylinder
- `"3.5L V6 TURBO"` → 3.5L, V6, Turbocharged
- `"1.5L L3 PLUG-IN HYBRID EV- (PHEV)"` → 1.5L, **Inline** (L→I), 3-cyl, Plug-in Hybrid

## Configuration Normalization Rules

### CRITICAL: L-Configuration Handling

**L-configurations MUST be treated as Inline (I)**

| Input | Normalized | Reasoning |
|-------|------------|-----------|
| `"1.5L L3"` | `"1.5L I3"` | L3 is alternate notation for Inline 3-cylinder |
| `"2.0L L4"` | `"2.0L I4"` | L4 is alternate notation for Inline 4-cylinder |
| `"1.2L L3 FULL HYBRID EV- (FHEV)"` | `"1.2L I3"` + Hybrid | L→I normalization + hybrid flag |

### Configuration Types

- **I** = Inline (most common)
- **V** = V-configuration
- **H** = Horizontal/Boxer (Subaru, Porsche)
- **L** = **Convert to I** (alternate Inline notation)

## Engine Parsing Implementation

### Regex Patterns

```python
# Primary engine pattern: displacement, configuration letter, cylinder count
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'

# Hybrid modifier patterns, most specific first
HYBRID_PATTERNS = [
    r'PLUG-IN HYBRID EV-?\s*\(PHEV\)',
    r'FULL HYBRID EV-?\s*\(FHEV\)',
    r'HYBRID'
]

# Other fuel-type modifier patterns
FUEL_PATTERNS = [
    r'FLEX',
    r'ELECTRIC'
]

# Aspiration modifier patterns (see Aspiration Detection)
ASPIRATION_PATTERNS = [
    r'TURBO',
    r'SUPERCHARGED',
    r'\bSC\b'  # short form; word boundaries avoid matching inside other words
]
```

### Parsing Algorithm

```python
import re

def parse_engine_string(engine_str: str) -> EngineSpec:
    # 1. Extract base components (displacement, config, cylinders)
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        # Unparseable input: see Error Handling
        return create_fallback_engine(engine_str)

    displacement = float(match.group(1))
    config = normalize_configuration(match.group(2))  # L→I here
    cylinders = int(match.group(3))

    # 2. Detect fuel type and aspiration from modifiers
    fuel_type = extract_fuel_type(engine_str)
    aspiration = extract_aspiration(engine_str)

    return EngineSpec(
        displacement_l=displacement,
        configuration=config,
        cylinders=cylinders,
        fuel_type=fuel_type,
        aspiration=aspiration,
        raw_string=engine_str
    )

def normalize_configuration(config: str) -> str:
    """CRITICAL: Convert L to I"""
    return 'I' if config == 'L' else config
```

## Fuel Type Detection

### Hybrid Classifications

| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"PLUG-IN HYBRID EV- (PHEV)"` | `"Plug-in Hybrid"` | Plug-in hybrid electric |
| `"FULL HYBRID EV- (FHEV)"` | `"Full Hybrid"` | Full hybrid electric |
| `"HYBRID"` | `"Hybrid"` | General hybrid |

### Other Fuel Types

| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"FLEX"` | `"Flex Fuel"` | Flex-fuel capability |
| `"ELECTRIC"` | `"Electric"` | Pure electric |
| No modifier | `"Gasoline"` | Default assumption |

## Aspiration Detection

### Forced Induction

| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"TURBO"` | `"Turbocharged"` | Turbocharged engine |
| `"SUPERCHARGED"` | `"Supercharged"` | Supercharged engine |
| `"SC"` | `"Supercharged"` | Supercharged (short form) |
| No modifier | `"Natural"` | Naturally aspirated |

## Real-World Examples

### Standard Engines

```
Input: "2.0L I4"
Output: EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4"
)
```

### L→I Normalization Example

```
Input: "1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
Output: EngineSpec(
    displacement_l=1.5,
    configuration="I",  # L normalized to I
    cylinders=3,
    fuel_type="Plug-in Hybrid",
    aspiration="Natural",
    raw_string="1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
)
```

### Subaru Boxer Engine

```
Input: "2.4L H4"
Output: EngineSpec(
    displacement_l=2.4,
    configuration="H",  # Horizontal/Boxer
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.4L H4"
)
```

### Flex Fuel Engine

```
Input: "5.6L V8 FLEX"
Output: EngineSpec(
    displacement_l=5.6,
    configuration="V",
    cylinders=8,
    fuel_type="Flex Fuel",
    aspiration="Natural",
    raw_string="5.6L V8 FLEX"
)
```

## Electric Vehicle Handling

### Empty Engines Arrays

When `engines: []` is found (common in Tesla, Lucid):

```python
def create_default_electric_engine() -> EngineSpec:
    return EngineSpec(
        displacement_l=None,       # N/A for electric
        configuration="Electric",  # Special designation
        cylinders=None,            # N/A for electric
        fuel_type="Electric",
        aspiration=None,           # N/A for electric
        raw_string="Electric Motor"
    )
```

### Electric Motor Naming

Default name: `"Electric Motor"`

## Error Handling

### Unparseable Engines

For engines that don't match standard patterns:

1. **Log warning** with original string
2. **Create fallback engine** with raw_string preserved
3. **Continue processing** (don't fail entire make)

```python
def create_fallback_engine(raw_string: str) -> EngineSpec:
    return EngineSpec(
        displacement_l=None,
        configuration="Unknown",
        cylinders=None,
        fuel_type="Unknown",
        aspiration="Natural",
        raw_string=raw_string
    )
```

### Validation Rules

1. **Displacement**: Must be a positive number if present
2. **Configuration**: Must be I, V, H, or Electric (fallback engines use `Unknown`)
3. **Cylinders**: Must be a positive integer if present
4. **Required**: At least raw_string must be preserved

## Database Storage

### Engine Table Mapping

```sql
INSERT INTO vehicles.engine (
    name,            -- Original string or "Electric Motor"
    code,            -- NULL (not available in JSON)
    displacement_l,  -- Parsed displacement
    cylinders,       -- Parsed cylinder count
    fuel_type,       -- Parsed or "Gasoline" default
    aspiration       -- Parsed or "Natural" default
)
```

### Example Database Records

```sql
-- Standard engine
('2.0L I4', NULL, 2.0, 4, 'Gasoline', 'Natural')

-- L→I normalized
('1.5L I3', NULL, 1.5, 3, 'Plug-in Hybrid', 'Natural')

-- Electric vehicle
('Electric Motor', NULL, NULL, NULL, 'Electric', NULL)

-- Subaru Boxer
('2.4L H4', NULL, 2.4, 4, 'Gasoline', 'Natural')
```

## Testing Requirements

### Unit Test Cases

1. **L→I normalization**: `"1.5L L3"` → `configuration="I"`
2. **Hybrid detection**: All PHEV, FHEV, HYBRID patterns
3. **Configuration types**: I, V, H preservation
4. **Electric vehicles**: Empty array handling
5. **Error cases**: Unparseable strings
6. **Edge cases**: Missing displacement, unusual formats

### Integration Test Cases

1. **Real JSON data**: Process actual make files
2. **Database storage**: Verify correct database records
3. **API compatibility**: Ensure dropdown endpoints work
4. **Performance**: Parse 1000+ engines efficiently

## Future Considerations

### Potential Enhancements

1. **Turbo detection**: More sophisticated forced induction parsing
2. **Engine codes**: Extract manufacturer engine codes where available
3. **Performance specs**: Parse horsepower/torque if present in future data
4. **Validation**: Cross-reference with automotive databases

### Backwards Compatibility

- **MSSQL pipeline**: Must continue working unchanged
- **API responses**: Same format regardless of data source
- **Database schema**: No breaking changes required
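
## Appendix: End-to-End Parsing Sketch

The parsing algorithm calls `extract_fuel_type` and `extract_aspiration` and returns an `EngineSpec`, but this document does not define any of them. The following is a minimal, self-contained sketch of how they might fit together; it is not the pipeline's actual implementation. Several things here are assumptions: `EngineSpec` is modelled as a dataclass whose fields are inferred from the keyword arguments used in the examples, the rule tables `FUEL_RULES` and `ASPIRATION_RULES` and the helper `_first_match` are hypothetical names, the `SC` short form is matched with word boundaries, and input strings are assumed to be upper-case as in the examples.

```python
import logging
import re
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

ENGINE_PATTERN = re.compile(r'(\d+\.?\d*)L\s+([IVHL])(\d+)')

# Hypothetical rule tables: (pattern, database value), most specific first,
# so a PHEV string is not reported as the generic "Hybrid".
FUEL_RULES = [
    (re.compile(r'PLUG-IN HYBRID EV-?\s*\(PHEV\)'), "Plug-in Hybrid"),
    (re.compile(r'FULL HYBRID EV-?\s*\(FHEV\)'), "Full Hybrid"),
    (re.compile(r'HYBRID'), "Hybrid"),
    (re.compile(r'FLEX'), "Flex Fuel"),
    (re.compile(r'ELECTRIC'), "Electric"),
]
ASPIRATION_RULES = [
    (re.compile(r'TURBO'), "Turbocharged"),
    (re.compile(r'SUPERCHARGED|\bSC\b'), "Supercharged"),
]


@dataclass
class EngineSpec:
    """Assumed shape, inferred from the examples in this document."""
    displacement_l: Optional[float]
    configuration: str
    cylinders: Optional[int]
    fuel_type: str
    aspiration: Optional[str]
    raw_string: str


def _first_match(rules, text: str, default: str) -> str:
    """Return the value of the first rule whose pattern occurs in text."""
    for pattern, value in rules:
        if pattern.search(text):
            return value
    return default


def extract_fuel_type(engine_str: str) -> str:
    return _first_match(FUEL_RULES, engine_str, "Gasoline")


def extract_aspiration(engine_str: str) -> str:
    return _first_match(ASPIRATION_RULES, engine_str, "Natural")


def parse_engine_string(engine_str: str) -> EngineSpec:
    match = ENGINE_PATTERN.match(engine_str.strip())
    if match is None:
        # Error-handling rules: log, fall back, keep the raw string.
        logger.warning("Unparseable engine string: %r", engine_str)
        return EngineSpec(None, "Unknown", None, "Unknown", "Natural", engine_str)
    config = "I" if match.group(2) == "L" else match.group(2)  # L→I
    return EngineSpec(
        displacement_l=float(match.group(1)),
        configuration=config,
        cylinders=int(match.group(3)),
        fuel_type=extract_fuel_type(engine_str),
        aspiration=extract_aspiration(engine_str),
        raw_string=engine_str,
    )


if __name__ == "__main__":
    for raw in ("2.0L I4", "1.5L L3 PLUG-IN HYBRID EV- (PHEV)", "3.5L V6 TURBO"):
        print(parse_engine_string(raw))
```

Keeping each rule table ordered from most to least specific is the main design choice: the generic `HYBRID` pattern also matches PHEV and FHEV strings, so it has to be checked last. The same tables could back the unit test cases listed under Testing Requirements.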