motovaultpro/docs/changes/vehicles-dropdown-v2/03-engine-spec-parsing.md
Eric Gullickson a052040e3a Initial Commit
2025-09-17 16:09:15 -05:00


# Engine Specification Parsing Rules
## Overview
Rules for parsing engine specification strings from the JSON source files into the PostgreSQL engine table structure.
## Standard Engine Format
### Pattern: `{displacement}L {configuration}{cylinders} {modifiers}`
Examples:
- `"2.0L I4"` → 2.0L, Inline, 4-cylinder
- `"3.5L V6 TURBO"` → 3.5L, V-configuration, 6-cylinder, Turbocharged
- `"1.5L L3 PLUG-IN HYBRID EV- (PHEV)"` → 1.5L, **Inline** (L→I), 3-cyl, Plug-in Hybrid
## Configuration Normalization Rules
### CRITICAL: L-Configuration Handling
**L-configurations MUST be treated as Inline (I)**
| Input | Normalized | Reasoning |
|-------|------------|-----------|
| `"1.5L L3"` | `"1.5L I3"` | L3 is alternate notation for Inline 3-cylinder |
| `"2.0L L4"` | `"2.0L I4"` | L4 is alternate notation for Inline 4-cylinder |
| `"1.2L L3 FULL HYBRID EV- (FHEV)"` | `"1.2L I3"` + Hybrid | L→I normalization + hybrid flag |
### Configuration Types
- **I** = Inline (most common)
- **V** = V-configuration
- **H** = Horizontal/Boxer (Subaru, Porsche)
- **L** = **Convert to I** (alternate Inline notation)
## Engine Parsing Implementation
### Regex Patterns
```python
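# Primary engine pattern: displacement, configuration letter, cylinder count
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'

# Hybrid modifier patterns, most specific first
HYBRID_PATTERNS = [
    r'PLUG-IN HYBRID EV-?\s*\(PHEV\)',
    r'FULL HYBRID EV-?\s*\(FHEV\)',
    r'HYBRID',
]

# Other fuel-type modifier patterns
FUEL_PATTERNS = [
    r'FLEX',
    r'ELECTRIC',
]

# Aspiration modifier patterns (see "Aspiration Detection")
ASPIRATION_PATTERNS = [
    r'TURBO',
    r'SUPERCHARGED',
    r'\bSC\b',  # short form; word boundaries avoid matching inside other words
]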
```
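As a quick check of the primary pattern, the sketch below applies it to strings taken from this document. The helper name `split_engine` is hypothetical and exists only for illustration. Note that the pattern returns the raw configuration letter, so `L` still appears here before normalization.

```python
import re

# Same primary pattern as defined in this section.
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


def split_engine(engine_str):
    """Return (displacement, raw config letter, cylinders), or None if no match."""
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        return None
    return float(match.group(1)), match.group(2), int(match.group(3))


print(split_engine("2.0L I4"))                            # (2.0, 'I', 4)
print(split_engine("1.5L L3 PLUG-IN HYBRID EV- (PHEV)"))  # (1.5, 'L', 3)
print(split_engine("Electric Motor"))                     # None
```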
### Parsing Algorithm
```python
import re

def parse_engine_string(engine_str: str) -> EngineSpec:
    # 1. Extract base components (displacement, config, cylinders)
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        # Unparseable input: use the fallback engine (see "Error Handling")
        return create_fallback_engine(engine_str)
    displacement = float(match.group(1))
    config = normalize_configuration(match.group(2))  # L→I here
    cylinders = int(match.group(3))

    # 2. Detect fuel type and aspiration from modifiers
    fuel_type = extract_fuel_type(engine_str)
    aspiration = extract_aspiration(engine_str)

    return EngineSpec(
        displacement_l=displacement,
        configuration=config,
        cylinders=cylinders,
        fuel_type=fuel_type,
        aspiration=aspiration,
        raw_string=engine_str,
    )

def normalize_configuration(config: str) -> str:
    """CRITICAL: Convert L to I"""
    return 'I' if config == 'L' else config
```
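`EngineSpec` is used by `parse_engine_string` but is not defined in this document. One possible shape is sketched below, assuming a plain frozen dataclass. The field types are an assumption inferred from the examples in this document, where electric and fallback engines use `None` for some fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineSpec:
    """Assumed container for one parsed engine (not the project's actual class)."""
    displacement_l: Optional[float]  # None for electric or unparseable engines
    configuration: str               # "I", "V", "H", "Electric", or "Unknown"
    cylinders: Optional[int]         # None for electric or unparseable engines
    fuel_type: str
    aspiration: Optional[str]        # None for electric engines
    raw_string: str                  # original input, always preserved


spec = EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4",
)
print(spec)
```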
## Fuel Type Detection
### Hybrid Classifications
| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"PLUG-IN HYBRID EV- (PHEV)"` | `"Plug-in Hybrid"` | Plug-in hybrid electric |
| `"FULL HYBRID EV- (FHEV)"` | `"Full Hybrid"` | Full hybrid electric |
| `"HYBRID"` | `"Hybrid"` | General hybrid |
### Other Fuel Types
| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"FLEX"` | `"Flex Fuel"` | Flex-fuel capability |
| `"ELECTRIC"` | `"Electric"` | Pure electric |
| No modifier | `"Gasoline"` | Default assumption |
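
The two tables above could be implemented as an ordered rule list. This is a minimal sketch of `extract_fuel_type`, which the document names but does not define. It assumes the input strings are upper-case as in the examples, and that the specific hybrid patterns must be checked before the generic `HYBRID` pattern. The name `FUEL_TYPE_RULES` is hypothetical.

```python
import re

# (pattern, database value) pairs, most specific first, because the PHEV and
# FHEV strings also contain the word "HYBRID".
FUEL_TYPE_RULES = [
    (r'PLUG-IN HYBRID EV-?\s*\(PHEV\)', "Plug-in Hybrid"),
    (r'FULL HYBRID EV-?\s*\(FHEV\)', "Full Hybrid"),
    (r'HYBRID', "Hybrid"),
    (r'FLEX', "Flex Fuel"),
    (r'ELECTRIC', "Electric"),
]


def extract_fuel_type(engine_str: str) -> str:
    """Return the database value for the first matching rule, else "Gasoline"."""
    for pattern, value in FUEL_TYPE_RULES:
        if re.search(pattern, engine_str):
            return value
    return "Gasoline"  # default assumption when no modifier is present


print(extract_fuel_type("1.5L L3 PLUG-IN HYBRID EV- (PHEV)"))  # Plug-in Hybrid
print(extract_fuel_type("5.6L V8 FLEX"))                       # Flex Fuel
print(extract_fuel_type("2.0L I4"))                            # Gasoline
```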
## Aspiration Detection
### Forced Induction
| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"TURBO"` | `"Turbocharged"` | Turbocharged engine |
| `"SUPERCHARGED"` | `"Supercharged"` | Supercharged engine |
| `"SC"` | `"Supercharged"` | Supercharged (short form) |
| No modifier | `"Natural"` | Naturally aspirated |
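
A matching sketch for `extract_aspiration`, which is also named but not defined in this document. The word-boundary handling for the short form `SC` is an assumption, added so that the two letters do not match inside a longer word. The input `"5.0L V8 SC"` is a hypothetical example, not taken from the source data.

```python
import re

# (pattern, database value) pairs from the table above.
ASPIRATION_RULES = [
    (r'TURBO', "Turbocharged"),
    (r'SUPERCHARGED', "Supercharged"),
    (r'\bSC\b', "Supercharged"),  # short form, matched only as a whole word
]


def extract_aspiration(engine_str: str) -> str:
    """Return the database value for the first matching rule, else "Natural"."""
    for pattern, value in ASPIRATION_RULES:
        if re.search(pattern, engine_str):
            return value
    return "Natural"  # naturally aspirated when no modifier is present


print(extract_aspiration("3.5L V6 TURBO"))  # Turbocharged
print(extract_aspiration("5.0L V8 SC"))     # Supercharged
print(extract_aspiration("2.0L I4"))        # Natural
```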
## Real-World Examples
### Standard Engines
```
Input: "2.0L I4"
Output: EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4"
)
```
### L→I Normalization Example
```
Input: "1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
Output: EngineSpec(
    displacement_l=1.5,
    configuration="I",  # L normalized to I
    cylinders=3,
    fuel_type="Plug-in Hybrid",
    aspiration="Natural",
    raw_string="1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
)
```
### Subaru Boxer Engine
```
Input: "2.4L H4"
Output: EngineSpec(
    displacement_l=2.4,
    configuration="H",  # Horizontal/Boxer
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.4L H4"
)
```
### Flex Fuel Engine
```
Input: "5.6L V8 FLEX"
Output: EngineSpec(
    displacement_l=5.6,
    configuration="V",
    cylinders=8,
    fuel_type="Flex Fuel",
    aspiration="Natural",
    raw_string="5.6L V8 FLEX"
)
```
## Electric Vehicle Handling
### Empty Engines Arrays
When `engines: []` is found (common for Tesla and Lucid), create a default electric engine:
```python
def create_default_electric_engine() -> EngineSpec:
    return EngineSpec(
        displacement_l=None,       # N/A for electric
        configuration="Electric",  # Special designation
        cylinders=None,            # N/A for electric
        fuel_type="Electric",
        aspiration=None,           # N/A for electric
        raw_string="Electric Motor"
    )
```
### Electric Motor Naming
Default name: `"Electric Motor"`
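
The empty-array rule can be sketched as a small dispatch step in front of the parser. The function name `specs_for_engines` and its callable parameters are hypothetical. In practice the two callables would presumably be `parse_engine_string` and `create_default_electric_engine`; simple stand-ins are used here so the example runs on its own.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def specs_for_engines(
    engines: List[str],
    parse: Callable[[str], T],
    make_default_electric: Callable[[], T],
) -> List[T]:
    """Parse each engine string, or return one default electric engine if empty."""
    if not engines:
        return [make_default_electric()]
    return [parse(raw) for raw in engines]


# Stand-in callables for illustration only.
print(specs_for_engines([], str.strip, lambda: "Electric Motor"))
print(specs_for_engines(["2.0L I4 "], str.strip, lambda: "Electric Motor"))
```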
## Error Handling
### Unparseable Engines
For engines that don't match standard patterns:
1. **Log warning** with original string
2. **Create fallback engine** with raw_string preserved
3. **Continue processing** (don't fail entire make)
```python
def create_fallback_engine(raw_string: str) -> EngineSpec:
    return EngineSpec(
        displacement_l=None,
        configuration="Unknown",
        cylinders=None,
        fuel_type="Unknown",
        aspiration="Natural",
        raw_string=raw_string
    )
```
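The three steps above (log, fall back, continue) might be combined as follows. This is a sketch under stated assumptions: plain dicts stand in for `EngineSpec`, the logger name and the function names `parse_or_fallback` and `parse_make` are hypothetical, and `"ROTARY 13B"` is an invented unparseable input.

```python
import logging
import re
from typing import Dict, List

logger = logging.getLogger("engine_parser")  # illustrative logger name
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


def parse_or_fallback(raw_string: str) -> Dict[str, str]:
    """Parse one engine string; on failure, log a warning and return a fallback."""
    match = re.match(ENGINE_PATTERN, raw_string)
    if match is None:
        logger.warning("Unparseable engine string: %r", raw_string)
        return {"configuration": "Unknown", "raw_string": raw_string}
    config = "I" if match.group(2) == "L" else match.group(2)  # L→I
    return {"configuration": config, "raw_string": raw_string}


def parse_make(engine_strings: List[str]) -> List[Dict[str, str]]:
    """Process every engine of a make; one bad string does not abort the rest."""
    return [parse_or_fallback(raw) for raw in engine_strings]


results = parse_make(["2.0L I4", "ROTARY 13B", "1.5L L3"])
print([r["configuration"] for r in results])  # ['I', 'Unknown', 'I']
```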
### Validation Rules
1. **Displacement**: Must be positive number if present
2. **Configuration**: Must be I, V, H, or Electric
3. **Cylinders**: Must be positive integer if present
4. **Required**: At least raw_string must be preserved
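
The four rules above could be checked by a function that returns a list of problems rather than raising, so that a caller can log them and continue. The function name `validate_engine` and the error messages are hypothetical. As the rules are written, a fallback engine with `configuration="Unknown"` would be reported by rule 2.

```python
from typing import List, Optional

VALID_CONFIGURATIONS = {"I", "V", "H", "Electric"}


def validate_engine(
    displacement_l: Optional[float],
    configuration: str,
    cylinders: Optional[int],
    raw_string: str,
) -> List[str]:
    """Return a list of rule violations; an empty list means the engine is valid."""
    errors = []
    if displacement_l is not None and displacement_l <= 0:
        errors.append("displacement must be a positive number if present")
    if configuration not in VALID_CONFIGURATIONS:
        errors.append("configuration must be I, V, H, or Electric")
    if cylinders is not None and (not isinstance(cylinders, int) or cylinders <= 0):
        errors.append("cylinders must be a positive integer if present")
    if not raw_string:
        errors.append("raw_string must be preserved")
    return errors


print(validate_engine(2.0, "I", 4, "2.0L I4"))                  # []
print(validate_engine(None, "Electric", None, "Electric Motor"))  # []
print(len(validate_engine(-1.0, "X", 0, "")))                   # 4
```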
## Database Storage
### Engine Table Mapping
```sql
INSERT INTO vehicles.engine (
    name,            -- Engine string (L→I normalized, e.g. '1.5L I3') or "Electric Motor"
    code,            -- NULL (not available in JSON)
    displacement_l,  -- Parsed displacement
    cylinders,       -- Parsed cylinder count
    fuel_type,       -- Parsed or "Gasoline" default
    aspiration       -- Parsed or "Natural" default
)
```
### Example Database Records
```sql
-- Standard engine
('2.0L I4', NULL, 2.0, 4, 'Gasoline', 'Natural')
-- L→I normalized
('1.5L I3', NULL, 1.5, 3, 'Plug-in Hybrid', 'Natural')
-- Electric vehicle
('Electric Motor', NULL, NULL, NULL, 'Electric', NULL)
-- Subaru Boxer
('2.4L H4', NULL, 2.4, 4, 'Gasoline', 'Natural')
```
## Testing Requirements
### Unit Test Cases
1. **L→I normalization**: `"1.5L L3"` → `configuration="I"`
2. **Hybrid detection**: All PHEV, FHEV, HYBRID patterns
3. **Configuration types**: I, V, H preservation
4. **Electric vehicles**: Empty array handling
5. **Error cases**: Unparseable strings
6. **Edge cases**: Missing displacement, unusual formats
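
Cases 1 and 3 from the list above might look like this with the standard `unittest` module. The test framework is an assumption, since the document does not name one, and `normalize_configuration` is copied here only so the example is self-contained. The file could be run with `python -m unittest`.

```python
import unittest


def normalize_configuration(config: str) -> str:
    """Copy of the function under test: convert L to I."""
    return 'I' if config == 'L' else config


class NormalizeConfigurationTest(unittest.TestCase):
    def test_l_is_normalized_to_i(self):
        self.assertEqual(normalize_configuration("L"), "I")

    def test_i_v_h_are_preserved(self):
        for config in ("I", "V", "H"):
            with self.subTest(config=config):
                self.assertEqual(normalize_configuration(config), config)
```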
### Integration Test Cases
1. **Real JSON data**: Process actual make files
2. **Database storage**: Verify correct database records
3. **API compatibility**: Ensure dropdown endpoints work
4. **Performance**: Parse 1000+ engines efficiently
## Future Considerations
### Potential Enhancements
1. **Turbo detection**: More sophisticated forced induction parsing
2. **Engine codes**: Extract manufacturer engine codes where available
3. **Performance specs**: Parse horsepower/torque if present in future data
4. **Validation**: Cross-reference with automotive databases
### Backwards Compatibility
- **MSSQL pipeline**: Must continue working unchanged
- **API responses**: Same format regardless of data source
- **Database schema**: No breaking changes required