Initial Commit
262
docs/changes/vehicles-dropdown-v2/03-engine-spec-parsing.md
Normal file
@@ -0,0 +1,262 @@
# Engine Specification Parsing Rules

## Overview

Rules for parsing engine specifications from JSON files into the PostgreSQL engine table structure.

## Standard Engine Format

### Pattern: `{displacement}L {configuration}{cylinders} {modifiers}`

Examples:

- `"2.0L I4"` → 2.0L, Inline, 4-cylinder
- `"3.5L V6 TURBO"` → 3.5L, V-configuration, 6-cylinder, Turbocharged
- `"1.5L L3 PLUG-IN HYBRID EV- (PHEV)"` → 1.5L, **Inline** (L→I), 3-cylinder, Plug-in Hybrid

## Configuration Normalization Rules

### CRITICAL: L-Configuration Handling

**L-configurations MUST be treated as Inline (I)**

| Input | Normalized | Reasoning |
|-------|------------|-----------|
| `"1.5L L3"` | `"1.5L I3"` | L3 is alternate notation for Inline 3-cylinder |
| `"2.0L L4"` | `"2.0L I4"` | L4 is alternate notation for Inline 4-cylinder |
| `"1.2L L3 FULL HYBRID EV- (FHEV)"` | `"1.2L I3"` + Hybrid | L→I normalization + hybrid flag |

### Configuration Types

- **I** = Inline (most common)
- **V** = V-configuration
- **H** = Horizontal/Boxer (Subaru, Porsche)
- **L** = **Convert to I** (alternate Inline notation)
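
As a small illustration of the list above, the configuration letters could be kept in a lookup table. This is only a sketch: `CONFIGURATION_NAMES` and `describe_configuration` are hypothetical names that do not appear in the spec, and returning `"Unknown"` for unrecognized letters is an assumption.

```python
# Hypothetical lookup table for the configuration letters listed above.
CONFIGURATION_NAMES = {
    "I": "Inline",
    "V": "V-configuration",
    "H": "Horizontal/Boxer",
}


def normalize_configuration(config: str) -> str:
    """Map the alternate 'L' notation to 'I'; leave other letters unchanged."""
    return "I" if config == "L" else config


def describe_configuration(config: str) -> str:
    """Return a readable name for a raw configuration letter (illustrative helper)."""
    normalized = normalize_configuration(config)
    # Assumption: letters outside I/V/H/L are reported as "Unknown".
    return CONFIGURATION_NAMES.get(normalized, "Unknown")
```

With this table, `describe_configuration("L")` and `describe_configuration("I")` both resolve to the same Inline entry, which is the point of the normalization rule.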

## Engine Parsing Implementation

### Regex Patterns

```python
# Primary engine pattern: displacement, configuration letter, cylinder count
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'

# Hybrid patterns, ordered most specific first so PHEV/FHEV win over plain HYBRID
HYBRID_PATTERNS = [
    r'PLUG-IN HYBRID EV-?\s*\(PHEV\)',
    r'FULL HYBRID EV-?\s*\(FHEV\)',
    r'HYBRID'
]

# Fuel patterns
FUEL_PATTERNS = [
    r'FLEX',
    r'ELECTRIC'
]

# Aspiration patterns (forced induction, not a fuel type)
ASPIRATION_PATTERNS = [
    r'TURBO',
    r'SUPERCHARGED',
    r'\bSC\b'  # short form of SUPERCHARGED
]
```
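
To see what the primary pattern captures, the sketch below applies it to a few strings used elsewhere in this document. The helper name `split_base_components` is hypothetical and exists only for this illustration.

```python
import re

ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


def split_base_components(engine_str: str):
    """Return (displacement, config letter, cylinders) as strings, or None if no match."""
    # re.match anchors at the start of the string only, so trailing
    # modifiers such as "TURBO" do not prevent a match.
    match = re.match(ENGINE_PATTERN, engine_str)
    return match.groups() if match else None


print(split_base_components("3.5L V6 TURBO"))                      # ('3.5', 'V', '6')
print(split_base_components("1.5L L3 PLUG-IN HYBRID EV- (PHEV)"))  # ('1.5', 'L', '3')
print(split_base_components("Electric Motor"))                     # None
```

Note that the pattern returns the raw configuration letter, so `L` still has to be normalized to `I` afterwards, and a non-matching string yields `None` rather than raising.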

### Parsing Algorithm

```python
import re

def parse_engine_string(engine_str: str) -> EngineSpec:
    # 1. Extract base components (displacement, config, cylinders)
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        # Unparseable input: use the fallback engine (see "Error Handling")
        return create_fallback_engine(engine_str)
    displacement = float(match.group(1))
    config = normalize_configuration(match.group(2))  # L→I here
    cylinders = int(match.group(3))

    # 2. Detect fuel type and aspiration from modifiers
    fuel_type = extract_fuel_type(engine_str)
    aspiration = extract_aspiration(engine_str)

    return EngineSpec(
        displacement_l=displacement,
        configuration=config,
        cylinders=cylinders,
        fuel_type=fuel_type,
        aspiration=aspiration,
        raw_string=engine_str
    )

def normalize_configuration(config: str) -> str:
    """CRITICAL: Convert L to I"""
    return 'I' if config == 'L' else config
```
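
The algorithm returns an `EngineSpec`, which this document uses but never defines. One possible shape, inferred from the keyword arguments in the examples, is a frozen dataclass. The class layout below is an assumption, not the project's actual definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineSpec:
    """Assumed container for one parsed engine; field names follow the examples in this spec."""
    displacement_l: Optional[float]   # None for electric or unparseable engines
    configuration: str                # e.g. "I", "V", "H", "Electric", "Unknown"
    cylinders: Optional[int]          # None for electric or unparseable engines
    fuel_type: str
    aspiration: Optional[str]         # None for electric engines
    raw_string: str                   # original input, always preserved


spec = EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4",
)
```

Making the dataclass frozen is a design choice for this sketch: a parsed spec is a value, and preventing mutation keeps `raw_string` intact after parsing.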

## Fuel Type Detection

### Hybrid Classifications

| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"PLUG-IN HYBRID EV- (PHEV)"` | `"Plug-in Hybrid"` | Plug-in hybrid electric |
| `"FULL HYBRID EV- (FHEV)"` | `"Full Hybrid"` | Full hybrid electric |
| `"HYBRID"` | `"Hybrid"` | General hybrid |

### Other Fuel Types

| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"FLEX"` | `"Flex Fuel"` | Flex-fuel capability |
| `"ELECTRIC"` | `"Electric"` | Pure electric |
| No modifier | `"Gasoline"` | Default assumption |
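
The two tables above can be read as an ordered rule list. The following is a minimal sketch of `extract_fuel_type` under that reading. The spec names the function but does not show its body, so the rule list `FUEL_TYPE_RULES`, the uppercasing of the input, and the first-match-wins order are all assumptions.

```python
import re

# (pattern, database value) pairs, checked in order.
# The specific hybrid patterns come first so they win over plain HYBRID.
FUEL_TYPE_RULES = [
    (r'PLUG-IN HYBRID EV-?\s*\(PHEV\)', "Plug-in Hybrid"),
    (r'FULL HYBRID EV-?\s*\(FHEV\)', "Full Hybrid"),
    (r'HYBRID', "Hybrid"),
    (r'FLEX', "Flex Fuel"),
    (r'ELECTRIC', "Electric"),
]


def extract_fuel_type(engine_str: str) -> str:
    """Return the first matching database value, or "Gasoline" when nothing matches."""
    upper = engine_str.upper()  # assumption: matching is case-insensitive
    for pattern, db_value in FUEL_TYPE_RULES:
        if re.search(pattern, upper):
            return db_value
    return "Gasoline"
```

Ordering matters here: if the plain `HYBRID` rule were checked first, every PHEV and FHEV string would be stored as `"Hybrid"`.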

## Aspiration Detection

### Forced Induction

| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"TURBO"` | `"Turbocharged"` | Turbocharged engine |
| `"SUPERCHARGED"` | `"Supercharged"` | Supercharged engine |
| `"SC"` | `"Supercharged"` | Supercharged (short form) |
| No modifier | `"Natural"` | Naturally aspirated |
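
A matching sketch for `extract_aspiration`, again assuming first-match-wins and case-insensitive matching. The word boundaries around `SC` are an added safeguard that the spec does not mention; the name `ASPIRATION_RULES` is hypothetical.

```python
import re

# (pattern, database value) pairs taken from the table above.
ASPIRATION_RULES = [
    (r'TURBO', "Turbocharged"),
    (r'SUPERCHARGED', "Supercharged"),
    (r'\bSC\b', "Supercharged"),  # word boundaries keep "SC" from matching inside other words
]


def extract_aspiration(engine_str: str) -> str:
    """Return the first matching aspiration value, or "Natural" when nothing matches."""
    upper = engine_str.upper()  # assumption: matching is case-insensitive
    for pattern, db_value in ASPIRATION_RULES:
        if re.search(pattern, upper):
            return db_value
    return "Natural"
```

A string that carries both a turbo and a supercharger marker would be reported as `"Turbocharged"` by this sketch, since the spec defines no combined value.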

## Real-World Examples

### Standard Engines

```
Input: "2.0L I4"
Output: EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4"
)
```

### L→I Normalization Example

```
Input: "1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
Output: EngineSpec(
    displacement_l=1.5,
    configuration="I",  # L normalized to I
    cylinders=3,
    fuel_type="Plug-in Hybrid",
    aspiration="Natural",
    raw_string="1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
)
```

### Subaru Boxer Engine

```
Input: "2.4L H4"
Output: EngineSpec(
    displacement_l=2.4,
    configuration="H",  # Horizontal/Boxer
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.4L H4"
)
```

### Flex Fuel Engine

```
Input: "5.6L V8 FLEX"
Output: EngineSpec(
    displacement_l=5.6,
    configuration="V",
    cylinders=8,
    fuel_type="Flex Fuel",
    aspiration="Natural",
    raw_string="5.6L V8 FLEX"
)
```

## Electric Vehicle Handling

### Empty `engines` Arrays

When `engines: []` is found (common for Tesla and Lucid), create a default electric engine:

```python
def create_default_electric_engine() -> EngineSpec:
    return EngineSpec(
        displacement_l=None,       # N/A for electric
        configuration="Electric",  # Special designation
        cylinders=None,            # N/A for electric
        fuel_type="Electric",
        aspiration=None,           # N/A for electric
        raw_string="Electric Motor"
    )
```

### Electric Motor Naming

Default name: `"Electric Motor"`
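
The substitution described in this section could be applied at the point where a model's `engines` array is read. The helper below is a hypothetical sketch that works on the raw name strings only; `engine_names_for_model` and `ELECTRIC_MOTOR_NAME` are illustrative names, not part of the spec.

```python
from typing import List

ELECTRIC_MOTOR_NAME = "Electric Motor"  # default name from this section


def engine_names_for_model(engines: List[str]) -> List[str]:
    """Return the engine names to store; an empty array yields the default electric entry."""
    if not engines:
        # `engines: []` is treated as an electric vehicle.
        return [ELECTRIC_MOTOR_NAME]
    # Return a copy so callers cannot mutate the source data by accident.
    return list(engines)
```

Under this sketch every model ends up with at least one engine entry.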

## Error Handling

### Unparseable Engines

For engines that don't match standard patterns:

1. **Log warning** with original string
2. **Create fallback engine** with raw_string preserved
3. **Continue processing** (don't fail entire make)

```python
def create_fallback_engine(raw_string: str) -> EngineSpec:
    return EngineSpec(
        displacement_l=None,
        configuration="Unknown",
        cylinders=None,
        fuel_type="Unknown",
        aspiration="Natural",
        raw_string=raw_string
    )
```
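
The three steps above can be sketched as a batch parse that logs and falls back instead of raising. This is an illustration under assumptions: the reduced `EngineSpec` here carries only the fields the sketch needs, and `parse_or_fallback` and `parse_all` are hypothetical names.

```python
import logging
import re
from dataclasses import dataclass
from typing import List, Optional

logger = logging.getLogger(__name__)
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


@dataclass
class EngineSpec:  # assumed shape, reduced to the fields this sketch needs
    displacement_l: Optional[float]
    configuration: str
    cylinders: Optional[int]
    raw_string: str


def parse_or_fallback(raw: str) -> EngineSpec:
    match = re.match(ENGINE_PATTERN, raw)
    if match is None:
        # Step 1: log a warning with the original string.
        logger.warning("Unparseable engine string: %r", raw)
        # Step 2: create a fallback engine that preserves raw_string.
        return EngineSpec(None, "Unknown", None, raw)
    config = "I" if match.group(2) == "L" else match.group(2)
    return EngineSpec(float(match.group(1)), config, int(match.group(3)), raw)


def parse_all(raw_engines: List[str]) -> List[EngineSpec]:
    # Step 3: continue processing; one bad string never aborts the batch.
    return [parse_or_fallback(raw) for raw in raw_engines]


specs = parse_all(["2.0L I4", "ROTARY 1.3", "1.5L L3"])
```

Here the middle entry becomes a fallback engine with `configuration="Unknown"`, while the entries before and after it are parsed normally.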

### Validation Rules

1. **Displacement**: Must be a positive number if present
2. **Configuration**: Must be I, V, H, or Electric (fallback engines use Unknown)
3. **Cylinders**: Must be a positive integer if present
4. **Required**: At least raw_string must be preserved
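
A minimal sketch of these rules as a function that collects error messages rather than raising. The function name and messages are hypothetical, and `"Unknown"` is accepted here as an assumption because `create_fallback_engine` produces it.

```python
from typing import List, Optional

# Assumption: "Unknown" is allowed so that fallback engines pass validation.
ALLOWED_CONFIGURATIONS = {"I", "V", "H", "Electric", "Unknown"}


def validate_engine(
    displacement_l: Optional[float],
    configuration: str,
    cylinders: Optional[int],
    raw_string: str,
) -> List[str]:
    """Return a list of rule violations; an empty list means the engine is valid."""
    errors = []
    if displacement_l is not None and displacement_l <= 0:
        errors.append("displacement must be a positive number")
    if configuration not in ALLOWED_CONFIGURATIONS:
        errors.append(f"unexpected configuration: {configuration!r}")
    if cylinders is not None and (not isinstance(cylinders, int) or cylinders <= 0):
        errors.append("cylinders must be a positive integer")
    if not raw_string:
        errors.append("raw_string must be preserved")
    return errors
```

Returning a list keeps validation compatible with the "continue processing" policy: a caller can log the violations and move on.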

## Database Storage

### Engine Table Mapping

```sql
INSERT INTO vehicles.engine (
    name,            -- Normalized engine string (L→I applied) or "Electric Motor"
    code,            -- NULL (not available in JSON)
    displacement_l,  -- Parsed displacement
    cylinders,       -- Parsed cylinder count
    fuel_type,       -- Parsed or "Gasoline" default
    aspiration       -- Parsed or "Natural" default
)
```

### Example Database Records

```sql
-- Standard engine
('2.0L I4', NULL, 2.0, 4, 'Gasoline', 'Natural')

-- L→I normalized
('1.5L I3', NULL, 1.5, 3, 'Plug-in Hybrid', 'Natural')

-- Electric vehicle
('Electric Motor', NULL, NULL, NULL, 'Electric', NULL)

-- Subaru Boxer
('2.4L H4', NULL, 2.4, 4, 'Gasoline', 'Natural')
```
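
To connect the parsed values to the column list, the sketch below builds a parameter tuple in column order. It does not open a database connection. `to_engine_row` and `INSERT_SQL` are hypothetical names, and the `%s` placeholders assume a DB-API driver that uses the `format` paramstyle, such as psycopg.

```python
from typing import Optional, Tuple

# Assumption: the driver accepts %s placeholders (DB-API "format" paramstyle).
INSERT_SQL = (
    "INSERT INTO vehicles.engine "
    "(name, code, displacement_l, cylinders, fuel_type, aspiration) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def to_engine_row(
    name: str,
    displacement_l: Optional[float],
    cylinders: Optional[int],
    fuel_type: str,
    aspiration: Optional[str],
) -> Tuple:
    """Return values in column order; `code` is always None (not available in JSON)."""
    return (name, None, displacement_l, cylinders, fuel_type, aspiration)


row = to_engine_row("Electric Motor", None, None, "Electric", None)
```

Python `None` values map to SQL `NULL` when passed as parameters, which matches the electric vehicle record shown above.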

## Testing Requirements

### Unit Test Cases

1. **L→I normalization**: `"1.5L L3"` → `configuration="I"`
2. **Hybrid detection**: All PHEV, FHEV, HYBRID patterns
3. **Configuration types**: I, V, H preservation
4. **Electric vehicles**: Empty array handling
5. **Error cases**: Unparseable strings
6. **Edge cases**: Missing displacement, unusual formats
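
A few of these cases could be written with the standard `unittest` module as follows. The function `parse_configuration` is a minimal stand-in for the real parser, included only so the example is self-contained; the test names are hypothetical.

```python
import re
import unittest

ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


def parse_configuration(engine_str):
    """Stand-in for the parser under test: returns the normalized letter, or None."""
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        return None
    return "I" if match.group(2) == "L" else match.group(2)


class ConfigurationTests(unittest.TestCase):
    def test_l_is_normalized_to_i(self):
        self.assertEqual(parse_configuration("1.5L L3"), "I")

    def test_i_v_h_are_preserved(self):
        cases = [("2.0L I4", "I"), ("3.5L V6 TURBO", "V"), ("2.4L H4", "H")]
        for raw, expected in cases:
            with self.subTest(raw=raw):
                self.assertEqual(parse_configuration(raw), expected)

    def test_unparseable_string_returns_none(self):
        self.assertIsNone(parse_configuration("Electric Motor"))


def run_tests():
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConfigurationTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Using `subTest` reports each configuration letter separately, so a regression in one letter does not hide the status of the others.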

### Integration Test Cases

1. **Real JSON data**: Process actual make files
2. **Database storage**: Verify correct database records
3. **API compatibility**: Ensure dropdown endpoints work
4. **Performance**: Parse 1000+ engines efficiently

## Future Considerations

### Potential Enhancements

1. **Turbo detection**: More sophisticated forced induction parsing
2. **Engine codes**: Extract manufacturer engine codes where available
3. **Performance specs**: Parse horsepower/torque if present in future data
4. **Validation**: Cross-reference with automotive databases

### Backwards Compatibility

- **MSSQL pipeline**: Must continue working unchanged
- **API responses**: Same format regardless of data source
- **Database schema**: No breaking changes required