# Engine Specification Parsing Rules

## Overview
Rules for parsing engine specification strings from JSON files into the PostgreSQL engine table structure.

## Standard Engine Format
### Pattern: `{displacement}L {configuration}{cylinders} {modifiers}`

Examples:
- `"2.0L I4"` → 2.0L, Inline, 4-cylinder
- `"3.5L V6 TURBO"` → 3.5L, V-configuration, 6-cylinder, Turbocharged
- `"1.5L L3 PLUG-IN HYBRID EV- (PHEV)"` → 1.5L, **Inline** (L→I), 3-cylinder, Plug-in Hybrid

## Configuration Normalization Rules

### CRITICAL: L-Configuration Handling
**L-configurations MUST be treated as Inline (I)**

| Input | Normalized | Reasoning |
|-------|------------|-----------|
| `"1.5L L3"` | `"1.5L I3"` | L3 is alternate notation for Inline 3-cylinder |
| `"2.0L L4"` | `"2.0L I4"` | L4 is alternate notation for Inline 4-cylinder |
| `"1.2L L3 FULL HYBRID EV- (FHEV)"` | `"1.2L I3"` + Hybrid | L→I normalization + hybrid flag |

### Configuration Types
- **I** = Inline (most common)
- **V** = V-configuration
- **H** = Horizontal/Boxer (Subaru, Porsche)
- **L** = **Convert to I** (alternate Inline notation)

## Engine Parsing Implementation

### Regex Patterns
```python
# Primary engine pattern: displacement, configuration letter, cylinder count
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'

# Hybrid modifier patterns, most specific first
HYBRID_PATTERNS = [
    r'PLUG-IN HYBRID EV-?\s*\(PHEV\)',
    r'FULL HYBRID EV-?\s*\(FHEV\)',
    r'HYBRID'
]

# Other fuel-type modifier patterns
FUEL_PATTERNS = [
    r'FLEX',
    r'ELECTRIC'
]

# Aspiration (forced induction) modifier patterns; see Aspiration Detection
ASPIRATION_PATTERNS = [
    r'TURBO',
    r'SUPERCHARGED',
    r'\bSC\b'
]
```

### Parsing Algorithm
```python
import re

def parse_engine_string(engine_str: str) -> EngineSpec:
    # 1. Extract base components (displacement, config, cylinders)
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        # No standard pattern found: see "Unparseable Engines" under Error Handling
        return create_fallback_engine(engine_str)
    displacement = float(match.group(1))
    config = normalize_configuration(match.group(2))  # L→I here
    cylinders = int(match.group(3))

    # 2. Detect fuel type and aspiration from modifiers
    fuel_type = extract_fuel_type(engine_str)
    aspiration = extract_aspiration(engine_str)

    return EngineSpec(
        displacement_l=displacement,
        configuration=config,
        cylinders=cylinders,
        fuel_type=fuel_type,
        aspiration=aspiration,
        raw_string=engine_str
    )

def normalize_configuration(config: str) -> str:
    """CRITICAL: Convert L to I"""
    return 'I' if config == 'L' else config
```

## Fuel Type Detection

### Hybrid Classifications
| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"PLUG-IN HYBRID EV- (PHEV)"` | `"Plug-in Hybrid"` | Plug-in hybrid electric |
| `"FULL HYBRID EV- (FHEV)"` | `"Full Hybrid"` | Full hybrid electric |
| `"HYBRID"` | `"Hybrid"` | General hybrid |

### Other Fuel Types
| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"FLEX"` | `"Flex Fuel"` | Flex-fuel capability |
| `"ELECTRIC"` | `"Electric"` | Pure electric |
| No modifier | `"Gasoline"` | Default assumption |

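The two tables above can be read as an ordered rule list. The following is a minimal sketch of `extract_fuel_type` under that reading. The `FUEL_TYPE_RULES` name, the upper-casing of the input, and the first-match-wins ordering are assumptions made for illustration, not something the source data confirms.

```python
import re

# Assumed rule order: the specific hybrid patterns come before the generic
# HYBRID rule so that a PHEV/FHEV string is not classified as plain "Hybrid".
FUEL_TYPE_RULES = [
    (re.compile(r'PLUG-IN HYBRID EV-?\s*\(PHEV\)'), 'Plug-in Hybrid'),
    (re.compile(r'FULL HYBRID EV-?\s*\(FHEV\)'), 'Full Hybrid'),
    (re.compile(r'HYBRID'), 'Hybrid'),
    (re.compile(r'FLEX'), 'Flex Fuel'),
    (re.compile(r'ELECTRIC'), 'Electric'),
]

def extract_fuel_type(engine_str: str) -> str:
    """Return the database value of the first matching rule, else "Gasoline"."""
    upper = engine_str.upper()  # assumption: modifiers may vary in case
    for pattern, db_value in FUEL_TYPE_RULES:
        if pattern.search(upper):
            return db_value
    return 'Gasoline'  # default when no fuel modifier is present
```

Keeping the rules in one ordered list makes the precedence between the hybrid variants explicit instead of relying on the order of separate `if` branches.
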
## Aspiration Detection

### Forced Induction
| Pattern | Database Value | Description |
|---------|---------------|-------------|
| `"TURBO"` | `"Turbocharged"` | Turbocharged engine |
| `"SUPERCHARGED"` | `"Supercharged"` | Supercharged engine |
| `"SC"` | `"Supercharged"` | Supercharged (short form) |
| No modifier | `"Natural"` | Naturally aspirated |

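A matching sketch for `extract_aspiration`, built from the table above. The `ASPIRATION_RULES` name and the word-boundary anchors are assumptions: `\bSC\b` is used so that the short form only matches as a standalone token, which the source does not specify.

```python
import re

# Assumed rules derived from the Forced Induction table.
ASPIRATION_RULES = [
    (re.compile(r'\bTURBO'), 'Turbocharged'),        # also matches TURBOCHARGED
    (re.compile(r'\bSUPERCHARGED\b'), 'Supercharged'),
    (re.compile(r'\bSC\b'), 'Supercharged'),         # short form, whole word only
]

def extract_aspiration(engine_str: str) -> str:
    """Return the database value of the first matching rule, else "Natural"."""
    upper = engine_str.upper()  # assumption: modifiers may vary in case
    for pattern, db_value in ASPIRATION_RULES:
        if pattern.search(upper):
            return db_value
    return 'Natural'  # naturally aspirated when no modifier is present
```
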
## Real-World Examples

### Standard Engines
```
Input: "2.0L I4"
Output: EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4"
)
```

### L→I Normalization Example
```
Input: "1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
Output: EngineSpec(
    displacement_l=1.5,
    configuration="I",  # L normalized to I
    cylinders=3,
    fuel_type="Plug-in Hybrid",
    aspiration="Natural",
    raw_string="1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
)
```

### Subaru Boxer Engine
```
Input: "2.4L H4"
Output: EngineSpec(
    displacement_l=2.4,
    configuration="H",  # Horizontal/Boxer
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.4L H4"
)
```

### Flex Fuel Engine
```
Input: "5.6L V8 FLEX"
Output: EngineSpec(
    displacement_l=5.6,
    configuration="V",
    cylinders=8,
    fuel_type="Flex Fuel",
    aspiration="Natural",
    raw_string="5.6L V8 FLEX"
)
```

## Electric Vehicle Handling

### Empty `engines` Arrays
When a model's `engines` array is empty (`engines: []`, common for Tesla and Lucid), create a default electric engine:

```python
def create_default_electric_engine() -> EngineSpec:
    return EngineSpec(
        displacement_l=None,       # N/A for electric
        configuration="Electric",  # Special designation
        cylinders=None,            # N/A for electric
        fuel_type="Electric",
        aspiration=None,           # N/A for electric
        raw_string="Electric Motor"
    )
```

### Electric Motor Naming
Default name: `"Electric Motor"`

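One way the empty-array rule could be wired into a per-model loop is sketched below. `EngineSpec` is written out here as a dataclass whose fields are inferred from the examples in this document, and `specs_for_engines` with its `parse` parameter is a hypothetical helper, not an existing function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class EngineSpec:
    # Assumed shape, inferred from the EngineSpec(...) examples in this document.
    displacement_l: Optional[float]
    configuration: str
    cylinders: Optional[int]
    fuel_type: str
    aspiration: Optional[str]
    raw_string: str

def create_default_electric_engine() -> EngineSpec:
    return EngineSpec(None, "Electric", None, "Electric", None, "Electric Motor")

def specs_for_engines(
    engines: List[str],
    parse: Callable[[str], EngineSpec],
) -> List[EngineSpec]:
    """Hypothetical helper: an empty engines array yields one default electric engine."""
    if not engines:
        return [create_default_electric_engine()]
    return [parse(raw) for raw in engines]
```

Passing the parser in as a callable keeps the sketch self-contained; in the real pipeline it would presumably be `parse_engine_string`.
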
## Error Handling

### Unparseable Engines
For engines that don't match the standard patterns:
1. **Log a warning** with the original string
2. **Create a fallback engine** with `raw_string` preserved
3. **Continue processing** (don't fail the entire make)

```python
def create_fallback_engine(raw_string: str) -> EngineSpec:
    return EngineSpec(
        displacement_l=None,
        configuration="Unknown",
        cylinders=None,
        fuel_type="Unknown",
        aspiration="Natural",
        raw_string=raw_string
    )
```

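The three steps above can be sketched as a batch loop. This assumes a hypothetical `parse` callable that signals failure by raising `ValueError`; a parser that returns the fallback itself would not need the `try`. The logger name and the `parse_all` and `demo_parse` helpers are illustrative only.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger("engine_parser")  # logger name is an assumption

def parse_all(
    engine_strings: Iterable[str],
    parse: Callable[[str], T],
    fallback: Callable[[str], T],
) -> List[T]:
    """Parse every string; on failure log a warning, keep a fallback, continue."""
    results: List[T] = []
    for raw in engine_strings:
        try:
            results.append(parse(raw))
        except ValueError as exc:
            # Step 1: log the original string; step 2: fallback; step 3: carry on.
            logger.warning("Unparseable engine string %r: %s", raw, exc)
            results.append(fallback(raw))
    return results

def demo_parse(raw: str) -> str:
    """Stand-in parser for the demo: rejects strings without a leading digit."""
    if not raw[:1].isdigit():
        raise ValueError("no leading displacement")
    return "parsed:" + raw

print(parse_all(["2.0L I4", "ROTARY"], demo_parse, lambda raw: "fallback:" + raw))
# prints ['parsed:2.0L I4', 'fallback:ROTARY']
```
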
### Validation Rules
1. **Displacement**: Must be a positive number if present
2. **Configuration**: Must be I, V, H, or Electric
3. **Cylinders**: Must be a positive integer if present
4. **Required**: At least `raw_string` must be preserved

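A minimal sketch of how these four rules might be checked. The function name `validation_errors` and its return shape (a list of messages) are assumptions. Note that the fallback engine's `"Unknown"` configuration is not among the four listed values, so the allowed set is a parameter that callers validating fallback records would need to extend.

```python
from typing import FrozenSet, List, Optional

# The four configurations listed in the validation rules.
LISTED_CONFIGURATIONS = frozenset({"I", "V", "H", "Electric"})

def validation_errors(
    displacement_l: Optional[float],
    configuration: str,
    cylinders: Optional[int],
    raw_string: str,
    allowed_configurations: FrozenSet[str] = LISTED_CONFIGURATIONS,
) -> List[str]:
    """Return one message per violated rule; an empty list means valid."""
    errors: List[str] = []
    if displacement_l is not None and displacement_l <= 0:
        errors.append("displacement must be positive if present")
    if configuration not in allowed_configurations:
        errors.append("configuration %r is not allowed" % configuration)
    if cylinders is not None and (not isinstance(cylinders, int) or cylinders <= 0):
        errors.append("cylinders must be a positive integer if present")
    if not raw_string:
        errors.append("raw_string must be preserved")
    return errors
```

Returning all violations at once, rather than raising on the first, fits the "continue processing" rule: a caller can log every problem for a record and still store it.
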
## Database Storage

### Engine Table Mapping
```sql
INSERT INTO vehicles.engine (
    name,            -- Original string or "Electric Motor"
    code,            -- NULL (not available in JSON)
    displacement_l,  -- Parsed displacement
    cylinders,       -- Parsed cylinder count
    fuel_type,       -- Parsed or "Gasoline" default
    aspiration       -- Parsed or "Natural" default
)
```

### Example Database Records
```sql
-- Standard engine
('2.0L I4', NULL, 2.0, 4, 'Gasoline', 'Natural')

-- L→I normalized
('1.5L I3', NULL, 1.5, 3, 'Plug-in Hybrid', 'Natural')

-- Electric vehicle
('Electric Motor', NULL, NULL, NULL, 'Electric', NULL)

-- Subaru Boxer
('2.4L H4', NULL, 2.4, 4, 'Gasoline', 'Natural')
```

## Testing Requirements

### Unit Test Cases
1. **L→I normalization**: `"1.5L L3"` → `configuration="I"`
2. **Hybrid detection**: All PHEV, FHEV, HYBRID patterns
3. **Configuration types**: I, V, H preservation
4. **Electric vehicles**: Empty array handling
5. **Error cases**: Unparseable strings
6. **Edge cases**: Missing displacement, unusual formats

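As an illustration of cases 1, 3, and 5, here is a small `unittest` sketch. It tests a stand-in `parse_base` helper (hypothetical, returning a plain tuple or `None`) built on the `ENGINE_PATTERN` regex from this document, rather than the real `parse_engine_string`, so that it runs on its own.

```python
import re
import unittest

ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'

def parse_base(engine_str):
    """Stand-in parser: (displacement, configuration, cylinders) or None."""
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        return None
    config = 'I' if match.group(2) == 'L' else match.group(2)  # L→I
    return float(match.group(1)), config, int(match.group(3))

class BaseParsingTests(unittest.TestCase):
    def test_l_is_normalized_to_inline(self):
        self.assertEqual(parse_base("1.5L L3"), (1.5, "I", 3))

    def test_configurations_are_preserved(self):
        cases = [("2.0L I4", "I"), ("3.5L V6 TURBO", "V"), ("2.4L H4", "H")]
        for raw, expected in cases:
            with self.subTest(raw=raw):
                self.assertEqual(parse_base(raw)[1], expected)

    def test_unparseable_string_returns_none(self):
        self.assertIsNone(parse_base("Electric Motor"))
```

The same table-driven `subTest` pattern would extend naturally to the hybrid and aspiration modifiers.
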
### Integration Test Cases
1. **Real JSON data**: Process actual make files
2. **Database storage**: Verify correct database records
3. **API compatibility**: Ensure dropdown endpoints work
4. **Performance**: Parse 1000+ engines efficiently

## Future Considerations

### Potential Enhancements
1. **Turbo detection**: More sophisticated forced induction parsing
2. **Engine codes**: Extract manufacturer engine codes where available
3. **Performance specs**: Parse horsepower/torque if present in future data
4. **Validation**: Cross-reference with automotive databases

### Backwards Compatibility
- **MSSQL pipeline**: Must continue working unchanged
- **API responses**: Same format regardless of data source
- **Database schema**: No breaking changes required