Engine Specification Parsing Rules
Overview
Rules for parsing engine specification strings from JSON files into the PostgreSQL engine table structure.
Standard Engine Format
Pattern: {displacement}L {configuration}{cylinders} {modifiers}
Examples:
- "2.0L I4" → 2.0L, Inline, 4-cylinder
- "3.5L V6 TURBO" → 3.5L, V6, Turbocharged
- "1.5L L3 PLUG-IN HYBRID EV- (PHEV)" → 1.5L, Inline (L→I), 3-cyl, Plug-in Hybrid
Configuration Normalization Rules
CRITICAL: L-Configuration Handling
L-configurations MUST be treated as Inline (I)
| Input | Normalized | Reasoning |
|---|---|---|
| "1.5L L3" | "1.5L I3" | L3 is alternate notation for Inline 3-cylinder |
| "2.0L L4" | "2.0L I4" | L4 is alternate notation for Inline 4-cylinder |
| "1.2L L3 FULL HYBRID EV- (FHEV)" | "1.2L I3" + Hybrid | L→I normalization + hybrid flag |
Configuration Types
- I = Inline (most common)
- V = V-configuration
- H = Horizontal/Boxer (Subaru, Porsche)
- L = Convert to I (alternate Inline notation)
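The string-level normalization shown in the table above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `normalize_engine_name` and `_CONFIG_RE` are hypothetical names, and dropping the trailing modifiers from the result is an assumption based on the "Normalized" column.

```python
import re

# Hypothetical helper pattern: displacement prefix, configuration letter, cylinder count.
_CONFIG_RE = re.compile(r'^(\d+\.?\d*L\s+)([IVHL])(\d+)')


def normalize_engine_name(engine_str: str) -> str:
    """Rewrite an L-configuration to I and keep only the base engine string."""
    m = _CONFIG_RE.match(engine_str.strip())
    if m is None:
        return engine_str  # leave unrecognized strings untouched
    prefix, config, cylinders = m.groups()
    config = "I" if config == "L" else config  # L is alternate Inline notation
    return f"{prefix}{config}{cylinders}"


print(normalize_engine_name("1.5L L3"))                         # 1.5L I3
print(normalize_engine_name("1.2L L3 FULL HYBRID EV- (FHEV)"))  # 1.2L I3
print(normalize_engine_name("2.4L H4"))                         # 2.4L H4
```

I, V, and H pass through unchanged. Only L is rewritten.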
Engine Parsing Implementation
Regex Patterns
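# Primary engine pattern
ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'

# Hybrid patterns, most specific first so PHEV/FHEV win over plain HYBRID
HYBRID_PATTERNS = [
    r'PLUG-IN HYBRID EV-?\s*\(PHEV\)',
    r'FULL HYBRID EV-?\s*\(FHEV\)',
    r'HYBRID',
]

# Other fuel-type patterns
FUEL_PATTERNS = [
    r'FLEX',
    r'ELECTRIC',
]

# Aspiration patterns (forced induction), kept separate from fuel type
ASPIRATION_PATTERNS = [
    r'TURBO',
    r'SUPERCHARGED',
    r'\bSC\b',  # short form of SUPERCHARGED
]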
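To make the capture groups of the primary pattern concrete, here is a small sketch that applies it to a few strings from this document. `split_engine` is a hypothetical helper name, and the raw configuration letter is returned before any L→I normalization.

```python
import re

ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


def split_engine(engine_str):
    """Return (displacement, config, cylinders), or None if the base pattern is absent."""
    m = re.match(ENGINE_PATTERN, engine_str)
    if m is None:
        return None
    return float(m.group(1)), m.group(2), int(m.group(3))


print(split_engine("3.5L V6 TURBO"))                      # (3.5, 'V', 6)
print(split_engine("1.5L L3 PLUG-IN HYBRID EV- (PHEV)"))  # (1.5, 'L', 3)
print(split_engine("Electric Motor"))                     # None
```

Because `re.match` anchors at the start of the string, modifiers after the cylinder count are ignored by this pattern and must be detected separately.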
Parsing Algorithm
import logging
import re

logger = logging.getLogger(__name__)

def parse_engine_string(engine_str: str) -> EngineSpec:
    # 1. Extract base components (displacement, config, cylinders)
    match = re.match(ENGINE_PATTERN, engine_str)
    if match is None:
        # No standard pattern: log and fall back (see Error Handling)
        logger.warning("Unparseable engine string: %r", engine_str)
        return create_fallback_engine(engine_str)
    displacement = float(match.group(1))
    config = normalize_configuration(match.group(2))  # L→I here
    cylinders = int(match.group(3))

    # 2. Detect fuel type and aspiration from modifiers
    fuel_type = extract_fuel_type(engine_str)
    aspiration = extract_aspiration(engine_str)

    return EngineSpec(
        displacement_l=displacement,
        configuration=config,
        cylinders=cylinders,
        fuel_type=fuel_type,
        aspiration=aspiration,
        raw_string=engine_str
    )

def normalize_configuration(config: str) -> str:
    """CRITICAL: Convert L to I"""
    return 'I' if config == 'L' else config
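The algorithm above returns an `EngineSpec`, which this document does not define. One plausible shape, assuming a plain dataclass with the six fields used throughout the examples, is sketched below. The optional types are inferred from the electric and fallback cases, where some fields are `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineSpec:
    """Assumed container for one parsed engine; field names follow the examples."""
    displacement_l: Optional[float]  # None for electric or unparseable engines
    configuration: str               # "I", "V", "H", "Electric", or "Unknown"
    cylinders: Optional[int]         # None for electric or unparseable engines
    fuel_type: str
    aspiration: Optional[str]        # None for electric engines
    raw_string: str                  # original input, always preserved


spec = EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4",
)
print(spec.configuration, spec.cylinders)
```

Making the dataclass frozen is a design choice for this sketch: parsed specs are treated as immutable values.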
Fuel Type Detection
Hybrid Classifications
| Pattern | Database Value | Description |
|---|---|---|
| "PLUG-IN HYBRID EV- (PHEV)" | "Plug-in Hybrid" | Plug-in hybrid electric |
| "FULL HYBRID EV- (FHEV)" | "Full Hybrid" | Full hybrid electric |
| "HYBRID" | "Hybrid" | General hybrid |
Other Fuel Types
| Pattern | Database Value | Description |
|---|---|---|
| "FLEX" | "Flex Fuel" | Flex-fuel capability |
| "ELECTRIC" | "Electric" | Pure electric |
| No modifier | "Gasoline" | Default assumption |
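The two fuel-type tables above can be folded into one ordered rule list. The following is a sketch of what `extract_fuel_type` might look like, not its confirmed implementation. `FUEL_RULES` is a hypothetical name, and the rules are ordered most specific first because a PHEV or FHEV string also contains the word "HYBRID".

```python
import re

# (pattern, database value) pairs; order matters, first match wins.
FUEL_RULES = [
    (r'PLUG-IN HYBRID EV-?\s*\(PHEV\)', "Plug-in Hybrid"),
    (r'FULL HYBRID EV-?\s*\(FHEV\)', "Full Hybrid"),
    (r'HYBRID', "Hybrid"),
    (r'FLEX', "Flex Fuel"),
    (r'ELECTRIC', "Electric"),
]


def extract_fuel_type(engine_str: str) -> str:
    """Return the database fuel type for an engine string, defaulting to Gasoline."""
    for pattern, value in FUEL_RULES:
        if re.search(pattern, engine_str):
            return value
    return "Gasoline"  # no modifier: default assumption


print(extract_fuel_type("1.5L L3 PLUG-IN HYBRID EV- (PHEV)"))  # Plug-in Hybrid
print(extract_fuel_type("5.6L V8 FLEX"))                       # Flex Fuel
print(extract_fuel_type("2.0L I4"))                            # Gasoline
```

Matching is case-sensitive here, on the assumption that the JSON modifiers are always upper-case as in the examples.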
Aspiration Detection
Forced Induction
| Pattern | Database Value | Description |
|---|---|---|
| "TURBO" | "Turbocharged" | Turbocharged engine |
| "SUPERCHARGED" | "Supercharged" | Supercharged engine |
| "SC" | "Supercharged" | Supercharged (short form) |
| No modifier | "Natural" | Naturally aspirated |
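The aspiration table above maps onto a similar lookup. In this sketch of `extract_aspiration`, the word boundaries around "SC" are an assumption added so that the short form matches only as a standalone token. Letting TURBO win when both modifiers appear is also an assumption.

```python
import re

# (pattern, database value) pairs; first match wins.
ASPIRATION_RULES = [
    (r'TURBO', "Turbocharged"),
    (r'SUPERCHARGED|\bSC\b', "Supercharged"),
]


def extract_aspiration(engine_str: str) -> str:
    """Return the database aspiration value, defaulting to naturally aspirated."""
    for pattern, value in ASPIRATION_RULES:
        if re.search(pattern, engine_str):
            return value
    return "Natural"


print(extract_aspiration("3.5L V6 TURBO"))  # Turbocharged
print(extract_aspiration("6.2L V8 SC"))     # Supercharged
print(extract_aspiration("2.0L I4"))        # Natural
```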
Real-World Examples
Standard Engines
Input: "2.0L I4"
Output: EngineSpec(
    displacement_l=2.0,
    configuration="I",
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.0L I4"
)
L→I Normalization Example
Input: "1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
Output: EngineSpec(
    displacement_l=1.5,
    configuration="I",  # L normalized to I
    cylinders=3,
    fuel_type="Plug-in Hybrid",
    aspiration="Natural",
    raw_string="1.5L L3 PLUG-IN HYBRID EV- (PHEV)"
)
Subaru Boxer Engine
Input: "2.4L H4"
Output: EngineSpec(
    displacement_l=2.4,
    configuration="H",  # Horizontal/Boxer
    cylinders=4,
    fuel_type="Gasoline",
    aspiration="Natural",
    raw_string="2.4L H4"
)
Flex Fuel Engine
Input: "5.6L V8 FLEX"
Output: EngineSpec(
    displacement_l=5.6,
    configuration="V",
    cylinders=8,
    fuel_type="Flex Fuel",
    aspiration="Natural",
    raw_string="5.6L V8 FLEX"
)
Electric Vehicle Handling
Empty Engines Arrays
When a model has an empty engines array (engines: [], common for Tesla and Lucid), create a default electric engine:
def create_default_electric_engine() -> EngineSpec:
    return EngineSpec(
        displacement_l=None,       # N/A for electric
        configuration="Electric",  # Special designation
        cylinders=None,            # N/A for electric
        fuel_type="Electric",
        aspiration=None,           # N/A for electric
        raw_string="Electric Motor"
    )
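The empty-array check itself might look like the sketch below. The JSON layout is an assumption: only the `engines` key comes from this document, while the surrounding structure, the model names, and the `engine_names` helper are hypothetical.

```python
import json

DEFAULT_ELECTRIC_NAME = "Electric Motor"


def engine_names(model: dict) -> list:
    """Return a model's raw engine strings, or the default name when the array is empty."""
    engines = model.get("engines") or []
    return list(engines) if engines else [DEFAULT_ELECTRIC_NAME]


# Hypothetical sample data; real make files may be structured differently.
sample = json.loads("""
[
  {"name": "Model A", "engines": []},
  {"name": "Model B", "engines": ["2.0L I4", "3.5L V6 TURBO"]}
]
""")

for model in sample:
    print(model["name"], engine_names(model))
```

A missing `engines` key is treated the same as an empty array here, which is a choice made for this sketch and not a documented rule.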
Electric Motor Naming
Default name: "Electric Motor"
Error Handling
Unparseable Engines
For engines that don't match standard patterns:
- Log warning with original string
- Create fallback engine with raw_string preserved
- Continue processing (don't fail entire make)
def create_fallback_engine(raw_string: str) -> EngineSpec:
    return EngineSpec(
        displacement_l=None,
        configuration="Unknown",
        cylinders=None,
        fuel_type="Unknown",
        aspiration="Natural",
        raw_string=raw_string
    )
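The log-and-continue behavior described above can be sketched as a batch loop. To stay short and self-contained, this version returns plain dicts in place of `EngineSpec` objects. `parse_many`, the logger name, and the sample strings are all hypothetical.

```python
import logging
import re

logger = logging.getLogger("engine_parser")  # hypothetical logger name
ENGINE_PATTERN = re.compile(r'(\d+\.?\d*)L\s+([IVHL])(\d+)')


def parse_many(engine_strings):
    """Parse every string; unparseable ones become fallback records instead of raising."""
    parsed = []
    for raw in engine_strings:
        m = ENGINE_PATTERN.match(raw)
        if m is None:
            # Warn with the original string, keep it, and move on.
            logger.warning("Unparseable engine string: %r", raw)
            parsed.append({"displacement_l": None, "configuration": "Unknown",
                           "cylinders": None, "raw_string": raw})
            continue
        config = "I" if m.group(2) == "L" else m.group(2)
        parsed.append({"displacement_l": float(m.group(1)), "configuration": config,
                       "cylinders": int(m.group(3)), "raw_string": raw})
    return parsed


results = parse_many(["2.0L I4", "ROTARY 13B", "1.5L L3"])
for record in results:
    print(record["raw_string"], "->", record["configuration"])
```

One bad string yields one fallback record, so the rest of the make is still processed.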
Validation Rules
- Displacement: Must be positive number if present
- Configuration: Must be I, V, H, or Electric ("Unknown" is reserved for fallback engines)
- Cylinders: Must be positive integer if present
- Required: At least raw_string must be preserved
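The validation rules above can be expressed as a small checker that returns a list of problems. This is a sketch under assumptions: `validate_engine` and its `allow_fallback` flag are hypothetical, with the flag added so that fallback engines carrying "Unknown" can be accepted explicitly.

```python
ALLOWED_CONFIGURATIONS = {"I", "V", "H", "Electric"}


def validate_engine(displacement_l, configuration, cylinders, raw_string,
                    allow_fallback=False):
    """Return a list of rule violations; an empty list means the engine is valid."""
    errors = []
    if displacement_l is not None and displacement_l <= 0:
        errors.append("displacement must be positive")
    allowed = ALLOWED_CONFIGURATIONS | ({"Unknown"} if allow_fallback else set())
    if configuration not in allowed:
        errors.append(f"unsupported configuration: {configuration!r}")
    if cylinders is not None and (not isinstance(cylinders, int) or cylinders <= 0):
        errors.append("cylinders must be a positive integer")
    if not raw_string:
        errors.append("raw_string must be preserved")
    return errors


print(validate_engine(2.0, "I", 4, "2.0L I4"))  # []
print(validate_engine(-1.0, "L", 0, ""))        # four violations
```

An un-normalized "L" configuration is rejected, which makes a skipped L→I normalization step visible at validation time.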
Database Storage
Engine Table Mapping
INSERT INTO vehicles.engine (
    name,            -- Original string or "Electric Motor"
    code,            -- NULL (not available in JSON)
    displacement_l,  -- Parsed displacement
    cylinders,       -- Parsed cylinder count
    fuel_type,       -- Parsed or "Gasoline" default
    aspiration       -- Parsed or "Natural" default
)
Example Database Records
-- Standard engine
('2.0L I4', NULL, 2.0, 4, 'Gasoline', 'Natural')
-- L→I normalized
('1.5L I3', NULL, 1.5, 3, 'Plug-in Hybrid', 'Natural')
-- Electric vehicle
('Electric Motor', NULL, NULL, NULL, 'Electric', NULL)
-- Subaru Boxer
('2.4L H4', NULL, 2.4, 4, 'Gasoline', 'Natural')
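Building rows like the ones above from parsed data might look like the following sketch. It only constructs the parameter tuple and opens no database connection. `to_engine_row`, the dict-based spec, and the `%s` placeholder style (as used by psycopg) are assumptions.

```python
# Parameterized statement matching the column mapping above (placeholder style assumed).
INSERT_SQL = """
INSERT INTO vehicles.engine
    (name, code, displacement_l, cylinders, fuel_type, aspiration)
VALUES (%s, %s, %s, %s, %s, %s)
"""


def to_engine_row(spec: dict) -> tuple:
    """Map a parsed engine (as a dict) to the column order of INSERT_SQL."""
    fuel_type = spec.get("fuel_type") or "Gasoline"
    # Electric engines store NULL aspiration; others default to "Natural".
    aspiration = None if fuel_type == "Electric" else (spec.get("aspiration") or "Natural")
    return (
        spec["name"],
        None,  # code: not available in JSON
        spec.get("displacement_l"),
        spec.get("cylinders"),
        fuel_type,
        aspiration,
    )


print(to_engine_row({"name": "2.0L I4", "displacement_l": 2.0, "cylinders": 4}))
print(to_engine_row({"name": "Electric Motor", "fuel_type": "Electric"}))
```

Python `None` values are sent as SQL `NULL` by the usual PostgreSQL drivers, which matches the NULLs in the example records.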
Testing Requirements
Unit Test Cases
- L→I normalization: "1.5L L3" → configuration="I"
- Hybrid detection: All PHEV, FHEV, HYBRID patterns
- Configuration types: I, V, H preservation
- Electric vehicles: Empty array handling
- Error cases: Unparseable strings
- Edge cases: Missing displacement, unusual formats
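A few of the unit test cases listed above could be written with the standard `unittest` module as sketched here. `parse_config` is a hypothetical stand-in for the real parser that returns only the normalized configuration, so the tests show the shape of the cases and not the full `EngineSpec` checks.

```python
import re
import unittest

ENGINE_PATTERN = r'(\d+\.?\d*)L\s+([IVHL])(\d+)'


def parse_config(engine_str):
    """Stand-in parser: normalized configuration, or "Unknown" if unparseable."""
    m = re.match(ENGINE_PATTERN, engine_str)
    if m is None:
        return "Unknown"
    return "I" if m.group(2) == "L" else m.group(2)


class ConfigurationTests(unittest.TestCase):
    def test_l_is_normalized_to_i(self):
        self.assertEqual(parse_config("1.5L L3"), "I")

    def test_i_v_h_are_preserved(self):
        for raw, expected in [("2.0L I4", "I"), ("3.5L V6", "V"), ("2.4L H4", "H")]:
            with self.subTest(raw=raw):
                self.assertEqual(parse_config(raw), expected)

    def test_unparseable_string_falls_back(self):
        self.assertEqual(parse_config("ROTARY"), "Unknown")

    def test_missing_displacement_falls_back(self):
        self.assertEqual(parse_config("V6 TURBO"), "Unknown")


# Run the suite programmatically so the snippet works in any context.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConfigurationTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```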
Integration Test Cases
- Real JSON data: Process actual make files
- Database storage: Verify correct database records
- API compatibility: Ensure dropdown endpoints work
- Performance: Parse 1000+ engines efficiently
Future Considerations
Potential Enhancements
- Turbo detection: More sophisticated forced induction parsing
- Engine codes: Extract manufacturer engine codes where available
- Performance specs: Parse horsepower/torque if present in future data
- Validation: Cross-reference with automotive databases
Backwards Compatibility
- MSSQL pipeline: Must continue working unchanged
- API responses: Same format regardless of data source
- Database schema: No breaking changes required