# vPIC ETL Implementation Plan v2

## Overview

Extract vehicle dropdown data from the NHTSA vPIC database for model years 2022 and later (MY2022+) to supplement the existing VehAPI data. This revised plan uses a make-specific extraction approach with proper VIN schema parsing.

## Key Changes from v1

1. **Limit to VehAPI makes only** - Only extract the 48 makes that exist in VehAPI data
2. **VIN schema-based extraction** - Extract directly from VIN patterns, not defs_model
3. **Proper field formatting** - Match VehAPI display string formats
4. **Make-specific logic** - Handle different manufacturers' data patterns

## Critical Discovery: WMI Linkage

**Must use the `wmi_make` junction table (many-to-many), NOT `wmi.makeid` (one-to-many):**

```sql
-- CORRECT: via wmi_make (finds all makes, including Toyota, Hyundai, etc.)
FROM vpic.make m
JOIN vpic.wmi_make wm ON wm.makeid = m.id
JOIN vpic.wmi w ON w.id = wm.wmiid

-- WRONG: via wmi.makeid (misses many major brands)
FROM vpic.make m
JOIN vpic.wmi w ON w.makeid = m.id
```

---

## Make Availability Summary

| Status | Count | Makes |
|--------|-------|-------|
| **Available (2022+ schemas)** | 46 | See the per-make tables below |
| **No 2022+ data** | 2 | Hummer (discontinued 2010), Scion (discontinued 2016) |

---

## Per-Make Analysis

### Group 1: Japanese Manufacturers (Honda/Acura, Toyota/Lexus, Nissan/Infiniti, and others)

| Make | VehAPI Name | vPIC Name | Schemas (2022+) | Status |
|------|-------------|-----------|-----------------|--------|
| Acura | Acura | Acura | 48 | Ready |
| Honda | Honda | Honda | 238 | Ready |
| Lexus | Lexus | Lexus | 90 | Ready |
| Toyota | Toyota | Toyota | 152 | Ready |
| Infiniti | INFINITI | Infiniti | 76 | Ready (case diff) |
| Nissan | Nissan | Nissan | 85 | Ready |
| Mazda | Mazda | Mazda | 37 | Ready |
| Mitsubishi | Mitsubishi | Mitsubishi | 11 | Ready |
| Subaru | Subaru | Subaru | 75 | Ready |
| Isuzu | Isuzu | Isuzu | 11 | Ready |

### Group 2: Korean Manufacturers (Hyundai/Kia/Genesis)

| Make | VehAPI Name | vPIC Name | Schemas (2022+) | Status |
|------|-------------|-----------|-----------------|--------|
| Genesis | Genesis | Genesis | 74 | Ready |
| Hyundai | Hyundai | Hyundai | 177 | Ready |
| Kia | Kia | Kia | 72 | Ready |

### Group 3: American - GM (Chevrolet, GMC, Buick, Cadillac)

| Make | VehAPI Name | vPIC Name | Schemas (2022+) | Status |
|------|-------------|-----------|-----------------|--------|
| Buick | Buick | Buick | 20 | Ready |
| Cadillac | Cadillac | Cadillac | 50 | Ready |
| Chevrolet | Chevrolet | Chevrolet | 185 | Ready |
| GMC | GMC | GMC | 107 | Ready |
| Oldsmobile | Oldsmobile | Oldsmobile | 1 | Limited |
| Pontiac | Pontiac | Pontiac | 5 | Limited (2022-2024) |

### Group 4: American - Stellantis (Chrysler, Dodge, Jeep, Ram, Fiat)

| Make | VehAPI Name | vPIC Name | Schemas (2022+) | Status |
|------|-------------|-----------|-----------------|--------|
| Chrysler | Chrysler | Chrysler | 81 | Ready |
| Dodge | Dodge | Dodge | 86 | Ready |
| FIAT | FIAT | Fiat | 91 | Ready (case diff) |
| Jeep | Jeep | Jeep | 81 | Ready |
| RAM | RAM | Ram | 81 | Ready (case diff) |
| Plymouth | Plymouth | Plymouth | 4 | Limited |

### Group 5: American - Ford

| Make | VehAPI Name | vPIC Name | Schemas (2022+) | Status |
|------|-------------|-----------|-----------------|--------|
| Ford | Ford | Ford | 108 | Ready |
| Lincoln | Lincoln | Lincoln | 21 | Ready |
| Mercury | Mercury | Mercury | 0 | No data (discontinued 2011) |

### Group 6: EV-Focused Brands

| Make | VehAPI Name | vPIC Name | Schemas (2022+) | Status |
|------|-------------|-----------|-----------------|--------|
| Polestar | Polestar | Polestar | 12 | Ready |
| Rivian | Rivian | RIVIAN | 10 | Ready (case diff) |
| Tesla | Tesla | Tesla | 14 | Ready |

### Group 7: German Manufacturers

| Make | VehAPI Name | vPIC Name | Schemas (2022+) | Status |
|------|-------------|-----------|-----------------|--------|
| Audi | Audi | Audi | 55 | Ready |
| BMW | BMW | BMW | 61 | Ready |
| Mercedes-Benz | Mercedes-Benz | Mercedes-Benz | 39 | Ready |
| MINI | MINI | MINI | 10 | Ready |
| Porsche | Porsche | Porsche | 23 | Ready |
| smart | smart | smart | 5 | Ready |
| Volkswagen | Volkswagen | Volkswagen | 134 | Ready |

### Group 8: European Luxury

| Make | VehAPI Name | vPIC Name | Schemas (2022+) | Status |
|------|-------------|-----------|-----------------|--------|
| Bentley | Bentley | Bentley | 48 | Ready |
| Ferrari | Ferrari | Ferrari | 9 | Ready |
| Jaguar | Jaguar | Jaguar | 17 | Ready |
| Lamborghini | Lamborghini | Lamborghini | 10 | Ready |
| Lotus | Lotus | Lotus | 5 | Ready |
| Maserati | Maserati | Maserati | 19 | Ready |
| McLaren | McLaren | McLaren | 4 | Ready |
| Volvo | Volvo | Volvo | 80 | Ready |

### Group 9: Discontinued (No or Limited 2022+ Data)

| Make | VehAPI Name | Reason | Action |
|------|-------------|--------|--------|
| Hummer | Hummer | Discontinued 2010 (revived as the GMC Hummer EV) | Skip - use existing VehAPI |
| Scion | Scion | Discontinued 2016 | Skip - use existing VehAPI |
| Saab | Saab | Discontinued 2012 | Limited schemas (9) |
| Mercury | Mercury | Discontinued 2011 | No schemas |

---

## Extraction Architecture

### Data Flow

```
vPIC VIN Schemas → Pattern Extraction → Format Transformation → SQLite Pairs
                          ↓
                      Filter by:
                      - 48 VehAPI makes only
                      - Year >= 2022
                      - Vehicle types (exclude motorcycles, trailers, buses)
```
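The three filters above can be expressed as one predicate applied to each extracted candidate. This is a minimal sketch, not code from `vpic_extract.py`. The `Candidate` record, the `keep()` helper, the shortened `ALLOWED_MAKES` subset, and the exact vPIC vehicle-type names in `EXCLUDED_VEHICLE_TYPES` are all assumptions that should be checked against the `vpic` schema.

```python
from dataclasses import dataclass

# Illustrative subset; the full list is ALLOWED_MAKES under "Allowed Makes".
ALLOWED_MAKES = ["Acura", "Honda", "Toyota", "GMC"]
_ALLOWED_LOWER = {m.lower() for m in ALLOWED_MAKES}
MIN_YEAR = 2022
# Assumption: vPIC vehicle-type names to drop; confirm against vpic.vehicletype.
EXCLUDED_VEHICLE_TYPES = {"MOTORCYCLE", "TRAILER", "BUS"}


@dataclass(frozen=True)
class Candidate:
    """One (make, model year, vehicle type) combination from pattern extraction (hypothetical)."""
    make: str
    year: int
    vehicle_type: str


def keep(c: Candidate) -> bool:
    """Return True only if the candidate passes all three Data Flow filters."""
    return (
        c.make.strip().lower() in _ALLOWED_LOWER          # VehAPI makes only (case-insensitive)
        and c.year >= MIN_YEAR                              # Year >= 2022
        and c.vehicle_type.strip().upper() not in EXCLUDED_VEHICLE_TYPES
    )
```

Comparing lowercase make names mirrors the `LOWER(m.name) IN (...)` filter used in the SQL, so spelling differences such as `ACURA` vs. `Acura` do not drop rows.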

### Core Query Strategy

For each allowed make:

1. Find WMIs linked to that make (via `wmi_make`)
2. Get VIN schemas for model years 2022+
3. Extract from patterns:
   - Model (from schema name or pattern)
   - Trim (Element: Trim)
   - Displacement (Element: DisplacementL)
   - Horsepower (Element: EngineHP)
   - Cylinders (Element: EngineCylinders)
   - Engine Config (Element: EngineConfiguration)
   - Transmission Style (Element: TransmissionStyle)
   - Transmission Speeds (Element: TransmissionSpeeds)
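Step 3 amounts to pivoting a schema's pattern rows by element code. The sketch below assumes the rows have already been fetched as `(element_code, value)` pairs, with lookup-backed elements (`EngineConfiguration`, `TransmissionStyle`) already resolved to display names. `ELEMENT_TO_FIELD`, its field names, and `pivot_schema_patterns()` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

# Element codes from the list above, mapped to hypothetical output field names.
# Model is not here: it comes from the schema name (or a Model pattern) separately.
ELEMENT_TO_FIELD = {
    "Trim": "trim",
    "DisplacementL": "displacement_l",
    "EngineHP": "engine_hp",
    "EngineCylinders": "engine_cylinders",
    "EngineConfiguration": "engine_config",
    "TransmissionStyle": "transmission_style",
    "TransmissionSpeeds": "transmission_speeds",
}


def pivot_schema_patterns(rows: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group one schema's (element_code, value) rows into sorted per-field value lists.

    Unknown element codes and empty values are ignored.
    """
    values = defaultdict(set)
    for code, value in rows:
        field = ELEMENT_TO_FIELD.get(code)
        if field is not None and value:
            values[field].add(value.strip())
    return {field: sorted(vals) for field, vals in values.items()}
```

Collecting sets rather than single values matters because one schema can cover several trims and engines. The Acura MDX sample below lists eight trims and two displacements.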

---

## Acura Extraction Template

This pattern applies to Honda/Acura and to other manufacturers with similarly well-structured VIN schemas.

### Sample VIN Schema: Acura MDX 2025 (schema_id: 26929)

| Element | Code | Values |
|---------|------|--------|
| Trim | Trim | MDX, Technology, SH-AWD, SH-AWD Technology, SH-AWD A-Spec, SH-AWD Advance, SH-AWD A-Spec Advance, SH-AWD TYPE S ADVANCE |
| Displacement | DisplacementL | 3.5, 3.0 |
| Horsepower | EngineHP | 290, 355 |
| Cylinders | EngineCylinders | 6 |
| Engine Config | EngineConfiguration | V-Shaped |
| Trans Style | TransmissionStyle | Automatic |
| Trans Speeds | TransmissionSpeeds | 10 |

### Output Format

**Engine Display** (match VehAPI):

```
{DisplacementL}L {EngineHP} hp V{EngineCylinders}
→ "3.5L 290 hp V6"
```
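The engine template covers the V-shaped case. The validated examples later in this plan also produce `I4` (in-line engines) and `4cyl` (no usable configuration), and they show `2L` being rendered as `2.0L`. Below is a hedged sketch of `format_engine_display()` that reproduces those outputs. The `CONFIG_PREFIX` mapping and the exact vPIC configuration names in it are assumptions.

```python
from typing import Optional

# Assumption: vPIC EngineConfiguration names -> display prefix.
# Unmapped or missing configurations fall back to "{n}cyl".
CONFIG_PREFIX = {"V-Shaped": "V", "In-Line": "I"}


def format_engine_display(
    disp: Optional[str], hp: Optional[str], cyl: Optional[str], config: Optional[str]
) -> str:
    """Build a VehAPI-style engine string such as "3.5L 290 hp V6", skipping missing parts."""
    parts = []
    if disp:
        parts.append(f"{float(disp):.1f}L")   # "2" -> "2.0L", as in the Civic example
    if hp:
        parts.append(f"{float(hp):g} hp")      # "290" -> "290 hp"
    if cyl:
        prefix = CONFIG_PREFIX.get(config or "")
        parts.append(f"{prefix}{cyl}" if prefix else f"{cyl}cyl")
    return " ".join(parts)
```

Skipping missing parts, rather than emitting placeholders, keeps strings like the GMC Sierra's `6.2L V8` (no hp in vPIC) clean.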

**Transmission Display** (match VehAPI):

```
{TransmissionSpeeds}-Speed {TransmissionStyle}
→ "10-Speed Automatic"
```
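Here is a minimal sketch of `format_trans_display()` under the same caveats. It follows the template when both fields are present. When `TransmissionSpeeds` is missing, it falls back to the style alone, which matches how the Honda Civic's e-CVT appears in the validated examples. Returning an empty string when the style is missing is an assumption; a caller could treat that as "no transmission data".

```python
from typing import Optional


def format_trans_display(style: Optional[str], speeds: Optional[str]) -> str:
    """Build a VehAPI-style transmission string such as "10-Speed Automatic"."""
    style = (style or "").strip()
    speeds = (speeds or "").strip()
    if style and speeds:
        return f"{speeds}-Speed {style}"
    # No speed count (e.g. e-CVT): show the style by itself; "" if style is missing too.
    return style
```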

### Extraction SQL Template

```sql
WITH schema_data AS (
    SELECT DISTINCT
        vs.id AS schema_id,
        vs.name AS schema_name,
        wvs.yearfrom,
        COALESCE(wvs.yearto, 2027) AS yearto,
        m.name AS make_name
    FROM vpic.make m
    -- Link makes to WMIs through the wmi_make junction table, not wmi.makeid
    JOIN vpic.wmi_make wm ON wm.makeid = m.id
    JOIN vpic.wmi w ON w.id = wm.wmiid
    JOIN vpic.wmi_vinschema wvs ON w.id = wvs.wmiid
    JOIN vpic.vinschema vs ON wvs.vinschemaid = vs.id
    WHERE LOWER(m.name) IN ('acura', 'honda', ...)  -- VehAPI makes
      -- Parenthesized so the make filter applies to every year condition;
      -- a NULL yearto marks an open-ended schema that is still current
      AND (wvs.yearfrom >= 2022 OR wvs.yearto >= 2022 OR wvs.yearto IS NULL)
),
trim_data AS (
    SELECT DISTINCT sd.schema_id, p.attributeid AS trim
    FROM schema_data sd
    JOIN vpic.pattern p ON p.vinschemaid = sd.schema_id
    JOIN vpic.element e ON p.elementid = e.id
    WHERE e.code = 'Trim'
),
engine_data AS (
    SELECT
        sd.schema_id,
        MAX(CASE WHEN e.code = 'DisplacementL' THEN p.attributeid END) AS displacement,
        MAX(CASE WHEN e.code = 'EngineHP' THEN p.attributeid END) AS hp,
        MAX(CASE WHEN e.code = 'EngineCylinders' THEN p.attributeid END) AS cylinders,
        MAX(CASE WHEN e.code = 'EngineConfiguration' THEN ec.name END) AS config
    FROM schema_data sd
    JOIN vpic.pattern p ON p.vinschemaid = sd.schema_id
    JOIN vpic.element e ON p.elementid = e.id
    -- Compare as text: PostgreSQL does not guarantee that a regex guard runs
    -- before CAST(p.attributeid AS INT), which fails on non-numeric values
    LEFT JOIN vpic.engineconfiguration ec
        ON e.code = 'EngineConfiguration' AND ec.id::TEXT = p.attributeid
    WHERE e.code IN ('DisplacementL', 'EngineHP', 'EngineCylinders', 'EngineConfiguration')
    GROUP BY sd.schema_id, p.keys  -- one row per VIN pattern (keys)
),
trans_data AS (
    SELECT
        sd.schema_id,
        MAX(CASE WHEN e.code = 'TransmissionStyle' THEN t.name END) AS style,
        MAX(CASE WHEN e.code = 'TransmissionSpeeds' THEN p.attributeid END) AS speeds
    FROM schema_data sd
    JOIN vpic.pattern p ON p.vinschemaid = sd.schema_id
    JOIN vpic.element e ON p.elementid = e.id
    LEFT JOIN vpic.transmission t
        ON e.code = 'TransmissionStyle' AND t.id::TEXT = p.attributeid
    WHERE e.code IN ('TransmissionStyle', 'TransmissionSpeeds')
    -- Group by VIN pattern, as in engine_data: grouping by t.name put style
    -- rows and speed rows in separate groups, so they were never paired
    GROUP BY sd.schema_id, p.keys
)
SELECT ...
```

---

## Allowed Makes (48 from VehAPI)

```python
ALLOWED_MAKES = [
    'Acura', 'Audi', 'Bentley', 'BMW', 'Buick', 'Cadillac', 'Chevrolet',
    'Chrysler', 'Dodge', 'Ferrari', 'FIAT', 'Ford', 'Genesis', 'GMC',
    'Honda', 'Hummer', 'Hyundai', 'INFINITI', 'Isuzu', 'Jaguar', 'Jeep',
    'Kia', 'Lamborghini', 'Lexus', 'Lincoln', 'Lotus', 'Maserati', 'Mazda',
    'McLaren', 'Mercedes-Benz', 'Mercury', 'MINI', 'Mitsubishi', 'Nissan',
    'Oldsmobile', 'Plymouth', 'Polestar', 'Pontiac', 'Porsche', 'RAM',
    'Rivian', 'Saab', 'Scion', 'smart', 'Subaru', 'Tesla', 'Toyota',
    'Volkswagen', 'Volvo',
]
```

Note: Some makes are spelled differently in vPIC. The per-make tables show case variations (FIAT/Fiat, RAM/Ram, INFINITI/Infiniti, Rivian/RIVIAN), and abbreviations are also possible.

---

## Implementation Steps

### Phase 1: Rewrite vpic_extract.py

**File:** `vpic_extract.py`

Core extraction query (uses the wmi_make junction table):

```sql
WITH base AS (
    SELECT DISTINCT
        m.name AS make_name,
        vs.id AS schema_id,
        vs.name AS schema_name,
        -- Expand each schema's year range into one row per model year, starting at 2022
        generate_series(
            GREATEST(wvs.yearfrom, 2022),
            COALESCE(wvs.yearto, EXTRACT(YEAR FROM NOW())::INT + 2)
        )::INT AS year
    FROM vpic.make m
    JOIN vpic.wmi_make wm ON wm.makeid = m.id
    JOIN vpic.wmi w ON w.id = wm.wmiid
    JOIN vpic.wmi_vinschema wvs ON w.id = wvs.wmiid
    JOIN vpic.vinschema vs ON wvs.vinschemaid = vs.id
    WHERE LOWER(m.name) IN ({allowed_makes})
      -- A NULL yearto means the schema is open-ended, so keep it as well
      AND (wvs.yearfrom >= 2022 OR wvs.yearto >= 2022 OR wvs.yearto IS NULL)
)
SELECT ...
```
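The query text contains an `{allowed_makes}` placeholder. One way to fill it is sketched below, assuming `vpic_extract.py` renders the query with `str.format()`. The helper name and the short example query are hypothetical. Passing the list as a driver parameter instead would avoid string building altogether; for example, `LOWER(m.name) = ANY(%s)` with a Python list works because psycopg adapts lists to PostgreSQL arrays.

```python
# Shortened for illustration; the real list is ALLOWED_MAKES under "Allowed Makes".
ALLOWED_MAKES = ["Acura", "FIAT", "INFINITI", "Mercedes-Benz", "RAM", "Rivian"]

# Hypothetical minimal query; str.format() requires no other literal braces in the SQL.
QUERY_TEMPLATE = "SELECT m.id FROM vpic.make m WHERE LOWER(m.name) IN ({allowed_makes})"


def allowed_makes_sql(makes: list[str]) -> str:
    """Render makes as a lowercase, quoted, de-duplicated SQL list for LOWER(m.name) IN (...)."""
    # Double embedded single quotes so each name is a valid SQL string literal.
    quoted = sorted({"'" + m.lower().replace("'", "''") + "'" for m in makes})
    return ", ".join(quoted)


sql = QUERY_TEMPLATE.format(allowed_makes=allowed_makes_sql(ALLOWED_MAKES))
```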

**Key functions to implement:**

1. `extract_model_from_schema_name(schema_name)` - Parse "Acura MDX Schema..." → "MDX"
2. `get_schema_patterns(schema_id)` - Get all pattern data for a schema
3. `format_engine_display(disp, hp, cyl, config)` - Format as "3.5L 290 hp V6"
4. `format_trans_display(style, speeds)` - Format as "10-Speed Automatic"
5. `generate_trans_records(has_data, style, speeds)` - Return 1 record when vPIC has transmission data, otherwise 2 (Automatic + Manual)
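Here is a sketch of the first function, assuming schema names follow the `"<Make> <Model> Schema ..."` shape shown in item 1. Unlike the listed signature, this variant also takes the make name so it can strip the make case-insensitively. That extra parameter and the `None` fallback (leaving the caller to use a Model pattern instead) are assumptions.

```python
import re
from typing import Optional


def extract_model_from_schema_name(schema_name: str, make: str) -> Optional[str]:
    """Return the text between a leading make name and the word "Schema", or None."""
    pattern = rf"^\s*{re.escape(make)}\s+(?P<model>.+?)\s+schema\b"
    match = re.match(pattern, schema_name, flags=re.IGNORECASE)
    return match.group("model") if match else None
```

The lazy `.+?` stops at the first `Schema` word, so multi-word models such as "Civic Sedan" are kept whole.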

**Make name normalization:**

```python
# Keys are vPIC make names; values are the VehAPI spellings used in the output.
MAKE_MAPPING = {
    'Infiniti': 'INFINITI',  # VehAPI uses all-caps
    'Fiat': 'FIAT',          # VehAPI uses all-caps
    'Ram': 'RAM',            # VehAPI uses all-caps
    'RIVIAN': 'Rivian',      # vPIC uses all-caps; normalize to the VehAPI spelling
    # ... etc
}
```
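Every mismatch in the per-make tables is a case difference. One option, then, is to derive the lookup from `ALLOWED_MAKES` instead of maintaining each mapping entry by hand. Below is a minimal sketch using a hypothetical `normalize_make()` helper and a shortened make list. Abbreviation-style differences, if any turn up, would still need explicit entries.

```python
from typing import Optional

# Subset of ALLOWED_MAKES for illustration; the real list carries VehAPI's spellings.
ALLOWED_MAKES = ["FIAT", "INFINITI", "Mercedes-Benz", "RAM", "Rivian", "smart", "Toyota"]

# Case-insensitive index: lowercase name -> VehAPI spelling.
_VEHAPI_BY_LOWER = {m.lower(): m for m in ALLOWED_MAKES}


def normalize_make(vpic_name: str) -> Optional[str]:
    """Map a vPIC make name to its VehAPI spelling, or None if the make is not allowed."""
    return _VEHAPI_BY_LOWER.get(vpic_name.strip().lower())
```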

### Phase 2: Test Extraction

Test with validated VINs:

```bash
source .venv/bin/activate
python3 vpic_extract.py --test-vin 5J8YE1H05SL018611  # Acura MDX
python3 vpic_extract.py --test-vin 5TFJA5DB4SX327537  # Toyota Tundra
python3 vpic_extract.py --test-vin 3GTUUFEL6PG140748  # GMC Sierra
```
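Before a VIN is passed to the extractor, a quick check can rule out transcription errors. North American VINs carry a check digit in position 9, defined in 49 CFR Part 565. Each character is transliterated to a number and multiplied by its position weight. The weighted sum is taken modulo 11, and a remainder of 10 is written as `X`. This standalone helper is a sketch, not a `vpic_extract.py` option.

```python
# Letter transliteration values (I, O and Q never appear in VINs).
_LETTER_VALUES = {
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Position weights 1..17; position 9 (the check digit itself) has weight 0.
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit_ok(vin: str) -> bool:
    """Validate the position-9 check digit of a 17-character North American VIN."""
    vin = vin.strip().upper()
    if len(vin) != 17:
        return False
    total = 0
    for char, weight in zip(vin, _WEIGHTS):
        if char in "0123456789":
            value = int(char)
        elif char in _LETTER_VALUES:
            value = _LETTER_VALUES[char]
        else:
            return False  # I, O, Q or non-alphanumeric characters
        total += value * weight
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected
```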

### Phase 3: Full Extraction

```bash
python3 vpic_extract.py --min-year 2022 --output-dir snapshots/vpic-2025-12
```

### Phase 4: Merge & Import

```bash
# Merge vPIC with existing VehAPI data: VehAPI rows before 2022, then all vPIC rows
mkdir -p snapshots/merged
sqlite3 snapshots/merged/snapshot.sqlite "
CREATE TABLE pairs(...);
ATTACH 'snapshots/vehicle-drop-down.sqlite' AS db1;
ATTACH 'snapshots/vpic-2025-12/snapshot.sqlite' AS db2;
INSERT OR IGNORE INTO pairs SELECT * FROM db1.pairs WHERE year < 2022;
INSERT OR IGNORE INTO pairs SELECT * FROM db2.pairs;
"

# Generate SQL and import
python3 etl_generate_sql.py --snapshot-path snapshots/merged/snapshot.sqlite
./import_data.sh
```
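The same merge can be scripted with Python's built-in `sqlite3` module, which turns the year cutoff into a parameter. This sketch rests on two assumptions. First, both snapshots have a `pairs` table with identical columns and a `year` column. Second, the elided `pairs(...)` DDL can be copied from the vPIC snapshot's `sqlite_master` entry. `INSERT OR IGNORE` deduplicates only through uniqueness declared in that `CREATE TABLE` statement; separately created unique indexes are not copied. `merge_snapshots()` is a hypothetical helper.

```python
import sqlite3


def merge_snapshots(merged_path: str, vehapi_path: str, vpic_path: str,
                    cutoff_year: int = 2022) -> int:
    """Write VehAPI rows before cutoff_year plus all vPIC rows into merged_path; return row count."""
    conn = sqlite3.connect(merged_path)
    try:
        # ATTACH cannot run inside a transaction, so do it before any INSERT.
        conn.execute("ATTACH DATABASE ? AS db1", (vehapi_path,))
        conn.execute("ATTACH DATABASE ? AS db2", (vpic_path,))
        row = conn.execute(
            "SELECT sql FROM db2.sqlite_master WHERE type = 'table' AND name = 'pairs'"
        ).fetchone()
        if row is None:
            raise ValueError(f"no pairs table in {vpic_path}")
        conn.execute("DROP TABLE IF EXISTS main.pairs")  # makes re-runs safe
        conn.execute(row[0])  # unqualified CREATE TABLE pairs(...) lands in main
        conn.execute(
            "INSERT OR IGNORE INTO main.pairs SELECT * FROM db1.pairs WHERE year < ?",
            (cutoff_year,),
        )
        conn.execute("INSERT OR IGNORE INTO main.pairs SELECT * FROM db2.pairs")
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM main.pairs").fetchone()[0]
    finally:
        conn.close()
```

Because VehAPI rows are inserted first, a pre-2022 row present in both sources keeps the VehAPI version, matching the shell version above.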

---

## Files to Modify

| File | Changes |
|------|---------|
| `vpic_extract.py` | Complete rewrite: VIN schema extraction, dual-record trans logic |
| `README.md` | Already updated with workflow |

---

## Success Criteria

1. Extract all 41 makes with 2022+ VIN schemas
2. ~2,500-5,000 unique vehicle configurations (Year/Make/Model/Trim/Engine)
3. Transmission: Use vPIC data where available (7 makes), dual-record elsewhere
4. Output format matches VehAPI: "3.5L 290 hp V6" / "10-Speed Automatic"
5. Merge preserves 2015-2021 VehAPI data
6. QA validation passes after import

---

## Make Analysis Status (All Families Validated)

| Family | Makes | Status | Trans Data | Strategy |
|--------|-------|--------|------------|----------|
| Honda/Acura | Acura, Honda | VALIDATED | YES (93-97%) | Use vPIC trans data |
| Toyota/Lexus | Toyota, Lexus | VALIDATED | PARTIAL (Toyota 23%, Lexus 0%) | Dual-record for Lexus |
| Nissan/Infiniti | Nissan, Infiniti, Mitsubishi | VALIDATED | LOW (5%) | Dual-record |
| GM | Chevrolet, GMC, Buick, Cadillac | VALIDATED | LOW (0-7%) | Dual-record |
| Stellantis | Chrysler, Dodge, Jeep, Ram, Fiat | VALIDATED | NONE (0%) | Dual-record |
| Ford | Ford, Lincoln | VALIDATED | NONE (0%) | Dual-record |
| VW Group | Volkswagen, Audi, Porsche, Bentley, Lamborghini | VALIDATED | MIXED (0-84%) | VW/Audi use vPIC; others dual-record |
| BMW | BMW, MINI | VALIDATED | NONE (0%) | Dual-record |
| Mercedes | Mercedes-Benz, smart | VALIDATED | YES (52%) | Use vPIC trans data |
| Hyundai/Kia/Genesis | Hyundai, Kia, Genesis | VALIDATED | NONE (0%) | Dual-record |
| Subaru | Subaru | VALIDATED | YES (64%) | Use vPIC trans data |
| Mazda | Mazda | VALIDATED | LOW (11%) | Dual-record |
| Volvo | Volvo, Polestar | VALIDATED | LOW (3%/0%) | Dual-record |
| Exotics | Ferrari, Maserati, Jaguar, Lotus, McLaren | VALIDATED | MIXED | Per-make handling |
| EV | Tesla, Rivian | VALIDATED | NONE (0%) | Dual-record (though EVs don't have "manual") |

### Special Cases

1. **Electric Vehicles** (Tesla, Rivian, Polestar): Don't have manual transmissions
   - Still create dual records for consistency with the dropdown
   - User can select "Automatic" (single-speed EV)

2. **Luxury Exotics** (Ferrari, Lamborghini, etc.): Mix of automated manual and dual-clutch (DCT) transmissions
   - Dual-record covers all options

---

## CRITICAL FINDING: Transmission Data Availability

**Most manufacturers do NOT encode transmission information in their VINs, so their vPIC VIN schemas carry no transmission data.**

### VIN Decode Validation Results (13 Families)

| Family | VIN | Make | Model | Year | Trim | Engine | Trans |
|--------|-----|------|-------|------|------|--------|-------|
| Honda/Acura | 5J8YE1H05SL018611 | ACURA | MDX | 2025 | SH-AWD A-Spec | 3.5L V6 290hp | 10-Spd Auto |
| Honda/Acura | 2HGFE4F88SH315466 | HONDA | Civic | 2025 | Sport Hybrid | 2.0L I4 141hp | e-CVT |
| Toyota/Lexus | 5TFJA5DB4SX327537 | TOYOTA | Tundra | 2025 | Limited | 3.4L V6 389hp | 10-Spd Auto |
| Nissan/Infiniti | 5N1AL1FW9TC332353 | INFINITI | QX60 | 2026 | Luxe | 2.0L (no cyl/hp) | **MISSING** |
| GM | 3GTUUFEL6PG140748 | GMC | Sierra | 2023 | AT4X | 6.2L V8 (no hp) | **MISSING** |
| Stellantis | 1C4HJXEG7PW506480 | JEEP | Wrangler | 2023 | Sahara | 3.6L V6 285hp | **MISSING** |
| Ford | 1FTFW4L59SFC03038 | FORD | F-150 | 2025 | Tremor | 5.0L V8 (no hp) | **MISSING** |
| VW Group | WVWEB7CD9RW229116 | VOLKSWAGEN | Golf R | 2024 | **MISSING** | 2.0L 4cyl 315hp | Auto (no spd) |
| BMW | 5YM13ET06R9S31554 | BMW | X5 | 2024 | X5 M Competition | 4.4L 8cyl 617hp | **MISSING** |
| Mercedes | W1KAF4HB1SR287126 | MERCEDES-BENZ | C-Class | 2025 | C300 4MATIC | 2.0L I4 255hp | 9-Spd Auto |
| Hyundai/Kia | 5XYRLDJC0SG336002 | KIA | Sorento | 2025 | S | 2.5L 4cyl 191hp | **MISSING** |
| Subaru | JF1VBAF67P9806852 | SUBARU | WRX | 2023 | Premium | 2.4L 4cyl 271hp | 6-Spd Manual |
| Mazda | JM3KFBCL3R0522361 | MAZDA | CX-5 | 2024 | Preferred Pkg | 2.5L I4 187hp | 6-Spd Auto |
| Volvo | YV4M12RJ9S1094167 | VOLVO | XC60 | 2025 | Core | 2.0L 4cyl 247hp | 8-Spd Auto |

### Transmission Data Coverage in vPIC Schemas

| Coverage | Make | Trans Schemas / Total | Share |
|----------|------|-----------------------|-------|
| **HIGH** | Honda | 225/233 | 97% |
| **HIGH** | Acura | 42/45 | 93% |
| **HIGH** | Subaru | 47/74 | 64% |
| **HIGH** | Audi | 46/55 | 84% |
| **HIGH** | VW | 47/132 | 36% |
| **HIGH** | Mercedes | 13/25 | 52% |
| **HIGH** | Jaguar | 17/17 | 100% |
| **LOW** | Chevrolet | 4/164 | 2% |
| **LOW** | Cadillac | 7/43 | 16% |
| **LOW** | Nissan | 4/82 | 5% |
| **LOW** | Infiniti | 4/74 | 5% |
| **LOW** | Mazda | 4/36 | 11% |
| **LOW** | Volvo | 2/72 | 3% |
| **NONE** | GMC, Buick, Ford, Lincoln, Jeep, Dodge, Chrysler, Ram, Fiat, BMW, MINI, Porsche, Hyundai, Kia, Genesis, Lexus, Tesla, Rivian, Polestar | 0 | 0% |

### Makes WITHOUT Transmission Data (22 of 41 makes = 54%)

- **ALL Stellantis**: Chrysler, Dodge, Jeep, Ram, Fiat
- **ALL Ford**: Ford, Lincoln
- **ALL Korean**: Hyundai, Kia, Genesis
- **ALL BMW Group**: BMW, MINI
- **GM (partial)**: GMC, Buick (Chevrolet/Cadillac have minimal coverage)
- **Others**: Lexus, Porsche, Bentley, Lamborghini, Tesla, Rivian, Polestar

---

## Extraction Strategy (SELECTED)

### Dual-Record Strategy for Missing Transmission Data

When transmission data is NOT available from vPIC:

- **Create TWO records** for each vehicle configuration
  - One with `trans_display = "Automatic"`, `trans_canon = "automatic"`
  - One with `trans_display = "Manual"`, `trans_canon = "manual"`

This ensures:

- All transmission options are available in the dropdown for user selection
- The user can select the correct transmission type
- No false "Unknown" values that break filtering

### Implementation Logic

```python
from typing import Optional

# format_trans_display() and canonicalize_trans() are defined elsewhere in vpic_extract.py.


def generate_trans_records(
    has_trans_data: bool, trans_style: Optional[str], trans_speeds: Optional[str]
) -> list[tuple[str, str]]:
    """Return (trans_display, trans_canon) pairs: one from vPIC data, or both fallbacks."""
    if has_trans_data:
        # Use actual vPIC data
        return [(format_trans_display(trans_style, trans_speeds),
                 canonicalize_trans(trans_style))]
    # No vPIC transmission data: generate both options
    return [
        ("Automatic", "automatic"),
        ("Manual", "manual"),
    ]
```

### Expected Output Growth

For makes without transmission data, the record count approximately doubles:

- GMC Sierra AT4X + 6.2L V8 → 2 records (Auto + Manual)
- Ford F-150 Tremor + 5.0L V8 → 2 records (Auto + Manual)

This is acceptable because it provides complete dropdown coverage.
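The doubling can be checked with a small sketch that expands one configuration into output records. The record tuple layout and `expand_config()` are hypothetical. The two fallback options mirror `generate_trans_records()` in the Implementation Logic section.

```python
from typing import Optional

# Hypothetical record shape: (year, make, model, trim, engine, trans_display, trans_canon).
Record = tuple[int, str, str, str, str, str, str]

DUAL_TRANS = [("Automatic", "automatic"), ("Manual", "manual")]


def expand_config(
    year: int, make: str, model: str, trim: str, engine: str,
    trans: Optional[tuple[str, str]] = None,
) -> list[Record]:
    """Emit one record when vPIC supplies (display, canon) transmission data, else one per DUAL_TRANS option."""
    options = [trans] if trans else DUAL_TRANS
    return [(year, make, model, trim, engine, display, canon) for display, canon in options]
```

Under this scheme, the total is roughly `configs_with_trans + 2 * configs_without_trans`.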

---

## Validated Extraction Examples

### Acura MDX 2025 (VIN: 5J8YE1H05SL018611)

- **vPIC**: Make=ACURA, Model=MDX, Trim=SH-AWD A-Spec, Engine=3.5L V6 290hp, Trans=10-Speed Automatic
- **Output**: `3.5L 290 hp V6` | `10-Speed Automatic`

### Honda Civic 2025 (VIN: 2HGFE4F88SH315466)

- **vPIC**: Make=HONDA, Model=Civic, Trim=Sport Hybrid / Sport Touring Hybrid, Engine=2L I4 141hp, Trans=e-CVT
- **Output**: `2.0L 141 hp I4` | `Electronic Continuously Variable (e-CVT)`

### Toyota Tundra 2025 (VIN: 5TFJA5DB4SX327537)

- **vPIC**: Make=TOYOTA, Model=Tundra, Trim=Limited, Engine=3.4L V6 389hp, Trans=10-Speed Automatic
- **Output**: `3.4L 389 hp V6` | `10-Speed Automatic`

### Mercedes C-Class 2025 (VIN: W1KAF4HB1SR287126)

- **vPIC**: Make=MERCEDES-BENZ, Model=C-Class, Trim=C300 4MATIC, Engine=2.0L I4 255hp, Trans=9-Speed Automatic
- **Output**: `2.0L 255 hp I4` | `9-Speed Automatic`

### Subaru WRX 2023 (VIN: JF1VBAF67P9806852)

- **vPIC**: Make=SUBARU, Model=WRX, Trim=Premium, Engine=2.4L 4cyl 271hp, Trans=6-Speed Manual
- **Output**: `2.4L 271 hp 4cyl` | `6-Speed Manual`

### Mazda CX-5 2024 (VIN: JM3KFBCL3R0522361)

- **vPIC**: Make=MAZDA, Model=CX-5, Trim=Preferred Package, Engine=2.5L I4 187hp, Trans=6-Speed Automatic
- **Output**: `2.5L 187 hp I4` | `6-Speed Automatic`

### Volvo XC60 2025 (VIN: YV4M12RJ9S1094167)

- **vPIC**: Make=VOLVO, Model=XC60, Trim=Core, Engine=2.0L 4cyl 247hp, Trans=8-Speed Automatic
- **Output**: `2.0L 247 hp 4cyl` | `8-Speed Automatic`