Vehicles Dropdown V2 - Manual JSON ETL Implementation
Overview
This directory contains the documentation for adding manual JSON processing to the MVP Platform Vehicles ETL system. The goal is to process the 55 JSON files of vehicle data directly, bypassing the dependency on the MSSQL source.
Quick Start for AI Instances
Current State (As of Implementation Start)
- 55 JSON files exist in `mvp-platform-services/vehicles/etl/sources/makes/`
- The current ETL supports only the MSSQL → PostgreSQL pipeline
- A JSON → PostgreSQL path needs to be added
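A minimal sketch of the discovery step. It assumes the directory above is resolved from the repository root and that each file parses as a standalone JSON document. The helper names (`discover_make_files`, `load_make_files`, `check_file_count`) are hypothetical. The sketch makes no assumptions about the structure inside each file.

```python
import json
from pathlib import Path

# Directory named in this README; assumed to be relative to the repository root.
MAKES_DIR = Path("mvp-platform-services/vehicles/etl/sources/makes")
EXPECTED_FILE_COUNT = 55  # File count stated in this README


def discover_make_files(directory: Path = MAKES_DIR) -> list[Path]:
    """Return the make JSON files in a stable, sorted order."""
    return sorted(directory.glob("*.json"))


def load_make_files(directory: Path = MAKES_DIR) -> dict[str, object]:
    """Parse every make file, keyed by filename stem (e.g. "alfa_romeo").

    The parsed value is returned as-is; later stages interpret its structure.
    """
    payloads = {}
    for path in discover_make_files(directory):
        with path.open(encoding="utf-8") as fh:
            payloads[path.stem] = json.load(fh)
    return payloads


def check_file_count(files: list[Path], expected: int = EXPECTED_FILE_COUNT) -> None:
    """Fail loudly if the source directory does not hold the expected files."""
    if len(files) != expected:
        raise ValueError(f"expected {expected} make files, found {len(files)}")
```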
Key Files to Load for Context
Load these files for a complete understanding:
- `mvp-platform-services/vehicles/etl/sources/makes/toyota.json`: large file example
- `mvp-platform-services/vehicles/etl/sources/makes/tesla.json`: electric vehicle example
- `mvp-platform-services/vehicles/etl/pipeline.py`: current pipeline
- `mvp-platform-services/vehicles/etl/loaders/postgres_loader.py`: current loader
- `mvp-platform-services/vehicles/sql/schema/001_schema.sql`: target schema
Implementation Status
See 08-status-tracking.md for current progress.
Critical Requirements Discovered
1. Make Name Normalization
- JSON filenames: `alfa_romeo.json`, `land_rover.json`
- Database display names: `"Alfa Romeo"`, `"Land Rover"` (underscores replaced by spaces, title case)
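The filename-to-display-name rule can be sketched as follows. The sketch assumes every filename stem is lowercase words joined by underscores. `make_display_name` and the override table are hypothetical. The `bmw` entry only shows why an exception list may be needed; the authoritative rules belong in 04-make-name-mapping.md.

```python
from pathlib import Path

# Hypothetical override table for names that per-word capitalization gets wrong.
# Illustrative only; the real exceptions are defined in 04-make-name-mapping.md.
MAKE_NAME_OVERRIDES = {
    "bmw": "BMW",
}


def make_display_name(filename: str) -> str:
    """Convert a make filename such as "alfa_romeo.json" to "Alfa Romeo"."""
    stem = Path(filename).stem.lower()
    if stem in MAKE_NAME_OVERRIDES:
        return MAKE_NAME_OVERRIDES[stem]
    # Underscores become spaces; each word is capitalized.
    return " ".join(word.capitalize() for word in stem.split("_") if word)
```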
2. Engine Configuration Normalization
- CRITICAL: `L3` → `I3` (the L configuration is treated as inline)
- Standard format: `{displacement}L {config}{cylinders} {descriptions}`
- Examples: `"1.5L L3"` → `"1.5L I3"`; `"2.4L H4"` (Subaru Boxer, unchanged)
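Here is a sketch of the engine normalization rule. It assumes engine strings follow the standard format above, with an optional free-text description suffix. `normalize_engine` and `ENGINE_PATTERN` are hypothetical names. Returning unrecognized strings unchanged is an assumed policy, not a documented one.

```python
import re

# Matches "{displacement}L {config}{cylinders} {descriptions}", e.g. "1.5L L3"
# or "2.4L H4"; the descriptions part is optional free text.
ENGINE_PATTERN = re.compile(
    r"^(?P<displacement>\d+(?:\.\d+)?)L\s+"
    r"(?P<config>[A-Z])(?P<cylinders>\d+)"
    r"(?:\s+(?P<descriptions>.+))?$"
)


def normalize_engine(spec: str) -> str:
    """Normalize an engine string, mapping the L (inline) configuration to I."""
    text = " ".join(spec.split())  # collapse repeated whitespace
    match = ENGINE_PATTERN.match(text)
    if match is None:
        # Assumed policy: unrecognized formats pass through unchanged.
        return text
    config = match["config"]
    if config == "L":
        config = "I"  # L-configuration is treated as inline
    normalized = f"{match['displacement']}L {config}{match['cylinders']}"
    if match["descriptions"]:
        normalized += f" {match['descriptions']}"
    return normalized
```

Only the configuration letter is rewritten, so `"2.4L H4"` and other non-L configurations pass through as-is.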
3. Hybrid/Electric Patterns Found
- `"PLUG-IN HYBRID EV- (PHEV)"`: plug-in hybrid
- `"FULL HYBRID EV- (FHEV)"`: full hybrid
- `"ELECTRIC"`: pure electric
- `"FLEX"`: flex-fuel
- Empty engine arrays for Tesla and other electric vehicles
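The patterns above could drive a simple classifier. This is a sketch under assumptions. The category labels (`phev`, `fhev`, `electric`, `flex`, `conventional`) are hypothetical, and so is the `DEFAULT_ELECTRIC_ENGINE` placeholder, because this README does not specify which default engine is stored for empty arrays. Markers are checked in order, so the more specific hybrid labels match first.

```python
# Marker substrings from the patterns listed above, checked in order.
# The category names on the right are hypothetical.
POWERTRAIN_MARKERS = (
    ("PLUG-IN HYBRID", "phev"),
    ("FULL HYBRID", "fhev"),
    ("ELECTRIC", "electric"),
    ("FLEX", "flex"),
)

# Hypothetical placeholder for vehicles with an empty engines array (e.g. Tesla);
# the actual default value is not specified in this README.
DEFAULT_ELECTRIC_ENGINE = "Electric"


def classify_powertrain(spec: str) -> str:
    """Return a powertrain category based on the first marker found in spec."""
    upper = spec.upper()
    for marker, category in POWERTRAIN_MARKERS:
        if marker in upper:
            return category
    return "conventional"


def engines_or_default(engines: list[str]) -> list[str]:
    """Return the engine list, substituting a default when it is empty."""
    return list(engines) if engines else [DEFAULT_ELECTRIC_ENGINE]
```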
4. Transmission Limitation
- Transmission type is selected manually: Automatic or Manual
- It cannot be detected automatically from the JSON data
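Because the transmission is not in the JSON, it has to come from operator input. This is a minimal sketch of validating that input, assuming a free-text choice. The `Transmission` enum and `parse_transmission` are hypothetical names. This README does not specify how the choice is actually collected (CLI flag, prompt, or config).

```python
from enum import Enum


class Transmission(Enum):
    """The two transmission choices, selected by the operator rather than read from JSON."""
    AUTOMATIC = "Automatic"
    MANUAL = "Manual"


def parse_transmission(choice: str) -> Transmission:
    """Map a user-entered choice (case-insensitive) to a Transmission value."""
    normalized = choice.strip().lower()
    for option in Transmission:
        if option.value.lower() == normalized:
            return option
    valid = ", ".join(t.value for t in Transmission)
    raise ValueError(f"unknown transmission {choice!r}; expected one of: {valid}")
```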
Document Structure
| File | Purpose | Status |
|---|---|---|
| 01-analysis-findings.md | JSON data patterns analysis | ⏳ Pending |
| 02-implementation-plan.md | Technical roadmap | ⏳ Pending |
| 03-engine-spec-parsing.md | Engine parsing rules | ⏳ Pending |
| 04-make-name-mapping.md | Make name normalization | ⏳ Pending |
| 05-database-schema-updates.md | Schema change requirements | ⏳ Pending |
| 06-cli-commands.md | New CLI command design | ⏳ Pending |
| 07-testing-strategy.md | Testing and validation approach | ⏳ Pending |
| 08-status-tracking.md | Implementation progress tracker | ⏳ Pending |
AI Handoff Instructions
To Continue This Work:
- Read this README.md - Current state and critical requirements
- Check 08-status-tracking.md - See what's completed/in-progress
- Review 02-implementation-plan.md - Technical roadmap
- Load specific documentation based on what you're implementing
To Understand the Data:
- Load 01-analysis-findings.md - JSON structure analysis
- Load 03-engine-spec-parsing.md - Engine parsing rules
- Load 04-make-name-mapping.md - Make name conversion rules
To Start Coding:
- Check the status tracker - See what needs to be implemented next
- Load the implementation plan - Step-by-step technical guide
- Reference the examples/ directory - Code samples and patterns
Success Criteria
- New CLI command: `python -m etl load-manual`
- Process all 55 JSON make files
- Proper make name normalization (`alfa_romeo.json` → `"Alfa Romeo"`)
- Engine spec parsing with L→I normalization
- Clear/append mode support with duplicate handling
- Electric vehicle support (default engines for empty arrays)
- Integration with existing PostgreSQL schema
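This sketch shows how the `load-manual` entry point and the clear/append behavior might fit together, using `argparse` for illustration. The existing `etl` CLI may use a different framework. The `--mode` and `--source-dir` flags, the `append` default, and `plan_rows` are all hypothetical. Rows are represented as hashable keys (for example, tuples) so duplicates can be detected with a set.

```python
import argparse
from collections.abc import Hashable, Iterable

DEFAULT_SOURCE_DIR = "mvp-platform-services/vehicles/etl/sources/makes"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a `load-manual` subcommand (flags are hypothetical)."""
    parser = argparse.ArgumentParser(prog="python -m etl")
    subcommands = parser.add_subparsers(dest="command", required=True)
    load_manual = subcommands.add_parser(
        "load-manual", help="Load vehicle data from the JSON make files"
    )
    # Hypothetical options; the real flag names are not specified.
    load_manual.add_argument("--mode", choices=("clear", "append"), default="append")
    load_manual.add_argument("--source-dir", default=DEFAULT_SOURCE_DIR)
    return parser


def plan_rows(existing: Iterable[Hashable], incoming: Iterable[Hashable], mode: str) -> list:
    """Return the incoming rows that should be inserted for the given mode."""
    if mode not in ("clear", "append"):
        raise ValueError(f"unknown mode: {mode!r}")
    # clear: tables are assumed to be emptied first, so nothing counts as existing.
    # append: rows already stored are treated as duplicates and skipped.
    seen = set() if mode == "clear" else set(existing)
    to_insert = []
    for row in incoming:
        if row not in seen:  # also drops duplicates within the incoming batch
            seen.add(row)
            to_insert.append(row)
    return to_insert
```

In `clear` mode, the sketch assumes the tables were emptied beforehand, so it drops only duplicates within the incoming batch. In `append` mode, it also skips rows already in the database.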
Architecture Integration
This feature integrates with:
- Existing ETL pipeline: `mvp-platform-services/vehicles/etl/`
- PostgreSQL schema: `vehicles` schema with make/model/engine tables
- Platform API: hierarchical dropdown endpoints remain unchanged
- Application service: No changes required
Notes for Future Implementations
- Maintain compatibility with existing MSSQL pipeline
- Follow existing code patterns in the `etl/` directory
- Use the existing `PostgreSQLLoader` where possible
- Preserve referential integrity during data loading
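Preserving referential integrity during loading largely comes down to ordering: load parent tables before their children, and clear them in the reverse order. This sketch uses the standard-library `graphlib` (Python 3.9+) and assumes a make → model → engine chain. `TABLE_PARENTS` is hypothetical; the actual foreign keys should be read from `001_schema.sql`.

```python
from graphlib import TopologicalSorter

# Hypothetical parent relationships; verify them against
# mvp-platform-services/vehicles/sql/schema/001_schema.sql before use.
TABLE_PARENTS = {
    "make": set(),
    "model": {"make"},
    "engine": {"model"},
}


def insert_order(parents: dict[str, set[str]]) -> list[str]:
    """Order tables so every parent is loaded before its children."""
    return list(TopologicalSorter(parents).static_order())


def delete_order(parents: dict[str, set[str]]) -> list[str]:
    """Reverse of the insert order: children are cleared before their parents."""
    return list(reversed(insert_order(parents)))
```

A cyclic dependency map raises `graphlib.CycleError`, which surfaces schema problems before any rows are written.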