fix: ETL vehicle db import fixes
@@ -1,40 +1,84 @@
|
||||
Step 1: Fetch Data from VehAPI
|
||||
# Vehicle Catalog Data Export
|
||||
|
||||
cd data/vehicle-etl
|
||||
python3 vehapi_fetch_snapshot.py --min-year 2015 --max-year 2025
|
||||
Export the current vehicle catalog database to SQL files for GitLab CI/CD deployment.
|
||||
|
||||
Options:
|
||||
| Flag                | Default           | Description            |
|---------------------|-------------------|------------------------|
| --min-year          | 2015              | Start year             |
| --max-year          | 2022              | End year               |
| --rate-per-min      | 55                | API rate limit         |
| --snapshot-dir      | snapshots/<today> | Output directory       |
| --no-response-cache | false             | Disable resume caching |
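
The flag table can be read as an argument-parser definition. The following is a minimal sketch, assuming the script uses `argparse` (the source does not show its internals). `build_parser` is a hypothetical helper name, the defaults are copied from the table, and the ISO date format for `<today>` is an assumption.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags; not the script's actual code."""
    parser = argparse.ArgumentParser(prog="vehapi_fetch_snapshot.py")
    parser.add_argument("--min-year", type=int, default=2015, help="Start year")
    parser.add_argument("--max-year", type=int, default=2022, help="End year")
    parser.add_argument("--rate-per-min", type=int, default=55, help="API rate limit")
    parser.add_argument(
        "--snapshot-dir",
        # Assumption: <today> is rendered as an ISO date (YYYY-MM-DD).
        default=f"snapshots/{date.today().isoformat()}",
        help="Output directory",
    )
    parser.add_argument(
        "--no-response-cache",
        action="store_true",  # absent flag means False, matching the table default
        help="Disable resume caching",
    )
    return parser


# Equivalent of the single-year quick test invocation.
args = build_parser().parse_args(["--min-year", "2020", "--max-year", "2020"])
print(args.min_year, args.max_year, args.rate_per_min, args.no_response_cache)
```

Note that the documented default for `--max-year` is 2022, so a run without flags would stop earlier than the `--max-year 2025` example.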
|
||||
## Export Workflow
|
||||
|
||||
Output: Creates snapshots/<date>/snapshot.sqlite
|
||||
### Export from Running Database
|
||||
|
||||
---
|
||||
Step 2: Generate SQL Files
|
||||
```bash
cd data/vehicle-etl
python3 export_from_postgres.py
```
|
||||
|
||||
python3 etl_generate_sql.py --snapshot-path snapshots/<date>/snapshot.sqlite
|
||||
**Output:** Creates output/01_engines.sql, output/02_transmissions.sql, output/03_vehicle_options.sql
|
||||
|
||||
Output: Creates output/01_engines.sql, output/02_transmissions.sql, output/03_vehicle_options.sql
|
||||
**Requirements:**
- mvp-postgres container running
- Python 3.7+
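
Both requirements can be checked before running the export. This is a minimal sketch, not part of the repository: `python_ok` and `container_running` are hypothetical helper names, and the container check assumes the `docker` CLI is on the `PATH`.

```python
import subprocess
import sys

CONTAINER = "mvp-postgres"  # container name taken from the requirements list


def python_ok(version=sys.version_info) -> bool:
    """True when the interpreter is Python 3.7 or newer."""
    return tuple(version[:2]) >= (3, 7)


def container_running(name: str = CONTAINER, run=subprocess.run) -> bool:
    """True when `docker ps` lists a running container with exactly this name.

    `run` is injectable so the check can be exercised without Docker.
    """
    result = run(
        ["docker", "ps", "--filter", f"name={name}", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=False,
    )
    # The name filter matches substrings, so compare against whole names.
    return result.returncode == 0 and name in result.stdout.split()


if __name__ == "__main__":
    print("python ok:", python_ok())
    try:
        print("container running:", container_running())
    except OSError:
        print("container running: docker CLI not available")
```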
|
||||
|
||||
---
|
||||
Step 3: Import to PostgreSQL
|
||||
### Commit and Deploy
|
||||
|
||||
./import_data.sh
|
||||
```bash
git add output/*.sql
git commit -m "Update vehicle catalog data from PostgreSQL export"
git push
```
|
||||
|
||||
Requires: mvp-postgres container running, SQL files in output/
|
||||
GitLab CI/CD will automatically import these SQL files during deployment.
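
Because the pipeline imports whatever is committed, it may be worth confirming that all three export files exist and are non-empty before running `git add`. This is a minimal sketch under that assumption. `missing_outputs` is a hypothetical helper, and the file names are taken from the documented export output.

```python
from pathlib import Path

# File names taken from the documented export output.
EXPECTED = ("01_engines.sql", "02_transmissions.sql", "03_vehicle_options.sql")


def missing_outputs(output_dir="output"):
    """Return the expected SQL files that are absent or empty in output_dir."""
    base = Path(output_dir)
    return [
        name
        for name in EXPECTED
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    problems = missing_outputs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All export files present.")
```

Run it from `data/vehicle-etl` so that the relative `output` path resolves.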
|
||||
|
||||
---
|
||||
Quick Test (single year)
|
||||
---
|
||||
|
||||
python3 vehapi_fetch_snapshot.py --min-year 2020 --max-year 2020
|
||||
## When to Export
|
||||
|
||||
# Full ETL workflow with cached results
./reset_database.sh # Clear old data
python3 etl_generate_sql.py --snapshot-path snapshots/*.sqlite # Generate SQL
./import_data.sh # Import to Postgres
docker compose exec mvp-redis redis-cli FLUSHALL # Flush Redis Cache for front end
|
||||
| Scenario | Action |
|----------|--------|
| Admin uploaded CSVs to database | Export and commit |
| Manual corrections in PostgreSQL | Export and commit |
| After adding new vehicle data | Export and commit |
| Preparing for deployment | Export and commit |
|
||||
|
||||
---
|
||||
|
||||
## Local Testing
|
||||
|
||||
```bash
# Export current database state
python3 export_from_postgres.py

# Test import locally
./reset_database.sh
./import_data.sh
docker compose exec mvp-redis redis-cli FLUSHALL

# Verify data
docker exec mvp-postgres psql -U postgres -d motovaultpro -c "
SELECT
  (SELECT COUNT(*) FROM engines) as engines,
  (SELECT COUNT(*) FROM transmissions) as transmissions,
  (SELECT COUNT(*) FROM vehicle_options) as vehicle_options,
  (SELECT MIN(year) FROM vehicle_options) as min_year,
  (SELECT MAX(year) FROM vehicle_options) as max_year;
"
```
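
The verification query can also be run from Python so the counts can be compared programmatically. This is a minimal sketch, not part of the repository: `parse_counts` and `fetch_counts` are hypothetical helpers, and the sample row passed to `parse_counts` is made-up illustrative data. It relies on the standard `psql` options `-t` (tuples only), `-A` (unaligned output), and `-F` (field separator).

```python
import subprocess

COLUMNS = ("engines", "transmissions", "vehicle_options", "min_year", "max_year")

QUERY = """
SELECT
  (SELECT COUNT(*) FROM engines),
  (SELECT COUNT(*) FROM transmissions),
  (SELECT COUNT(*) FROM vehicle_options),
  (SELECT MIN(year) FROM vehicle_options),
  (SELECT MAX(year) FROM vehicle_options);
"""


def parse_counts(line: str) -> dict:
    """Turn one comma-separated psql row into a {column: int} mapping."""
    values = line.strip().split(",")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return {name: int(value) for name, value in zip(COLUMNS, values)}


def fetch_counts(run=subprocess.run) -> dict:
    """Run the query in the container; -t -A -F , yields a bare comma-separated row."""
    result = run(
        ["docker", "exec", "mvp-postgres", "psql", "-U", "postgres",
         "-d", "motovaultpro", "-t", "-A", "-F", ",", "-c", QUERY],
        capture_output=True, text=True, check=True,
    )
    return parse_counts(result.stdout)


# Illustrative row only; real values come from fetch_counts().
print(parse_counts("1200,85,43000,2015,2025\n"))
```

If `vehicle_options` is empty, the year columns come back as NULL (an empty field), and this sketch raises `ValueError` rather than reporting a count.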
|
||||
|
||||
---
|
||||
|
||||
## GitLab CI/CD Integration
|
||||
|
||||
The pipeline automatically imports the SQL files from the `output/` directory during deployment (see `/.gitlab-ci.yml`, lines 89-98):
- data/vehicle-etl/output/01_engines.sql
- data/vehicle-etl/output/02_transmissions.sql
- data/vehicle-etl/output/03_vehicle_options.sql
|
||||
|
||||
Commit updated SQL files to trigger deployment with new data.
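
The numeric prefixes suggest the files are meant to be imported in order, presumably so that engines and transmissions exist before vehicle options reference them. The source does not show the actual pipeline step. This is a minimal sketch of prefix-based ordering, where `import_order` is a hypothetical helper:

```python
import re
from pathlib import Path


def import_order(paths):
    """Sort SQL files by their leading numeric prefix (01_, 02_, ...)."""
    def prefix(path):
        match = re.match(r"(\d+)_", Path(path).name)
        if match is None:
            raise ValueError(f"no numeric prefix: {path}")
        return int(match.group(1))
    return sorted(paths, key=prefix)


files = [
    "data/vehicle-etl/output/03_vehicle_options.sql",
    "data/vehicle-etl/output/01_engines.sql",
    "data/vehicle-etl/output/02_transmissions.sql",
]
for path in import_order(files):
    print(path)
```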
|
||||
|
||||
---
|
||||
|
||||
## Legacy Scripts (Not Used)
|
||||
|
||||
The following scripts are left over from the VehAPI integration and are no longer used:
- vehapi_fetch_snapshot.py (obsolete: VehAPI is no longer used)
- etl_generate_sql.py (obsolete: replaced by the database export)
|
||||
|
||||
These scripts are preserved for historical reference but should not be executed.
|
||||
|
||||