Pipeline
Weather ETL
Coverage: 2019-01 to 2025-12 (from weather_monthly).
Built 2026-03-03 02:23 UTC · Commit defd5c8
Data Provenance
flowchart LR
03_weather(["Weather ETL"])
f1_03_weather[/"data/noaa-weather/daily_raw.csv"/] --> 03_weather
a1_03_weather{"NOAA NCEI Daily Summaries API"} --> 03_weather
03_weather --> tp_weather_monthly[("weather_monthly")]
classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
class 03_weather page;
class tp_weather_monthly table;
class f1_03_weather file;
class a1_03_weather api;
Findings
Findings: Weather ETL
Summary
Monthly weather features are loaded into prt.db and validated for overlap with OTP months.
Notes
- Uses cached raw weather CSV when present.
- Emits seasonality sanity checks and null-rate diagnostics.
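The two diagnostics in the notes can be sketched as follows. This is a minimal illustration, not the pipeline's actual checks: the column names (`month`, `tmax_mean`, `snow_total`) and the rule "summer months should be warmer than winter months" are assumptions chosen for the example.

```python
from statistics import mean


def null_rates(rows, columns):
    """Return the fraction of missing (None) values per column."""
    total = len(rows)
    rates = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        rates[col] = missing / total if total else 0.0
    return rates


def summer_warmer_than_winter(rows, temp_col="tmax_mean"):
    """Hypothetical seasonality check on rows keyed by month 'YYYY-MM'.

    Returns None when either season has no usable values.
    """
    def season(months):
        return [
            row[temp_col]
            for row in rows
            if row["month"][5:7] in months and row.get(temp_col) is not None
        ]

    summer = season(("06", "07", "08"))
    winter = season(("12", "01", "02"))
    if not summer or not winter:
        return None
    return mean(summer) > mean(winter)
```

A high null rate on one column, or a failed seasonality check, would suggest a problem in the raw data or in the aggregation step rather than in the weather itself.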
Methods
Methods: Weather ETL
Question
How do we produce a monthly weather feature table aligned to OTP months?
Approach
- Fetch or read cached NOAA daily observations for the Pittsburgh airport station.
- Aggregate daily metrics to monthly precipitation, snowfall, temperature, wind, and event-day counts.
- Rebuild weather_monthly in prt.db.
- Verify temporal overlap with otp_monthly.
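The aggregation step might look roughly like the sketch below. It assumes the daily rows are dicts read from the cached CSV with a `DATE` column in `YYYY-MM-DD` form and string values for the NOAA data types. The output column names and the definition of an event day (a day with a value above zero) are illustrative assumptions, not the pipeline's actual schema.

```python
from collections import defaultdict


def to_float(value):
    """Parse a CSV cell; blank or malformed cells become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def values(days, name):
    """Collect the non-missing numeric values of one column."""
    parsed = (to_float(day.get(name)) for day in days)
    return [v for v in parsed if v is not None]


def mean_or_none(nums):
    """Average a list, or return None when it is empty."""
    return sum(nums) / len(nums) if nums else None


def aggregate_monthly(daily_rows):
    """Group daily rows by 'YYYY-MM' and compute monthly summaries."""
    buckets = defaultdict(list)
    for row in daily_rows:
        buckets[row["DATE"][:7]].append(row)

    monthly = []
    for month in sorted(buckets):
        days = buckets[month]
        prcp = values(days, "PRCP")
        snow = values(days, "SNOW")
        monthly.append({
            "month": month,
            "precip_total": sum(prcp) if prcp else None,
            "snow_total": sum(snow) if snow else None,
            "tmax_mean": mean_or_none(values(days, "TMAX")),
            "tmin_mean": mean_or_none(values(days, "TMIN")),
            "wind_mean": mean_or_none(values(days, "AWND")),
            # Assumed event-day definition: any day with a value above zero.
            "precip_days": sum(1 for v in prcp if v > 0),
            "snow_days": sum(1 for v in snow if v > 0),
        })
    return monthly
```

Missing daily values are skipped rather than treated as zero, so a month with no readings for a metric yields None instead of a misleading total.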
Data
- NOAA daily summaries API (PRCP, SNOW, SNWD, TMAX, TMIN, AWND)
- Cached CSV under data/noaa-weather/
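A cache-first loader for these two sources could be sketched as below. The endpoint and query parameter names are assumptions based on the public NCEI Access Data Service and should be checked against its documentation. `build_url` and `load_daily_csv` are hypothetical helper names, and the `fetch` callable stands in for whatever HTTP client the pipeline uses.

```python
from pathlib import Path
from urllib.parse import urlencode

# Assumed endpoint of the NCEI Access Data Service; verify before use.
BASE_URL = "https://www.ncei.noaa.gov/access/services/data/v1"
DATA_TYPES = ["PRCP", "SNOW", "SNWD", "TMAX", "TMIN", "AWND"]


def build_url(station, start_date, end_date):
    """Build a daily-summaries CSV request URL (parameter names assumed)."""
    params = {
        "dataset": "daily-summaries",
        "stations": station,
        "startDate": start_date,
        "endDate": end_date,
        "dataTypes": ",".join(DATA_TYPES),
        "format": "csv",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def load_daily_csv(cache_path, fetch):
    """Return cached CSV text if the file exists; otherwise fetch and cache it."""
    path = Path(cache_path)
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

Passing `fetch` as a callable keeps the network call out of the loader, so the cache logic can be exercised without contacting the API.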
Output
weather_monthly table in data/prt.db
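Rebuilding the table and checking its overlap with otp_monthly could be sketched with the standard sqlite3 module as below. The column layout of weather_monthly and the assumption that otp_monthly has a `month` column in `YYYY-MM` form are illustrative. The real schemas are not shown on this page.

```python
import sqlite3


def rebuild_weather_monthly(conn, monthly_rows):
    """Drop and recreate weather_monthly, then insert the given rows."""
    conn.execute("DROP TABLE IF EXISTS weather_monthly")
    conn.execute(
        """
        CREATE TABLE weather_monthly (
            month TEXT PRIMARY KEY,   -- assumed 'YYYY-MM' key
            precip_total REAL,
            snow_total REAL,
            tmax_mean REAL,
            tmin_mean REAL,
            wind_mean REAL
        )
        """
    )
    conn.executemany(
        "INSERT INTO weather_monthly VALUES "
        "(:month, :precip_total, :snow_total, :tmax_mean, :tmin_mean, :wind_mean)",
        monthly_rows,
    )
    conn.commit()


def overlap_months(conn):
    """Return the months present in both weather_monthly and otp_monthly."""
    rows = conn.execute(
        """
        SELECT DISTINCT w.month
        FROM weather_monthly AS w
        JOIN otp_monthly AS o ON o.month = w.month
        ORDER BY w.month
        """
    ).fetchall()
    return [row[0] for row in rows]
```

Dropping and recreating the table keeps the step idempotent: rerunning it replaces the previous contents instead of appending duplicates. An empty result from `overlap_months` would mean the weather features cannot be joined to any OTP month.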
Tables Produced
| Table | Description |
|---|---|
| weather_monthly | Monthly precipitation, temperature, snowfall, and wind summary features. |
Sources
| Name | Type | Why It Matters | Owner | Freshness | Caveat |
|---|---|---|---|---|---|
| data/noaa-weather/daily_raw.csv | file | Cached NOAA daily observations for Pittsburgh station. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| NOAA NCEI Daily Summaries API | api | Daily weather observations for station USW00094823 (Pittsburgh). | Hosted by www.ncei.noaa.gov. | Queried during pipeline execution; freshness depends on upstream updates. | Availability and schema can change without notice. |