Pipeline

Weather ETL

Coverage: 2019-01 to 2025-12 (from weather_monthly).

Built 2026-03-03 02:23 UTC · Commit defd5c8

Data Provenance

flowchart LR
  03_weather(["Weather ETL"])
  f1_03_weather[/"data/noaa-weather/daily_raw.csv"/] --> 03_weather
  a1_03_weather{"NOAA NCEI Daily Summaries API"} --> 03_weather
  03_weather --> tp_weather_monthly[("weather_monthly")]
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 03_weather page;
  class tp_weather_monthly table;
  class f1_03_weather file;
  class a1_03_weather api;

Findings

Findings: Weather ETL

Summary

Monthly weather features are loaded into prt.db and validated for overlap with OTP months.
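
The overlap validation mentioned above could look roughly like the following. This is a minimal sketch, not the pipeline's actual check: it assumes both tables expose a `month` text column in `YYYY-MM` form, and the function name `month_overlap` is hypothetical.

```python
import sqlite3

def month_overlap(conn, left="weather_monthly", right="otp_monthly", col="month"):
    """Return (shared, only_left, only_right) as sorted month lists."""
    # Table and column names are interpolated into SQL, so pass trusted names only.
    def months(table):
        rows = conn.execute(f"SELECT DISTINCT {col} FROM {table}")
        return {r[0] for r in rows}

    a, b = months(left), months(right)
    return sorted(a & b), sorted(a - b), sorted(b - a)

# Demo on an in-memory database with made-up months (not real pipeline data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_monthly (month TEXT)")
conn.execute("CREATE TABLE otp_monthly (month TEXT)")
conn.executemany("INSERT INTO weather_monthly VALUES (?)",
                 [("2019-01",), ("2019-02",), ("2019-03",)])
conn.executemany("INSERT INTO otp_monthly VALUES (?)",
                 [("2019-02",), ("2019-03",), ("2019-04",)])

shared, weather_only, otp_only = month_overlap(conn)
print(shared, weather_only, otp_only)
```

Reporting the months that exist on only one side, rather than a single pass/fail flag, makes it easier to see whether a gap comes from the weather load or from the OTP table.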

Notes

  • Uses the cached raw weather CSV (data/noaa-weather/daily_raw.csv) when present instead of querying the NOAA API.
  • Emits seasonality sanity checks and null-rate diagnostics.
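
As an illustration of the diagnostics named in these notes, the sketch below computes per-column null rates and one simple seasonality check (July warmer than January on average). The column names `tmax_mean` and `snow_total`, the helper names, and the sample values are all hypothetical; the pipeline's real checks may differ.

```python
def null_rates(rows, columns):
    """Fraction of rows in which each column is None."""
    total = len(rows)
    return {
        c: (sum(1 for r in rows if r.get(c) is None) / total if total else 0.0)
        for c in columns
    }

def summer_warmer_than_winter(rows, temp_col="tmax_mean"):
    """Seasonality check: mean July temperature should exceed mean January."""
    def mean_for(month_suffix):
        vals = [
            r[temp_col]
            for r in rows
            if r["month"].endswith(month_suffix) and r.get(temp_col) is not None
        ]
        return sum(vals) / len(vals) if vals else None

    jan, jul = mean_for("-01"), mean_for("-07")
    # Fail the check when either month has no data at all.
    return jan is not None and jul is not None and jul > jan

# Made-up monthly rows for demonstration only.
rows = [
    {"month": "2019-01", "tmax_mean": 2.5, "snow_total": 210.0},
    {"month": "2019-07", "tmax_mean": 29.0, "snow_total": None},
    {"month": "2020-01", "tmax_mean": 5.0, "snow_total": 95.0},
    {"month": "2020-07", "tmax_mean": 30.5, "snow_total": None},
]
rates = null_rates(rows, ["tmax_mean", "snow_total"])
seasonal_ok = summer_warmer_than_winter(rows)
print(rates, seasonal_ok)
```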

Methods

Methods: Weather ETL

Question

How do we produce a monthly weather feature table aligned to OTP months?

Approach

  1. Fetch or read cached NOAA daily observations for the Pittsburgh airport station (USW00094823).
  2. Aggregate daily metrics to monthly precipitation, snowfall, temperature, wind, and event-day counts.
  3. Rebuild weather_monthly in prt.db.
  4. Verify temporal overlap with otp_monthly.
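
Steps 2 and 3 can be sketched as below. This is a hedged illustration under stated assumptions: the output column names, the "any precipitation above zero" event-day rule, and the helper names are hypothetical, and units depend on how the daily data was requested.

```python
import sqlite3
from collections import defaultdict

def to_float(value):
    """Parse a CSV cell; blank or missing cells become None."""
    return float(value) if value not in (None, "") else None

def aggregate_monthly(daily_rows):
    """Group daily records by YYYY-MM and compute monthly features."""
    by_month = defaultdict(list)
    for row in daily_rows:
        by_month[row["DATE"][:7]].append(row)

    def values(rows, key):
        return [v for v in (to_float(r.get(key)) for r in rows) if v is not None]

    def total(rows, key):
        vals = values(rows, key)
        return sum(vals) if vals else None

    def mean(rows, key):
        vals = values(rows, key)
        return sum(vals) / len(vals) if vals else None

    out = []
    for month in sorted(by_month):
        rows = by_month[month]
        out.append((
            month,
            total(rows, "PRCP"),
            total(rows, "SNOW"),
            mean(rows, "TMAX"),
            mean(rows, "TMIN"),
            mean(rows, "AWND"),
            # Event-day count: days with precipitation above zero (assumed rule).
            sum(1 for r in rows if (to_float(r.get("PRCP")) or 0.0) > 0.0),
        ))
    return out

def rebuild_weather_monthly(conn, monthly):
    """Drop and recreate the table so reruns never leave stale rows."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS weather_monthly")
        conn.execute(
            """CREATE TABLE weather_monthly (
                   month TEXT PRIMARY KEY, prcp_total REAL, snow_total REAL,
                   tmax_mean REAL, tmin_mean REAL, awnd_mean REAL,
                   precip_days INTEGER)"""
        )
        conn.executemany(
            "INSERT INTO weather_monthly VALUES (?, ?, ?, ?, ?, ?, ?)", monthly
        )

# Made-up daily rows shaped like CSV records (all values are strings).
daily = [
    {"DATE": "2019-01-01", "PRCP": "2.0", "SNOW": "0.0", "TMAX": "4.0", "TMIN": "-2.0", "AWND": "3.0"},
    {"DATE": "2019-01-02", "PRCP": "0.0", "SNOW": "", "TMAX": "6.0", "TMIN": "0.0", "AWND": "5.0"},
    {"DATE": "2019-02-01", "PRCP": "1.5", "SNOW": "10.0", "TMAX": "1.0", "TMIN": "-5.0", "AWND": ""},
]
monthly = aggregate_monthly(daily)
conn = sqlite3.connect(":memory:")
rebuild_weather_monthly(conn, monthly)
```

Rebuilding the table from scratch, instead of upserting, keeps the step idempotent: running it twice on the same input yields the same table.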

Data

  • NOAA daily summaries API (PRCP, SNOW, SNWD, TMAX, TMIN, AWND)
  • Cached CSV under data/noaa-weather/
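
The cache-or-fetch behaviour for these two inputs might be sketched as follows. The endpoint URL and query parameter names are assumptions about the NCEI data service and should be checked against its documentation; `build_url`, `http_fetch`, and `load_daily` are hypothetical helper names.

```python
import csv
import io
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and parameter names; verify against NCEI documentation.
NCEI_URL = "https://www.ncei.noaa.gov/access/services/data/v1"
DATA_TYPES = ["PRCP", "SNOW", "SNWD", "TMAX", "TMIN", "AWND"]

def build_url(start, end, station="USW00094823"):
    """Build a daily-summaries request URL for one station and date range."""
    params = {
        "dataset": "daily-summaries",
        "stations": station,
        "startDate": start,
        "endDate": end,
        "dataTypes": ",".join(DATA_TYPES),
        "format": "csv",
    }
    return f"{NCEI_URL}?{urlencode(params)}"

def http_fetch(url):
    """Download the response body as text."""
    with urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8")

def load_daily(cache_path, start, end, fetch=http_fetch):
    """Read the cached CSV when present; otherwise fetch and write the cache."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        text = cache_path.read_text(encoding="utf-8")
    else:
        text = fetch(build_url(start, end))
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(text, encoding="utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

A call such as `load_daily("data/noaa-weather/daily_raw.csv", "2019-01-01", "2025-12-31")` would only reach the network when the cache file is missing. Passing a stub as `fetch` lets the caching logic be exercised without network access.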

Output

  • weather_monthly table in data/prt.db

Tables Produced

Table            Description
weather_monthly  Monthly precipitation, temperature, snowfall, and wind summary features.

Sources

data/noaa-weather/daily_raw.csv
  • Type: file
  • Why It Matters: Cached NOAA daily observations for Pittsburgh station.
  • Owner: Local project data owner not specified.
  • Freshness: Snapshot file; refresh by rerunning its pipeline step.
  • Caveat: May lag upstream source updates.

NOAA NCEI Daily Summaries API
  • Type: api
  • Why It Matters: Daily weather observations for station USW00094823 (Pittsburgh).
  • Owner: Hosted by www.ncei.noaa.gov.
  • Freshness: Queried during pipeline execution; freshness depends on upstream updates.
  • Caveat: Availability and schema can change without notice.