Reference

Sources Inventory

Unified catalog of files, APIs, tables, and library dependencies used in this project.

Built 2026-03-03 02:23 UTC · Commit defd5c8

Inventory

Includes source ownership, freshness expectations, and caveats inferred from manifests.
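
As a rough illustration of how these attributes fit together, one inventory row could be modeled as a small record. The sketch below is an assumption, not the project's actual manifest format: `SourceEntry` and `count_by_type` are hypothetical names, and the fields simply mirror the six table columns (Name, Type, Description, Owner, Freshness, Caveat).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    """One inventory row; fields mirror the six table columns (hypothetical model)."""
    name: str
    type: str  # one of "api", "dependency", "file", "table"
    description: str
    owner: str
    freshness: str
    caveat: str


def count_by_type(entries):
    """Return how many inventory entries exist for each source type."""
    return Counter(entry.type for entry in entries)


# Two rows copied from the inventory, used here as sample data.
SAMPLE = [
    SourceEntry(
        name="polars",
        type="dependency",
        description="Dataframe library used for fast tabular data transformations and aggregation.",
        owner="Open-source Python ecosystem maintainers.",
        freshness="Version pinned by project environment until dependency updates are applied.",
        caveat="Library updates may change behavior or defaults.",
    ),
    SourceEntry(
        name="routes",
        type="table",
        description="Route dimension table.",
        owner="Produced by Data Ingestion.",
        freshness="Updated when the producing pipeline step is rerun.",
        caveat="Coverage depends on upstream source availability and ETL assumptions.",
    ),
]
```

Calling `count_by_type(SAMPLE)` gives one entry per type for the two sample rows; the same function would summarize the full inventory if every row were loaded this way.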

| Name | Type | Description | Owner | Freshness | Caveat |
|---|---|---|---|---|---|
| NOAA NCEI Daily Summaries API | api | Daily weather observations for station USW00094823 (Pittsburgh). | Hosted by www.ncei.noaa.gov. | Queried during pipeline execution; freshness depends on upstream updates. | Availability and schema can change without notice. |
| PennDOT ArcGIS Roadway Traffic Layer | api | Public roadway segment AADT and truck percentage attributes. | Hosted by gis.penndot.gov. | Queried during pipeline execution; freshness depends on upstream updates. | Availability and schema can change without notice. |
| WPRDC Pick Lookup | api | Public lookup for pick period date ranges. | Hosted by data.wprdc.org. | Queried during pipeline execution; freshness depends on upstream updates. | Availability and schema can change without notice. |
| WPRDC Schedule Monthly Aggregate | api | Public dataset of route-level monthly schedule aggregates. | Hosted by data.wprdc.org. | Queried during pipeline execution; freshness depends on upstream updates. | Availability and schema can change without notice. |
| branca | dependency | Utility library used by Folium for map templating, colormaps, and HTML components. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults. |
| folium | dependency | Mapping library used to render interactive geospatial visualizations. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults. |
| matplotlib | dependency | Plotting library used to generate static charts. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults. |
| numpy | dependency | Numerical computing library for vectorized arrays and matrix operations. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults. |
| polars | dependency | Dataframe library used for fast tabular data transformations and aggregation. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults. |
| scipy | dependency | Scientific computing library used for statistical tests and numerical routines. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults. |
| statsmodels | dependency | Statistical modeling library used for regression and time-series methods. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults. |
| data/GTFS/shapes.txt | file | GTFS route shape geometry points. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/GTFS/trips.txt | file | GTFS shape-to-route mapping. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv | file | Current route metadata and mode classifications. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/PRT_Stop_Reference_Lookup_Table.csv | file | Historical stop reference file with geography attributes. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv | file | Current stop-to-route coverage and trip counts. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv | file | Average ridership by route and month. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/bus-stop-usage/wprdc_stop_data.csv | file | Referenced via DATA_DIR path composition in analysis script. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/noaa-weather/daily_raw.csv | file | Cached NOAA daily observations for Pittsburgh station. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/ntd-monthly-ridership/December 2025 Complete Monthly Ridership (with adjustments and estimates)_260202.xlsx | file | NTD monthly ridership workbook containing agency metadata and UPT series. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/penndot-traffic/aadt_raw.json | file | Cached PennDOT ArcGIS feature response for Allegheny County roadway segments. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/routes_by_month.csv | file | Monthly route OTP source table in wide format. | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/wprdc-schedule/paac_pick_lookup.csv | file | Pick period lookup metadata (cached copy when available). | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| data/wprdc-schedule/schedule_monthly_agg.csv | file | Monthly route/day-type schedule aggregates (cached copy when available). | Local project data owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates. |
| ntd_agency | table | Agency dimension table keyed by NTD ID, mode, and TOS. | Produced by NTD Ridership ETL. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| ntd_ridership | table | Monthly UPT facts by NTD ID, mode, and TOS. | Produced by NTD Ridership ETL. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| otp_monthly | table | Monthly OTP values by route. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| ridership_monthly | table | Monthly ridership by route and day type. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| route_stops | table | Route-stop bridge with service frequency metrics. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| route_traffic | table | Route-level traffic exposure metrics including weighted AADT and match quality. | Produced by Traffic Overlay ETL. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| routes | table | Route dimension table. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| scheduled_trips_monthly | table | Monthly route/day-type scheduled trip counts and distance metrics. | Produced by Scheduled Trips ETL. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| stop_reference | table | Historical stop metadata and geography. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| stops | table | Physical stop dimension table. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
| weather_monthly | table | Monthly precipitation, temperature, snowfall, and wind summary features. | Produced by Weather ETL. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions. |
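
The file entries above are snapshots that may lag their upstream sources. A minimal sketch of how stale snapshots might be flagged by modification time follows. It is an assumption, not part of the project: `stale_snapshots` is a hypothetical helper and the 30-day threshold is illustrative, since the inventory only states that a snapshot is refreshed by rerunning its pipeline step.

```python
import time
from pathlib import Path


def stale_snapshots(paths, max_age_days, now=None):
    """Return the paths whose last modification is older than max_age_days.

    Missing files are also reported, since they need a pipeline rerun.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for raw in paths:
        path = Path(raw)
        if not path.exists() or path.stat().st_mtime < cutoff:
            stale.append(str(path))
    return stale


if __name__ == "__main__":
    # Paths taken from the inventory; the 30-day limit is an assumed value.
    candidates = [
        "data/noaa-weather/daily_raw.csv",
        "data/penndot-traffic/aadt_raw.json",
    ]
    print(stale_snapshots(candidates, max_age_days=30))
```

Modification time is only a proxy for freshness: a recently rewritten file can still contain old upstream data, so a check like this complements rather than replaces rerunning the producing step.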