Scheduled Trips ETL

Coverage: 2012-03 to 2026-02 (from schedule_periods, scheduled_trips_monthly).

Built 2026-03-03 02:23 UTC · Commit defd5c8

Data Provenance

flowchart LR
  02_scheduled_trips(["Scheduled Trips ETL"])
  f1_02_scheduled_trips[/"data/wprdc-schedule/schedule_monthly_agg.csv"/] --> 02_scheduled_trips
  f2_02_scheduled_trips[/"data/wprdc-schedule/paac_pick_lookup.csv"/] --> 02_scheduled_trips
  a1_02_scheduled_trips{"WPRDC Schedule Monthly Aggregate"} --> 02_scheduled_trips
  a2_02_scheduled_trips{"WPRDC Pick Lookup"} --> 02_scheduled_trips
  t_routes[("routes")] --> 02_scheduled_trips
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  02_scheduled_trips --> tp_scheduled_trips_monthly[("scheduled_trips_monthly")]
  02_scheduled_trips --> tp_schedule_periods[("schedule_periods")]
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 02_scheduled_trips page;
  class t_routes,tp_schedule_periods,tp_scheduled_trips_monthly table;
  class f1_02_scheduled_trips,f2_02_scheduled_trips file;
  class a1_02_scheduled_trips,a2_02_scheduled_trips api;
  class 01_data_ingestion pipeline;

Findings

Summary

Scheduled-trip and pick-period tables are loaded into prt.db for the months that overlap with on-time performance (OTP) coverage.
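As a minimal sketch of the overlap rule, assuming both the schedule data and the OTP data can be reduced to sets of 'YYYY-MM' month keys, the months to load are the intersection of the two sets, and the months found in only one source are natural inputs for overlap diagnostics. The helper name is hypothetical, not the pipeline's own code.

```python
def overlap_months(schedule_months, otp_months):
    """Split month keys into (overlap, schedule-only, OTP-only), each sorted.

    'YYYY-MM' strings sort lexically in chronological order, so plain
    sorted() is enough. Hypothetical helper for illustration only.
    """
    sched, otp = set(schedule_months), set(otp_months)
    return sorted(sched & otp), sorted(sched - otp), sorted(otp - sched)


overlap, schedule_only, otp_only = overlap_months(
    ["2019-01", "2019-02", "2019-03"], ["2019-02", "2019-03", "2019-04"])
# Report the split as a simple diagnostic line.
print(f"{len(overlap)} overlap months; "
      f"{len(schedule_only)} schedule-only; {len(otp_only)} OTP-only")
```

Only the overlap list would be loaded; the other two lists explain why a given month is missing.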

Notes

  • Route matching and overlap diagnostics are emitted during execution.
  • Cached files under data/wprdc-schedule/ are used when available; otherwise the exports are fetched from WPRDC.
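The cache-first behavior can be sketched as below, assuming the download is wrapped in a zero-argument callable that returns the raw bytes. The function name is hypothetical, and the WPRDC download URL is not given on this page, so the fetch is left to the caller.

```python
from pathlib import Path
from typing import Callable


def load_cached_or_fetch(path: Path, fetch: Callable[[], bytes]) -> bytes:
    """Return the cached file's bytes if present; otherwise fetch, cache, and return them."""
    if path.exists():
        # Cache hit: no network access at all.
        return path.read_bytes()
    # Cache miss: e.g. an HTTP download from data.wprdc.org (URL not specified here).
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

In this sketch a cached copy is never refreshed automatically, which matches the "May lag upstream source updates" caveat: deleting the file forces a refetch on the next run.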

Methods

Question

How do we add monthly service-level schedule data needed for longitudinal service and causality analyses?

Approach

  1. Fetch or read cached WPRDC schedule exports.
  2. Normalize route IDs, month keys, day type, and schedule period fields.
  3. Deduplicate overlapping schedule periods per route/month/day type.
  4. Rebuild scheduled_trips_monthly and schedule_periods in prt.db.
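Steps 2 and 3 can be sketched as follows. The column names, the day-type codes, and the deduplication rule (keep the pick period that starts latest) are all assumptions for illustration; the page does not state which of several overlapping periods wins. Each row is assumed to already carry its pick ID and pick start date, for example after joining the pick lookup.

```python
import re
from datetime import date

# Hypothetical raw-code -> canonical day-type mapping; real export codes are not documented here.
DAY_TYPES = {
    "WEEKDAY": "WEEKDAY", "WKDY": "WEEKDAY",
    "SATURDAY": "SATURDAY", "SAT": "SATURDAY",
    "SUNDAY": "SUNDAY", "SUN": "SUNDAY",
}


def normalize_route_id(raw: str) -> str:
    """Trim and uppercase; strip leading zeros from all-digit IDs ('001' -> '1')."""
    rid = raw.strip().upper()
    return str(int(rid)) if rid.isdecimal() else rid


def normalize_month(raw: str) -> str:
    """Accept 'YYYY-MM', 'YYYY-MM-DD' or 'YYYY/MM/DD' and return the 'YYYY-MM' key."""
    m = re.match(r"\s*(\d{4})[-/](\d{1,2})", raw)
    if not m or not 1 <= int(m.group(2)) <= 12:
        raise ValueError(f"unrecognized month value: {raw!r}")
    return f"{m.group(1)}-{int(m.group(2)):02d}"


def normalize_row(row: dict) -> dict:
    """Map one raw CSV row (hypothetical column names) to normalized fields."""
    return {
        "route_id": normalize_route_id(row["route_id"]),
        "month": normalize_month(row["month"]),
        "day_type": DAY_TYPES[row["day_type"].strip().upper()],
        "pick_id": row["pick_id"].strip(),
        "pick_start": date.fromisoformat(row["pick_start"].strip()),
        "trips": int(row["trips"]),
    }


def dedupe_latest_pick(rows: list[dict]) -> list[dict]:
    """Keep one row per (route_id, month, day_type): the one whose pick starts latest."""
    best: dict[tuple, dict] = {}
    for row in rows:
        key = (row["route_id"], row["month"], row["day_type"])
        if key not in best or row["pick_start"] > best[key]["pick_start"]:
            best[key] = row
    # Deterministic output order by the grouping key.
    return [best[k] for k in sorted(best)]
```

Normalizing before deduplicating matters: '28x' and '28X', or '2019-06' and '2019-06-01', only collide once they share a key.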

Data

  • WPRDC monthly schedule aggregates (schedule_monthly_agg.csv)
  • WPRDC pick lookup (paac_pick_lookup.csv)
  • Route IDs from the routes table in prt.db
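The route-matching diagnostics mentioned under Notes could look roughly like this sketch. It assumes the routes table stores IDs in a text column named route_id (a hypothetical column name) and that schedule route IDs have already been normalized the same way.

```python
import sqlite3


def match_routes(conn: sqlite3.Connection, route_ids: set[str]) -> tuple[set[str], set[str]]:
    """Split schedule route IDs into (matched, unmatched) against the routes table."""
    known = {r[0] for r in conn.execute("SELECT route_id FROM routes")}
    matched = route_ids & known
    unmatched = route_ids - known
    if unmatched:
        # Diagnostic only: list schedule routes with no counterpart in routes.
        print(f"{len(unmatched)} schedule route(s) not in routes: {sorted(unmatched)}")
    return matched, unmatched
```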

Output

  • scheduled_trips_monthly table in data/prt.db
  • schedule_periods table in data/prt.db
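Step 4 rebuilds the output tables. One way to make such a rebuild all-or-nothing in SQLite, sketched here for scheduled_trips_monthly only, is to drop, recreate, and reload inside a single explicit transaction; because SQLite DDL is transactional, a failed load leaves the previous table in place. The column list is hypothetical: the page describes the table as holding trip counts and distance metrics but does not give its schema, and the distance columns are omitted here.

```python
import sqlite3

# Hypothetical schema; only the table name comes from this page.
CREATE_SQL = """
CREATE TABLE scheduled_trips_monthly (
    route_id TEXT NOT NULL,
    month    TEXT NOT NULL,  -- 'YYYY-MM'
    day_type TEXT NOT NULL,
    trips    INTEGER NOT NULL,
    PRIMARY KEY (route_id, month, day_type)
)
"""


def rebuild_scheduled_trips(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Drop, recreate, and reload the table in one explicit transaction.

    Assumes no transaction is already open on conn.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS scheduled_trips_monthly")
        conn.execute(CREATE_SQL)
        conn.executemany(
            "INSERT INTO scheduled_trips_monthly VALUES (?, ?, ?, ?)", rows)
    except Exception:
        # Undo the drop/create/inserts so the old table survives.
        conn.rollback()
        raise
    conn.commit()
```

The explicit BEGIN matters: Python's sqlite3 module does not open a transaction implicitly before DDL, so without it the DROP would be committed on its own.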

Tables Produced

Table | Description
scheduled_trips_monthly | Monthly route/day-type scheduled trip counts and distance metrics.
schedule_periods | Pick period start/end dates keyed by pick ID.
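Because schedule_periods maps each pick ID to a start/end date range, assigning a date to its pick period is an interval lookup. A minimal sketch with the standard-library bisect module, assuming inclusive dates and non-overlapping periods; the tuple layout and function names are illustrative, not the table's actual columns.

```python
import bisect
from datetime import date


def build_pick_index(periods):
    """periods: iterable of (pick_id, start, end), inclusive dates; returns (starts, ordered)."""
    ordered = sorted(periods, key=lambda p: p[1])
    return [p[1] for p in ordered], ordered


def pick_for_date(index, day: date):
    """Return the pick_id whose [start, end] range contains day, or None."""
    starts, ordered = index
    # Rightmost period starting on or before `day`.
    i = bisect.bisect_right(starts, day) - 1
    if i >= 0 and ordered[i][1] <= day <= ordered[i][2]:
        return ordered[i][0]
    return None
```

Dates that fall in a gap between periods, or outside the covered range, return None rather than the nearest pick.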

Sources

Name | Type | Why It Matters | Owner | Freshness | Caveat
data/wprdc-schedule/schedule_monthly_agg.csv | file | Monthly route/day-type schedule aggregates (cached copy when available). | Local project data; owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates.
data/wprdc-schedule/paac_pick_lookup.csv | file | Pick period lookup metadata (cached copy when available). | Local project data; owner not specified. | Snapshot file; refresh by rerunning its pipeline step. | May lag upstream source updates.
WPRDC Schedule Monthly Aggregate | api | Public dataset of route-level monthly schedule aggregates. | Hosted by data.wprdc.org. | Queried during pipeline execution; freshness depends on upstream updates. | Availability and schema can change without notice.
WPRDC Pick Lookup | api | Public lookup for pick period date ranges. | Hosted by data.wprdc.org. | Queried during pipeline execution; freshness depends on upstream updates. | Availability and schema can change without notice.
routes | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.