Data Ingestion
Builds the normalized SQLite database from canonical local CSV sources.
Transit Performance Knowledge Base
Pipeline, analyses, and source lineage for on-time performance and ridership research.
Built 2026-03-03 02:23 UTC · Commit defd5c8
Loads monthly scheduled trip counts and schedule periods from WPRDC exports.
Fetches NOAA daily weather and aggregates monthly features for OTP modeling.
Computes route-level traffic exposure metrics by spatially joining GTFS and PennDOT AADT data.
Loads national monthly ridership benchmark data from the NTD workbook.
Tracks the overall PRT on-time performance trend from 2019 through 2025, including COVID impact and recovery.
Compares on-time performance across service modes (BUS, RAIL, INCLINE) and route types (local, limited, express, busway).
Ranks routes by average OTP, trend direction, and volatility to identify the best and worst performers and the most and least consistent routes.
Investigates whether on-time performance varies systematically by neighborhood and municipality.
Identifies and investigates sharp OTP drops that may indicate route restructuring, detours, or data quality issues.
Decomposes route-level OTP into trend, seasonal, and residual components to identify whether summer or winter months systematically affect performance.
Tests whether routes with more stops have worse on-time performance, using a scatter plot of stop count against average OTP with mode-based coloring.
Visualizes stop-level on-time performance on a geographic scatter plot to identify corridor-level bottlenecks and clusters of poor performance.
Audits the Monongahela Incline data across all database tables to determine why it appears in OTP data with zero/null values.
Within-route panel: does changing trip frequency improve or degrade OTP?
Identify low-usage stops that could be consolidated to improve OTP, leveraging the finding that stop count is the strongest OTP predictor.
Assess whether bus shelters are equitably placed relative to stop-level ridership volume and demographics.
Map the spatial pattern of stop-level ridership loss and recovery between pre-pandemic and pandemic periods.
Quantify how concentrated ridership is across stops and test whether concentration correlates with route OTP.
Analyze net boarding-alighting flows by stop and direction to identify major trip generators and attractors.
Compare 2019-to-2024 ridership recovery across the 150 largest US transit agencies using NTD data; rank PRT nationally.
Track indexed monthly ridership for Pittsburgh and 7 peer cities from 2019 through 2025 using NTD data; compare recovery trajectories and mode splits.
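The indexing step above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the index is set so that the mean monthly ridership of the baseline year (2019 here) equals 100, and the function name and sample figures are hypothetical.

```python
def index_to_baseline(monthly, baseline_year=2019):
    """Rescale monthly ridership so the baseline-year mean equals 100.

    monthly: dict mapping (year, month) -> ridership count.
    The choice of the baseline-year mean as the reference is an assumption.
    """
    base_values = [v for (year, _), v in monthly.items() if year == baseline_year]
    if not base_values:
        raise ValueError("no observations in the baseline year")
    base = sum(base_values) / len(base_values)
    if base == 0:
        raise ValueError("baseline ridership is zero")
    return {key: 100.0 * value / base for key, value in monthly.items()}


# Hypothetical figures, for illustration only.
sample = {(2019, 1): 100, (2019, 2): 300, (2020, 4): 50}
indexed = index_to_baseline(sample)
print(indexed[(2020, 4)])  # 25.0: April 2020 is at 25% of the 2019 monthly mean
```

Indexing each city to its own baseline puts agencies of very different sizes on one comparable scale.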
Test whether OTP declines predict subsequent ridership losses using lagged correlation and Granger causality tests.
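A lagged correlation of the kind mentioned above might look like the sketch below. It assumes two equal-length monthly series and pairs OTP in month t with ridership in month t + lag; the function names and data are hypothetical, and the Granger causality test is not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(otp, ridership, lag):
    """Correlate otp[t] with ridership[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(otp) != len(ridership):
        raise ValueError("series must have equal length")
    if lag == 0:
        return pearson(otp, ridership)
    # Drop the last `lag` OTP values and the first `lag` ridership values
    # so that each OTP month lines up with ridership `lag` months later.
    return pearson(otp[:-lag], ridership[lag:])


# Hypothetical series in which ridership follows OTP two months later.
otp = [1, 2, 3, 4, 5, 6]
ridership = [0, 0, 1, 2, 3, 4]
print(round(lagged_correlation(otp, ridership, 2), 6))  # 1.0
```

Scanning several lags and comparing the coefficients indicates whether OTP tends to lead ridership, though correlation alone does not establish direction of cause.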
Compare ridership recovery trajectories with OTP recovery trajectories post-COVID to identify whether ridership recovery degrades OTP.
Estimate late rider-trips per route per month by combining ridership with OTP to identify where the most total human impact occurs.
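One plausible way to combine the two measures is ridership multiplied by the share of trips that were not on time. The sketch below assumes that formula and assumes OTP is stored as a fraction between 0 and 1; the record layout, route labels, and numbers are hypothetical.

```python
def late_rider_trips(ridership, otp):
    """Estimate late rider-trips as ridership * (1 - OTP).

    Assumes every rider is equally likely to be on a late trip,
    which is a simplification.
    """
    if not 0.0 <= otp <= 1.0:
        raise ValueError("otp must be a fraction between 0 and 1")
    return ridership * (1.0 - otp)


# Hypothetical (route, monthly ridership, OTP) records.
records = [("A", 1000, 0.75), ("B", 4000, 0.90), ("C", 500, 0.50)]
ranked = sorted(
    ((route, late_rider_trips(riders, otp)) for route, riders, otp in records),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked[0][0])  # B: the highest-OTP route still has the most late rider-trips
```

The example shows why this metric differs from a plain OTP ranking: a busy route with good OTP can affect more riders than a quiet route with poor OTP.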
Compare OTP and ridership trends across PRT garages (Ross, Collier, East Liberty, West Mifflin) to surface operational differences.
Track how weekday, Saturday, and Sunday ridership patterns shifted post-COVID and whether weekend ridership share correlates with OTP.
Measure what share of total system ridership is carried by the lowest-OTP routes using Lorenz curves and Gini coefficients.
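For reference, a Lorenz curve and Gini coefficient can be computed as in the sketch below, alongside a helper that reports the ridership share carried by the k lowest-OTP routes. This is a generic illustration using trapezoidal integration, not necessarily the analysis's exact method; function names and data are hypothetical.

```python
def lorenz_curve(values):
    """Return Lorenz curve points (population share, cumulative value share)."""
    ordered = sorted(values)
    total = sum(ordered)
    if not ordered or total <= 0:
        raise ValueError("need at least one value and a positive total")
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    points = lorenz_curve(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between adjacent points
    return 1.0 - 2.0 * area


def ridership_share_of_lowest_otp(routes, k):
    """Share of total ridership carried by the k routes with the lowest OTP.

    routes: list of (otp, ridership) pairs.
    """
    ordered = sorted(routes, key=lambda route: route[0])
    total = sum(riders for _, riders in ordered)
    if total <= 0:
        raise ValueError("total ridership must be positive")
    return sum(riders for _, riders in ordered[:k]) / total


print(gini([1, 1, 1, 1]))  # 0.0: perfectly even distribution
print(gini([0, 0, 0, 1]))  # 0.75: one route carries everything
```

With n routes the maximum value of this estimator is (n - 1) / n, so small route sets cannot reach 1.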
Add ridership as a predictor to the Analysis 18 OLS model to test whether it adds explanatory power beyond stop count, span, and mode.
Tests whether PennDOT AADT traffic volume explains OTP variance beyond structural features.
Tests whether weather (precipitation, snow, temperature) explains OTP variance or the counterintuitive seasonal pattern from Analysis 06.
Do schedule changes (pick period transitions) correlate with OTP shifts?
Tests whether high-frequency routes have worse on-time performance, using weekday trip counts as a proxy for service frequency.
Investigates whether routes with a structural imbalance between inbound and outbound trip frequency have worse on-time performance.
Computes the geographic span (max distance between any two stops) for each route and tests whether longer routes have worse on-time performance, disentangling route length from stop count.
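The span described above, the largest distance between any two stops on a route, can be sketched with great-circle (haversine) distances. The distance formula is an assumption, since the analysis may use a projected coordinate system instead, and the function names and coordinates are hypothetical.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


def route_span_km(stops):
    """Maximum pairwise distance between stops; 0.0 for fewer than two stops."""
    if len(stops) < 2:
        return 0.0
    return max(haversine_km(a, b) for a, b in combinations(stops, 2))


# Hypothetical stops along the equator, one degree of longitude end to end.
stops = [(0.0, 0.0), (0.0, 1.0), (0.0, 0.5)]
print(round(route_span_km(stops), 1))  # about 111.2 km
```

The pairwise scan is quadratic in the number of stops, which is acceptable for single routes but would warrant a convex-hull shortcut on much larger point sets.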
Computes pairwise OTP time-series correlations between all routes and uses hierarchical clustering to identify groups of routes whose performance rises and falls together.
Measures how far each route's OTP has recovered relative to its pre-COVID baseline and identifies route characteristics that predict faster or slower recovery.
Aggregates on-time performance by municipality and county to assess service reliability equity at a broader geographic level than the neighborhood analysis (Analysis 04).
Identifies high-connectivity stops (served by many routes) and tests whether passengers at transfer hubs experience worse OTP than those at low-connectivity stops.
Tests whether routes with different weekend-to-weekday service ratios show different OTP patterns, distinguishing commuter-oriented routes from all-day service routes.
Combines stop count, mode, bus subtype, geographic span, and service profile into a single OLS regression model to quantify relative importance and total explained variance.
Compute system OTP weighted by actual average daily ridership instead of scheduled trip frequency, to measure the average rider's experience.
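The difference between the two weightings can be illustrated with a plain weighted mean. The sketch below assumes each route has an OTP fraction and a weight (either scheduled trips or average daily ridership); the function name and numbers are hypothetical.

```python
def weighted_otp(routes):
    """Weighted mean OTP.

    routes: list of (otp, weight) pairs, where weight is scheduled trips
    or average daily ridership depending on the perspective wanted.
    """
    total_weight = sum(weight for _, weight in routes)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(otp * weight for otp, weight in routes) / total_weight


# Two hypothetical routes with equal trip counts but unequal ridership.
by_trips = [(0.5, 100), (1.0, 100)]
by_ridership = [(0.5, 300), (1.0, 100)]
print(weighted_otp(by_trips))      # 0.75
print(weighted_otp(by_ridership))  # 0.625
```

When the busier route is the less reliable one, the ridership-weighted figure falls below the trip-weighted figure, reflecting that the average rider fares worse than the average scheduled trip.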