Analysis

27 - Traffic Congestion and OTP

Ridership and External Factors

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-03-03 02:23 UTC · Commit defd5c8

Page Navigation

Analysis Navigation

Data Provenance

flowchart LR
  27_traffic_congestion(["27 - Traffic Congestion and OTP"])
  t_otp_monthly[("otp_monthly")] --> 27_traffic_congestion
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  t_route_stops[("route_stops")] --> 27_traffic_congestion
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_route_traffic[("route_traffic")] --> 27_traffic_congestion
  04_traffic_overlay[["Traffic Overlay ETL"]] --> t_route_traffic
  t_routes[("routes")] --> 27_traffic_congestion
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  t_stops[("stops")] --> 27_traffic_congestion
  01_data_ingestion[["Data Ingestion"]] --> t_stops
  d1_27_traffic_congestion(("numpy (lib)")) --> 27_traffic_congestion
  d2_27_traffic_congestion(("polars (lib)")) --> 27_traffic_congestion
  d3_27_traffic_congestion(("scipy (lib)")) --> 27_traffic_congestion
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 27_traffic_congestion page;
  class t_otp_monthly,t_route_stops,t_route_traffic,t_routes,t_stops table;
  class d1_27_traffic_congestion,d2_27_traffic_congestion,d3_27_traffic_congestion dep;
  class 01_data_ingestion,04_traffic_overlay pipeline;

Findings

Findings: Traffic Congestion and OTP

Summary

Total traffic volume (AADT) does not explain OTP variance after controlling for structural features (F=0.011, p=0.92). However, truck percentage is a significant predictor (p=0.006), jointly boosting R2 from 0.40 to 0.45. Routes with higher truck traffic perform better on-time, likely because truck-heavy corridors tend to be wider arterials with fewer stops and more predictable traffic flow.

Key Numbers

89 routes analyzed (88 bus, 1 rail) after filtering match_rate >= 0.3
Base model (6 structural features): R2 = 0.40, Adj R2 = 0.36
Adding log(AADT): R2 = 0.40 (+0.0001), p = 0.92 -- not significant
Adding log(AADT) + truck_pct: R2 = 0.45 (+0.05), joint F = 3.97, p = 0.023
Truck percentage beta weight: +0.26 (p = 0.006), third-strongest predictor
VIF for log_aadt = 1.17 -- no multicollinearity concern
AADT range across matched routes: 4,181 -- 15,748 vehicles/day
Top AADT routes: G3 (15,748), 28X (15,328), P12 (15,162)

Observations

Traffic volume is orthogonal to OTP once structural features are controlled. The bivariate correlation is weak (r = +0.01) and the partial residual plot shows no trend. Routes on high-traffic roads perform similarly to those on quieter ones.
Truck percentage captures road type, not congestion. The positive coefficient (+0.021 OTP per 1% truck share) suggests that truck-heavy routes operate on highways/arterials with infrastructure advantages (dedicated lanes, longer signal phases, wider roads).
The three excluded routes (BLUE, P1, SLVR) are rail/busway with match_rate < 0.3, confirming that the PennDOT road network does not cover dedicated transit rights-of-way.
Bus-only subgroup shows the same null result: log_aadt F = 0.011, p = 0.92.
The spatial match rate is high (median 80%), indicating good coverage of the PennDOT road network for bus routes.

Discussion

The headline result is a null: traffic volume, the leading candidate for explaining the remaining 50% of OTP variance after structural features, contributes essentially nothing (R2 change of 0.01%). This is not a measurement failure -- the VIF of 1.17 confirms that AADT is nearly orthogonal to stop count and span, so it had every opportunity to capture independent variance. It simply doesn't.

Why not? The most likely explanation is that AADT measures the wrong thing. Annual average daily traffic smooths over the peak-hour congestion that actually delays buses. A road carrying 15,000 vehicles/day spread evenly across 24 hours is very different from one carrying the same total with 40% concentrated in two rush-hour peaks. PRT buses operate disproportionately during those peaks, so the congestion they experience is poorly proxied by a 24-hour average. Directional, time-of-day traffic counts -- or better yet, speed/travel-time data from probe vehicles -- would be a more direct test.

The truck percentage finding is the genuine contribution of this analysis. At +0.26 beta weight (third-strongest after stop count and span), truck share adds 5 percentage points of explained variance. But this almost certainly proxies for road classification rather than a direct truck-bus interaction. Roads with high truck percentages tend to be state highways and major arterials -- wider, with longer signal cycles, fewer pedestrian crossings, and more predictable traffic flow. The truck_pct variable is effectively encoding "this route runs on a highway" in a way that is_premium_bus failed to capture (since many non-premium routes also use arterial segments for part of their alignment).

This connects to the broader model-building narrative across Analyses 18, 26, and 27. The base model (stop count + span + mode) explains ~40% of variance. Adding n_munis as a suppressor pushes it to ~47%. Adding truck_pct pushes it to ~45% on the traffic-matched subsample. Ridership (Analysis 26) added nothing. The cumulative picture is that roughly half of OTP variance is explained by route geometry and road type, and the other half likely requires operational data (schedule padding, driver availability, vehicle condition, real-time traffic conditions) that is not in this dataset.

For policy, the null AADT result is actually useful: it suggests that rerouting buses to lower-traffic roads would not improve OTP, since traffic volume per se is not the problem. The truck_pct finding, if it indeed proxies for road type, suggests that routes spending more of their alignment on arterials (vs. neighborhood streets) perform better -- consistent with the stop count finding, since arterial segments typically have fewer stops per mile.

Caveats

AADT is an annual average; it does not capture peak-hour congestion, which is when buses run most frequently and are most affected. Directional or time-of-day traffic data would be a stronger test.
PennDOT covers state routes; local streets (where many bus routes run) may be underrepresented.
The truck_pct finding may proxy for road classification rather than a direct causal mechanism.
The spatial matching uses a 30m buffer, which may associate routes with adjacent parallel roads in dense urban areas.

Output

image aadt_vs_otp_scatter.png
bivariate scatter of AADT vs OTP.

image coefficient_comparison.png
beta weight comparison between base and expanded models.

image partial_residual.png
partial residual plot for log_aadt.

No interactive outputs declared.

data model_comparison.csv

regression results for all models.

Preview CSV

Expand to load preview.

data route_traffic_summary.csv

per-route traffic data with OTP.

Preview CSV

Expand to load preview.

data vif_table.csv

VIF values for expanded model.

Preview CSV

Expand to load preview.

Methods

Methods: Traffic Congestion and OTP

Question

Does traffic volume (AADT) explain OTP variance beyond stop count, geographic span, and other structural features from the Analysis 18 model?

Approach

Replicate the Analysis 18 six-feature OLS model on routes with matched PennDOT traffic data.
Filter to routes with match_rate >= 0.3 (sufficient spatial overlap with PennDOT road network).
Add log-transformed weighted AADT as a seventh predictor. Log transform is used because the effect of traffic on delays is expected to show diminishing returns (doubling from 5,000 to 10,000 AADT matters more than 25,000 to 30,000).
Compare adjusted R-squared between six-feature and seven-feature models using a nested F-test.
Fit a second expanded model adding avg_truck_pct as an eighth predictor.
Check VIF for multicollinearity (AADT may correlate with span or stop count).
Repeat with bus-only subset.
Generate bivariate scatter (AADT vs OTP), coefficient comparison chart, and partial residual plot.

Data

Name	Description	Source
`otp_monthly`	route_id, month, otp (averaged to route-level, 12+ months required)	`prt.db` table
`route_traffic`	route_id, weighted_aadt, avg_truck_pct, match_rate (built by `traffic_overlay.py`)	`prt.db` table
`route_stops`	stop counts, trip frequencies	`prt.db` table
`stops`	lat, lon for geographic span computation	`prt.db` table
`routes`	route_id, mode for subtype classification	`prt.db` table

Output

output/model_comparison.csv -- regression results for all models
output/vif_table.csv -- VIF values for expanded model
output/route_traffic_summary.csv -- per-route traffic data with OTP
output/aadt_vs_otp_scatter.png -- bivariate scatter of AADT vs OTP
output/coefficient_comparison.png -- beta weight comparison between base and expanded models
output/partial_residual.png -- partial residual plot for log_aadt

Sources

Name	Type	Why It Matters	Owner	Freshness	Caveat
otp_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
route_stops	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
route_traffic	table	Primary analytical table used in this page's computations.	Produced by Traffic Overlay ETL.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
routes	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
stops	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
numpy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
polars	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
scipy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.