Analysis

27 - Traffic Congestion and OTP

Ridership and External Factors

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-03-03 02:23 UTC · Commit defd5c8

Data Provenance

flowchart LR
  27_traffic_congestion(["27 - Traffic Congestion and OTP"])
  t_otp_monthly[("otp_monthly")] --> 27_traffic_congestion
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  t_route_stops[("route_stops")] --> 27_traffic_congestion
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_route_traffic[("route_traffic")] --> 27_traffic_congestion
  04_traffic_overlay[["Traffic Overlay ETL"]] --> t_route_traffic
  t_routes[("routes")] --> 27_traffic_congestion
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  t_stops[("stops")] --> 27_traffic_congestion
  01_data_ingestion[["Data Ingestion"]] --> t_stops
  d1_27_traffic_congestion(("numpy (lib)")) --> 27_traffic_congestion
  d2_27_traffic_congestion(("polars (lib)")) --> 27_traffic_congestion
  d3_27_traffic_congestion(("scipy (lib)")) --> 27_traffic_congestion
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 27_traffic_congestion page;
  class t_otp_monthly,t_route_stops,t_route_traffic,t_routes,t_stops table;
  class d1_27_traffic_congestion,d2_27_traffic_congestion,d3_27_traffic_congestion dep;
  class 01_data_ingestion,04_traffic_overlay pipeline;

Findings

Findings: Traffic Congestion and OTP

Summary

Total traffic volume (AADT) does not explain OTP variance after controlling for structural features (F=0.011, p=0.92). However, truck percentage is a significant predictor (p=0.006); adding log(AADT) and truck percentage together raises R2 from 0.40 to 0.45. Routes with higher truck traffic have better on-time performance, likely because truck-heavy corridors tend to be wider arterials with fewer stops and more predictable traffic flow.

Key Numbers

  • 89 routes analyzed (88 bus, 1 rail) after filtering match_rate >= 0.3
  • Base model (6 structural features): R2 = 0.40, Adj R2 = 0.36
  • Adding log(AADT): R2 = 0.40 (+0.0001), p = 0.92 -- not significant
  • Adding log(AADT) + truck_pct: R2 = 0.45 (+0.05), joint F = 3.97, p = 0.023
  • Truck percentage beta weight: +0.26 (p = 0.006), third-strongest predictor
  • VIF for log_aadt = 1.17 -- no multicollinearity concern
  • AADT range across matched routes: 4,181 -- 15,748 vehicles/day
  • Top AADT routes: G3 (15,748), 28X (15,328), P12 (15,162)
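The p-values listed above can be cross-checked from the F statistics alone. The sketch below assumes that each model has an intercept and no terms beyond the listed predictors, so the residual degrees of freedom are 81 (seven predictors) and 80 (eight predictors) for 89 routes. Those degrees of freedom are inferred from the numbers on this page, not read from the analysis output.

```python
from scipy import stats

n_routes = 89   # routes analyzed (from Key Numbers)
k_base = 6      # structural features in the base model

# ASSUMPTION: intercept plus the listed predictors only, so
# residual df = n - (number of predictors) - 1.
df_resid_aadt = n_routes - (k_base + 1) - 1    # base + log(AADT)
df_resid_joint = n_routes - (k_base + 2) - 1   # base + log(AADT) + truck_pct

# Upper-tail probability of the reported F statistics.
p_aadt = stats.f.sf(0.011, 1, df_resid_aadt)
p_joint = stats.f.sf(3.97, 2, df_resid_joint)

print(f"log(AADT) alone:        p = {p_aadt:.2f}")
print(f"log(AADT) + truck_pct:  p = {p_joint:.3f}")
```

Under these assumptions the two values should come out close to the reported 0.92 and 0.023.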

Observations

  • Traffic volume is orthogonal to OTP once structural features are controlled. The bivariate correlation is weak (r = +0.01) and the partial residual plot shows no trend. Routes on high-traffic roads perform similarly to those on quieter ones.
  • Truck percentage likely captures road type, not congestion. The positive coefficient (+0.021 OTP per percentage point of truck share) suggests that truck-heavy routes operate on highways/arterials with infrastructure advantages (dedicated lanes, longer signal phases, wider roads).
  • The three excluded routes (BLUE, P1, SLVR) are rail/busway with match_rate < 0.3, consistent with the PennDOT road network not covering dedicated transit rights-of-way.
  • Bus-only subgroup shows the same null result: log_aadt F = 0.011, p = 0.92.
  • The spatial match rate is high (median 80%), indicating good coverage of the PennDOT road network for bus routes.

Discussion

The headline result is a null: traffic volume, the leading candidate for explaining the roughly 60% of OTP variance left unexplained by the base model (R2 = 0.40), contributes essentially nothing (an R2 change of 0.0001, or 0.01 percentage points). This is not a multicollinearity artifact -- the VIF of 1.17 indicates that log(AADT) is nearly orthogonal to the other predictors, including stop count and span, so it had every opportunity to capture independent variance. It simply doesn't.
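To make the VIF argument concrete, here is a minimal sketch assuming the standard definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the others. The function name `vif` and the synthetic matrix are illustrative only; this is not the analysis code.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """VIF per column: 1 / (1 - R^2) from regressing that column on the rest."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Design matrix: intercept plus every other predictor.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors (NOT the route data): three independent columns
# plus a fourth that nearly duplicates the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(89, 3))
X = np.column_stack([X, X[:, 0] + 0.1 * rng.normal(size=89)])
vifs = vif(X)

# R^2 implied by the reported VIF of 1.17 for log_aadt.
implied_r2 = 1.0 - 1.0 / 1.17
print(vifs.round(2), round(implied_r2, 3))
```

Inverting the definition, a VIF of 1.17 corresponds to an R-squared of about 0.15 when log_aadt is regressed on the other predictors, which is why it leaves plenty of independent variance to work with.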

Why not? The most likely explanation is that AADT measures the wrong thing. Annual average daily traffic smooths over the peak-hour congestion that actually delays buses. A road carrying 15,000 vehicles/day spread evenly across 24 hours is very different from one carrying the same total with 40% concentrated in two rush-hour peaks. PRT buses operate disproportionately during those peaks, so the congestion they experience is poorly proxied by a 24-hour average. Directional, time-of-day traffic counts -- or better yet, speed/travel-time data from probe vehicles -- would be a more direct test.
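The contrast between the two roads can be worked out with simple arithmetic. The 15,000 vehicles/day and the 40% peak share come from the paragraph above; the assumption that each of the two peaks lasts two hours is purely illustrative.

```python
aadt = 15_000        # vehicles/day (from the text)
peak_share = 0.40    # share of daily traffic in the two peaks (from the text)
peak_hours = 4       # ASSUMPTION: two rush-hour peaks of two hours each

# Road 1: traffic spread evenly across the day.
even_hourly = aadt / 24

# Road 2: same daily total, 40% packed into the peak hours.
peak_hourly = aadt * peak_share / peak_hours
offpeak_hourly = aadt * (1 - peak_share) / (24 - peak_hours)

print(f"even spread:   {even_hourly:.0f} veh/hour")
print(f"peak hours:    {peak_hourly:.0f} veh/hour")
print(f"off-peak:      {offpeak_hourly:.0f} veh/hour")
print(f"peak vs. even: {peak_hourly / even_hourly:.1f}x")
```

With these inputs the peaked road carries more than twice the hourly volume of the evenly loaded road during rush hours, even though both report the same AADT.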

The truck percentage finding is the genuine contribution of this analysis. At +0.26 beta weight (third-strongest after stop count and span), truck share adds 5 percentage points of explained variance. But this almost certainly proxies for road classification rather than a direct truck-bus interaction. Roads with high truck percentages tend to be state highways and major arterials -- wider, with longer signal cycles, fewer pedestrian crossings, and more predictable traffic flow. The truck_pct variable is effectively encoding "this route runs on a highway" in a way that is_premium_bus failed to capture (since many non-premium routes also use arterial segments for part of their alignment).
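A short sketch of what a beta weight measures, assuming it means the standardized regression coefficient (the raw slope rescaled by sd(x) / sd(y)). The data are synthetic and the effect sizes are invented for illustration; only the variable names echo the analysis.

```python
import numpy as np

def ols_slopes(X, y):
    """OLS slopes (intercept fitted, then dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

# Synthetic routes (NOT the real data).
rng = np.random.default_rng(1)
n = 89
stop_count = rng.normal(60, 20, n)
truck_pct = rng.normal(5, 2, n)
otp = 0.75 - 0.002 * stop_count + 0.02 * truck_pct + rng.normal(0, 0.05, n)

X = np.column_stack([stop_count, truck_pct])

# Route 1: rescale the raw slopes by sd(x) / sd(y).
raw = ols_slopes(X, otp)
beta_from_raw = raw * X.std(axis=0) / otp.std()

# Route 2: z-score everything first, then fit.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (otp - otp.mean()) / otp.std()
beta_from_z = ols_slopes(Xz, yz)

print(beta_from_raw.round(3), beta_from_z.round(3))
```

Both routes give the same numbers. Because beta weights are unit-free, they allow a predictor measured in percent (truck share) to be ranked against one measured in counts (stops).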

This connects to the broader model-building narrative across Analyses 18, 26, and 27. The base model (stop count + span + mode) explains ~40% of variance. Adding n_munis as a suppressor pushes it to ~47%. On the traffic-matched subsample, adding log(AADT) and truck_pct to the base model raises it from ~40% to ~45%. Ridership (Analysis 26) added nothing. The cumulative picture is that roughly half of OTP variance is explained by route geometry and road type, and the other half likely requires operational data (schedule padding, driver availability, vehicle condition, real-time traffic conditions) that is not in this dataset.

For policy, the null AADT result is actually useful: it suggests that rerouting buses to lower-traffic roads would not improve OTP, since traffic volume per se is not the problem. The truck_pct finding, if it indeed proxies for road type, suggests that routes spending more of their alignment on arterials (vs. neighborhood streets) perform better -- consistent with the stop count finding, since arterial segments typically have fewer stops per mile.

Caveats

  • AADT is an annual average; it does not capture peak-hour congestion, which is when buses run most frequently and are most affected. Directional or time-of-day traffic data would be a stronger test.
  • PennDOT covers state routes; local streets (where many bus routes run) may be underrepresented.
  • The truck_pct finding may proxy for road classification rather than a direct causal mechanism.
  • The spatial matching uses a 30m buffer, which may associate routes with adjacent parallel roads in dense urban areas.

Output

Methods

Methods: Traffic Congestion and OTP

Question

Does traffic volume (AADT) explain OTP variance beyond stop count, geographic span, and other structural features from the Analysis 18 model?

Approach

  • Replicate the Analysis 18 six-feature OLS model on routes with matched PennDOT traffic data.
  • Filter to routes with match_rate >= 0.3 (sufficient spatial overlap with PennDOT road network).
  • Add log-transformed weighted AADT as a seventh predictor. Log transform is used because the effect of traffic on delays is expected to show diminishing returns (an increase from 5,000 to 10,000 AADT should matter more than one from 25,000 to 30,000).
  • Compare the six-feature and seven-feature models on R-squared and adjusted R-squared, and test the added predictor with a nested F-test.
  • Fit a second expanded model adding avg_truck_pct as an eighth predictor.
  • Check VIF for multicollinearity (AADT may correlate with span or stop count).
  • Repeat with bus-only subset.
  • Generate bivariate scatter (AADT vs OTP), coefficient comparison chart, and partial residual plot.
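The model-comparison step in the list above can be sketched as follows. This is a minimal illustration with NumPy and SciPy on synthetic data, not the project's code: the helper names `fit_ols` and `nested_f_test` are hypothetical, and the outcome is generated so that it does not depend on AADT.

```python
import numpy as np
from scipy import stats

def fit_ols(X, y):
    """Return (residual sum of squares, R^2) for OLS with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return rss, 1.0 - rss / tss

def nested_f_test(X_base, X_full, y):
    """F-test for the extra columns in X_full (X_base must be nested in it)."""
    n = len(y)
    q = X_full.shape[1] - X_base.shape[1]   # number of added predictors
    df_resid = n - X_full.shape[1] - 1      # residual df of the full model
    rss_base, r2_base = fit_ols(X_base, y)
    rss_full, r2_full = fit_ols(X_full, y)
    f_stat = ((rss_base - rss_full) / q) / (rss_full / df_resid)
    p_value = float(stats.f.sf(f_stat, q, df_resid))
    return {"r2_base": r2_base, "r2_full": r2_full, "F": f_stat, "p": p_value}

# Synthetic example (NOT the route data): six structural features,
# plus an AADT column that has no effect on the outcome.
rng = np.random.default_rng(0)
n = 89
X_base = rng.normal(size=(n, 6))
aadt = rng.uniform(4_181, 15_748, size=n)
y = X_base @ np.array([0.5, -0.4, 0.2, 0.0, 0.1, -0.1]) + rng.normal(size=n)

# Seventh predictor: log-transformed AADT.
X_full = np.column_stack([X_base, np.log(aadt)])
result = nested_f_test(X_base, X_full, y)
print(result)
```

The F statistic is computed from residual sums of squares, so it measures whether the added predictor reduces unexplained variance by more than chance would. R-squared can never decrease when a column is added, which is why the F-test, not the raw R-squared change, carries the inference.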

Data

Name | Description | Source
otp_monthly | route_id, month, otp (averaged to route-level, 12+ months required) | prt.db table
route_traffic | route_id, weighted_aadt, avg_truck_pct, match_rate (built by traffic_overlay.py) | prt.db table
route_stops | stop counts, trip frequencies | prt.db table
stops | lat, lon for geographic span computation | prt.db table
routes | route_id, mode for subtype classification | prt.db table

Output

  • output/model_comparison.csv -- regression results for all models
  • output/vif_table.csv -- VIF values for expanded model
  • output/route_traffic_summary.csv -- per-route traffic data with OTP
  • output/aadt_vs_otp_scatter.png -- bivariate scatter of AADT vs OTP
  • output/coefficient_comparison.png -- beta weight comparison between base and expanded models
  • output/partial_residual.png -- partial residual plot for log_aadt

Sources

Name | Type | Why It Matters | Owner | Freshness | Caveat
otp_monthly | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
route_stops | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
route_traffic | table | Primary analytical table used in this page's computations. | Produced by Traffic Overlay ETL. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
routes | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
stops | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
numpy | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.
polars | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.
scipy | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.