Analysis

07: Stop Count vs OTP

Core OTP Patterns

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-03-03 02:23 UTC · Commit defd5c8


Data Provenance

flowchart LR
  07_stop_count_vs_otp(["07: Stop Count vs OTP"])
  t_otp_monthly[("otp_monthly")] --> 07_stop_count_vs_otp
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  t_route_stops[("route_stops")] --> 07_stop_count_vs_otp
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_routes[("routes")] --> 07_stop_count_vs_otp
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  d1_07_stop_count_vs_otp(("polars (lib)")) --> 07_stop_count_vs_otp
  d2_07_stop_count_vs_otp(("scipy (lib)")) --> 07_stop_count_vs_otp
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 07_stop_count_vs_otp page;
  class t_otp_monthly,t_route_stops,t_routes table;
  class d1_07_stop_count_vs_otp,d2_07_stop_count_vs_otp dep;
  class 01_data_ingestion pipeline;

Findings

Findings: Stop Count vs OTP

Summary

There is a moderate negative correlation between the number of stops on a route and its average OTP. The relationship holds both across all routes and within buses alone, so it is not an artifact of mixing modes (Simpson's paradox).

Key Numbers

All routes: Pearson r = -0.53 (p < 0.001, n = 92)
Bus only: Pearson r = -0.50 (p < 0.001, n = 89)
Bus only: Spearman ρ = -0.49 (p < 0.001)
Routes with < 50 stops: typically 80%+ OTP
Routes with 150+ stops: typically below 60% OTP

Routes with fewer than 12 months of OTP data are excluded to avoid noisy averages from sparse observations.

Observations

The bus-only correlation (r = -0.50) is nearly as strong as the all-routes correlation (r = -0.53), indicating that the effect is not an artifact of the BUS/RAIL mode split (Simpson's paradox). Stop count predicts OTP within the bus mode alone.
The Spearman rank correlation (ρ = -0.49) closely matches the Pearson coefficient. This suggests the relationship is roughly monotonic and is not driven by a handful of outlying routes or a strongly non-linear shape.
Each stop adds dwell time (boarding and alighting), exposure to traffic signal delay, and another opportunity to fall behind schedule. Over a long route these small delays accumulate.
Busway and rail routes tend to have fewer stops and dedicated right-of-way, giving them a double advantage.
Route 77 (Penn Hills) is an extreme case: 258 stops and among the worst OTP in the system.
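Agreement between Pearson and Spearman is informative because the two can diverge. Consider a synthetic example (not PRT data) where OTP declines monotonically with stop count but one route falls far below the trend. Spearman uses only ranks, so it reports a perfect -1. Pearson is pulled toward zero by the single extreme point:

```python
from scipy.stats import pearsonr, spearmanr

# Synthetic data: OTP falls monotonically with stop count, with one extreme route.
stop_count = [20, 40, 60, 80, 100, 120, 140, 160]
avg_otp = [0.90, 0.89, 0.88, 0.87, 0.86, 0.85, 0.84, 0.40]

r, _ = pearsonr(stop_count, avg_otp)     # linear association, sensitive to the outlier
rho, _ = spearmanr(stop_count, avg_otp)  # rank association, only sees monotonic order

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

On the real data the two coefficients are within 0.01 of each other (-0.50 vs -0.49 for buses), the opposite of this pattern.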

Implication

Stop consolidation (reducing the number of stops on long routes) is a common transit strategy for improving schedule adherence. This data is consistent with applying that approach to PRT's worst-performing routes. However, the caveats below mean the likely gain cannot be read directly off the correlation.

Caveats

Correlation is not causation. Routes with many stops also tend to serve congested corridors, cover longer distances, and carry more passengers -- all of which independently affect OTP.
Temporal mismatch: Stop counts come from the current route_stops snapshot while OTP is averaged across all historical months (2019--2025). Routes that changed stop configurations during this period have a mismatch between their current stop count and earlier OTP observations. This is inherent to the available data and cannot be corrected without historical stop-count snapshots.

Review History

2026-02-11: RED-TEAM-REPORTS/2026-02-11-analyses-01-05-07-11.md — 6 issues (0 significant). Added 12-month minimum filter, temporal mismatch note in METHODS.md, all_n tracking, replaced manual regression with linregress, added min-n guard, updated METHODS.md for Pearson+Spearman.

Output

stop_count_vs_otp.png (image): scatter plot with regression line.

No interactive outputs declared.

stop_count_otp.csv (data): per-route stop count and average OTP.


Methods

Methods: Stop Count vs OTP

Question

Do routes with more stops have worse on-time performance? Each stop is another opportunity to fall behind schedule.

Approach

Count distinct stops per route from route_stops.
Compute average OTP per route from otp_monthly, requiring at least 12 months of data (HAVING COUNT(*) >= 12) to exclude routes with sparse observations.
Create a scatter plot of stop count vs average OTP, colored by mode.
Compute Pearson and Spearman correlation coefficients, both for all routes and for bus-only (to check for Simpson's paradox from mixing modes).
Fit a simple linear regression line (bus-only, via scipy.stats.linregress).

Note: Stop counts come from the current route_stops snapshot, while OTP is averaged across all historical months. Routes that changed stop configurations over time will have a mismatch between their current stop count and the OTP values from earlier periods.

Data

Name	Description	Source
`otp_monthly`	Monthly OTP per route	`prt.db` table
`route_stops`	Stop count per route	`prt.db` table
`routes`	Mode classification	`prt.db` table

Output

output/stop_count_otp.csv -- per-route stop count and average OTP
output/stop_count_vs_otp.png -- scatter plot with regression line

Sources

Name	Type	Why It Matters	Owner	Freshness	Caveat
otp_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
route_stops	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
routes	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
polars	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
scipy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.