
31 - Stop Consolidation Candidates

Equity and Strategic Planning

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Data Provenance

flowchart LR
  31_stop_consolidation(["31 - Stop Consolidation Candidates"])
  f1_31_stop_consolidation[/"data/bus-stop-usage/wprdc_stop_data.csv"/] --> 31_stop_consolidation
  t_otp_monthly[("otp_monthly")] --> 31_stop_consolidation
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_route_stops[("route_stops")] --> 31_stop_consolidation
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_routes[("routes")] --> 31_stop_consolidation
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  d1_31_stop_consolidation(("numpy (lib)")) --> 31_stop_consolidation
  d2_31_stop_consolidation(("polars (lib)")) --> 31_stop_consolidation
  d3_31_stop_consolidation(("scipy (lib)")) --> 31_stop_consolidation
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 31_stop_consolidation page;
  class t_otp_monthly,t_route_stops,t_routes table;
  class d1_31_stop_consolidation,d2_31_stop_consolidation,d3_31_stop_consolidation dep;
  class f1_31_stop_consolidation,u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: Stop Consolidation Candidates

Summary

43% of all stop-route combinations in the PRT system see fewer than 5 daily boardings+alightings on weekdays, and nearly all of these have a same-route neighbor within 400 m walking distance. Removing these low-usage stops could yield an average OTP improvement of +3.2 percentage points per route, with the highest-impact routes gaining up to +10 pp. The top candidates are long suburban/flyer routes with many lightly used stops along corridors.

Key Numbers

  • 11,461 stop-route combinations in the pre-pandemic weekday data
  • 4,991 (43%) are low-usage (< 5 avg daily ons+offs)
  • 4,963 of those have a neighbor on the same route within 400 m (consolidation candidates)
  • 87 of 90 routes (97%) matched to OTP data have at least one candidate
  • Median candidates per route: 44 stops
  • Regression slope: each additional stop is associated with -0.059 pp OTP, so each stop removed implies +0.059 pp
  • Average estimated OTP gain: +3.2 pp across routes with candidates
  • Maximum estimated OTP gain: +10.2 pp (Route 59, Mon Valley -- 174 candidate stops)
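
The per-route gain figures above are just slope × candidate count. A minimal sketch of that arithmetic, using the rounded slope (-0.059 pp/stop) rather than the unrounded value the script computes, so results can differ in the last digit from the published figures:

```python
# Reconstruction of the per-route gain arithmetic with the ROUNDED slope
# reported above; the script uses the unrounded regression slope, so the
# published figures (e.g. +10.2 pp for Route 59) can differ slightly.
SLOPE_PP_PER_STOP = -0.059  # rounded bus-only slope; illustration only

def est_gain_pp(n_candidates: int) -> float:
    """Estimated OTP gain in percentage points from removing all candidates."""
    return -SLOPE_PP_PER_STOP * n_candidates

print(f"Route 59 (174 candidates): +{est_gain_pp(174):.1f} pp")  # ~ +10.3 pp
print(f"P10      (167 candidates): +{est_gain_pp(167):.1f} pp")
```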

Observations

  • Flyer/express routes have the most candidates: P10 (167), P16 (159), O5 (148) -- these long-distance routes serve many stops with minimal usage along the way.
  • Route 59 (Mon Valley) tops the list with 174 candidates out of 334 stop-route pairs, representing a potential 52% reduction in stops.
  • The candidate map shows candidates distributed system-wide, with particularly dense clusters in outer suburban corridors where stop spacing is tight but usage is low.
  • Almost all low-usage stops (99.4%) have a nearby neighbor, meaning very few riders would lack a within-walking-distance alternative.
  • The OTP gain estimate should not be read as causal: the regression slope (-0.059 pp/stop) comes from cross-sectional variation, not from an experiment. Actual gains from targeted consolidation could differ, and the discussion below argues they are likely smaller.

Discussion

What the data shows

Analysis 07 established that stop count is the strongest single predictor of poor OTP (r = -0.53), and this analysis shows that nearly half the system's stop-route pairs see trivially low usage. The correlation between stop count and OTP is real and robust across multiple specifications.

The concentration of candidates on flyer routes (P10, P16, O5) is intuitive: these routes traverse long suburban corridors where stops were placed at frequent intervals to maximize coverage, but actual demand clusters at a few park-and-ride or transfer locations.

Why the OTP gain estimate is likely overstated

The +3.2 pp average gain estimate assumes each stop removed saves equivalent time, but PRT bus drivers already skip stops where no one is waiting and no one has signaled to alight. A low-usage stop with <5 daily boardings is empty on the vast majority of individual bus trips, meaning the bus already passes it without stopping most of the time. Removing the sign does not change this operational reality.
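
A back-of-envelope sketch of why such a stop is rarely served on any individual trip (the ~60 weekday trips figure is an assumed illustration, not taken from the data):

```python
import math

# If a stop averages 5 ons+offs spread over ~60 weekday trips (assumed
# trip count for illustration) and requests are roughly Poisson, the
# chance any given trip actually has to serve the stop is:
daily_usage = 5
trips_per_day = 60  # assumption, not from the data
lam = daily_usage / trips_per_day
p_stop = 1 - math.exp(-lam)
print(f"P(trip must serve the stop) = {p_stop:.1%}")  # ~8%: skipped ~92% of trips
```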

The cross-sectional regression slope (-0.059 pp/stop) captures the fact that routes with many stops are structurally different -- they are longer, serve denser urban corridors with more traffic signals, and have higher cumulative probability of someone being at the next stop. These are route design characteristics, not marginal effects of individual stops. The causal effect of removing a single low-usage stop is likely well below the regression estimate.

Analysis 34 (Ridership Concentration) reinforces this interpretation: per-stop ridership concentration has no correlation with OTP (r = -0.016, p = 0.88), suggesting dwell time from passenger volumes is not the dominant mechanism. The stop count/OTP relationship is better understood as a proxy for route design philosophy (local vs limited-stop vs express) rather than a per-stop causal lever.

Accessibility and equity concerns

The 400 m walk-distance filter assumes riders can walk to the next stop, but this may not hold for riders with disabilities, elderly riders, or those with mobility limitations -- particularly given Pittsburgh's hilly terrain. ADA compliance is not just a legal requirement but a core service obligation. Any consolidation program would require stop-by-stop accessibility review. Some low-usage stops may also serve riders making short trips where incremental stop spacing is the primary value of the bus service.

Reframing the finding

The stop count/OTP relationship is best read as evidence that route design with fewer, better-spaced stops outperforms local-stop design -- which is already reflected in the busway/express vs local gap (Analysis 02). The policy implication points more toward limited-stop or express overlays on high-ridership corridors than toward individual stop removal. Converting low-usage suburban segments to limited-stop service (as the flyer route candidates suggest) is more defensible than piecemeal stop removal, because it redesigns the service pattern rather than degrading existing local coverage.

Stop consolidation remains politically sensitive. Community opposition to stop removal often exceeds what ridership data would justify. A phased approach -- starting with stops below 1 rider/day that have a neighbor within 200 m, with accessibility review -- would minimize controversy while testing whether actual OTP gains materialize.

Caveats

  • The OTP gain estimate is likely overstated because buses already skip empty stops operationally; removing the sign at a stop that is already being passed most trips has minimal time savings.
  • The regression slope is a cross-sectional system-wide average reflecting route design differences, not a causal per-stop marginal effect. Individual stop removal may have near-zero impact.
  • Accessibility: the 400 m walk-distance filter may be inadequate for riders with disabilities or limited mobility, especially on Pittsburgh's hilly terrain. ADA review is required before any consolidation.
  • Short trips: some low-usage stops serve riders making short trips where incremental stop spacing is the primary value of the service.
  • Stop usage data is from FY2019 (pre-pandemic); current usage patterns may have shifted.
  • Some stops may appear low-usage on one route but serve high volumes on other routes at the same physical location; the data is per stop-route, not per physical stop.
  • Projected stop counts can go negative for routes where the CSV has more stop-route combinations than the DB stop count (due to different data vintages).

Review History

  • 2026-02-27: RED-TEAM-REPORTS/2026-02-27-analyses-31-35.md — 1 significant issue. OTP gain estimates now produced only for bus routes; non-bus routes set to null (bus-only regression slope not applicable to rail/busway). METHODS.md updated to reflect independent slope computation.


Methods

Methods: Stop Consolidation Candidates

Question

Which low-usage bus stops are candidates for consolidation, and how much OTP improvement could each route expect from fewer stops?

Approach

  • Use pre-pandemic weekday stop-level ridership (datekeys 201909 and 202001) as a stable baseline, averaging across the two periods.
  • Compute average daily boardings + alightings per stop-route combination.
  • Flag stops with average daily usage below a threshold (< 5 total ons+offs per weekday).
  • For each low-usage stop on a route, compute haversine distance to the nearest other stop on the same route. If a neighbor exists within 400 m, the stop is a consolidation candidate (riders can walk to the next stop).
  • Per route: count current stops, count candidates, compute the potential reduced stop count.
  • Compute the stop-count/OTP regression slope independently (bus-only) and apply it to estimate the OTP benefit from consolidation. OTP gain estimates are produced only for bus routes; non-bus routes are included in the summary but flagged as not applicable for the bus-derived slope.
  • Generate per-route summary, system-wide statistics, and a chart of estimated OTP gains.
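
The haversine step can be sanity-checked against a pure north-south displacement, where the great-circle distance reduces to R·Δφ; the function here mirrors the one in the script's source below:

```python
import math

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    R = 6_371_000
    rlat1, rlat2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(rlat1) * math.cos(rlat2) * math.sin(dlon / 2) ** 2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# 0.004 deg of latitude is ~445 m: just outside the 400 m walk threshold
d = haversine_m(40.440, -79.990, 40.444, -79.990)
print(f"{d:.0f} m")
```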

Data

  • wprdc_stop_data.csv (Local CSV, data/bus-stop-usage/) — Stop-level boardings/alightings by route, period, and day type
  • route_stops (prt.db table) — Current route-stop assignments and stop locations
  • stops (prt.db table) — Stop coordinates (lat/lon)
  • otp_monthly (prt.db table) — Monthly OTP per route (for current performance baseline)
  • routes (prt.db table) — Route name and mode

Output

  • output/consolidation_candidates.csv -- per-stop detail: stop, route, usage, nearest neighbor distance, candidate flag
  • output/route_consolidation_summary.csv -- per-route summary: current stops, candidates, projected new stop count, estimated OTP gain
  • output/otp_gain_by_route.png -- bar chart of estimated OTP improvement per route
  • output/candidate_map.png -- scatter map of candidate stops colored by usage

Source Code

"""Identify low-usage bus stops that are candidates for consolidation to improve OTP."""

import math

import numpy as np
import polars as pl
from scipy import stats

from prt_otp_analysis.common import (
    DATA_DIR,
    analysis_dir,
    phase,
    query_to_polars,
    run_analysis,
    save_chart,
    save_csv,
    setup_plotting,
)

OUT = analysis_dir(__file__)

USAGE_THRESHOLD = 5      # avg daily ons+offs below this = low-usage
WALK_DISTANCE_M = 400    # max walk distance to nearest neighbor


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Return the distance in metres between two lat/lon points."""
    R = 6_371_000
    rlat1, rlat2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(rlat1) * math.cos(rlat2) * math.sin(dlon / 2) ** 2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def load_stop_usage() -> pl.DataFrame:
    """Load pre-pandemic weekday stop-route usage from the WPRDC CSV."""
    csv_path = DATA_DIR / "bus-stop-usage" / "wprdc_stop_data.csv"
    df = pl.read_csv(csv_path, null_values=["NA", ""])
    # Keep pre-pandemic weekday rows only
    df = df.filter(
        (pl.col("time_period") == "Pre-pandemic")
        & (pl.col("serviceday") == "Weekday")
    )
    # Average across the two pre-pandemic datekeys (201909, 202001)
    usage = (
        df.group_by(["stop_id", "route_name"])
        .agg(
            pl.col("avg_ons").mean().alias("avg_ons"),
            pl.col("avg_offs").mean().alias("avg_offs"),
            pl.col("latitude").first().alias("lat"),
            pl.col("longitude").first().alias("lon"),
            pl.col("stop_name").first().alias("stop_name"),
        )
        .with_columns(
            (pl.col("avg_ons") + pl.col("avg_offs")).alias("avg_daily_usage")
        )
    )
    return usage


def load_route_otp() -> pl.DataFrame:
    """Load average OTP and stop count per route from the database."""
    return query_to_polars("""
        SELECT o.route_id, r.route_name, r.mode,
               AVG(o.otp) AS avg_otp,
               COUNT(DISTINCT rs.stop_id) AS stop_count
        FROM otp_monthly o
        JOIN routes r ON o.route_id = r.route_id
        LEFT JOIN route_stops rs ON o.route_id = rs.route_id
        GROUP BY o.route_id
        HAVING COUNT(DISTINCT o.month) >= 12
    """)


def get_otp_slope() -> float:
    """Compute the OTP ~ stop_count regression slope (bus-only) from DB data."""
    df = query_to_polars("""
        SELECT o.route_id, r.mode,
               AVG(o.otp) AS avg_otp,
               COUNT(DISTINCT rs.stop_id) AS stop_count
        FROM otp_monthly o
        JOIN routes r ON o.route_id = r.route_id
        LEFT JOIN route_stops rs ON o.route_id = rs.route_id
        GROUP BY o.route_id
        HAVING COUNT(DISTINCT o.month) >= 12
    """)
    bus = df.filter(pl.col("mode") == "BUS")
    lr = stats.linregress(bus["stop_count"].to_list(), bus["avg_otp"].to_list())
    return lr.slope


def find_candidates(usage: pl.DataFrame) -> pl.DataFrame:
    """Flag low-usage stops with a same-route neighbor within walk distance."""
    # Mark low-usage
    usage = usage.with_columns(
        (pl.col("avg_daily_usage") < USAGE_THRESHOLD).alias("low_usage")
    )

    # For each stop-route, find nearest neighbor on the same route
    rows = usage.to_dicts()
    # Build route -> list of (stop_id, lat, lon)
    route_stops: dict[str, list[tuple[str, float, float]]] = {}
    for r in rows:
        route_stops.setdefault(r["route_name"], []).append(
            (r["stop_id"], r["lat"], r["lon"])
        )

    nearest_dist = []
    for r in rows:
        siblings = route_stops[r["route_name"]]
        min_d = float("inf")
        for sid, slat, slon in siblings:
            if sid == r["stop_id"]:
                continue
            d = haversine_m(r["lat"], r["lon"], slat, slon)
            if d < min_d:
                min_d = d
        nearest_dist.append(min_d if min_d != float("inf") else None)

    usage = usage.with_columns(
        pl.Series("nearest_neighbor_m", nearest_dist)
    )

    # Candidate = low usage AND neighbor within walk distance
    usage = usage.with_columns(
        (
            pl.col("low_usage")
            & pl.col("nearest_neighbor_m").is_not_null()
            & (pl.col("nearest_neighbor_m") <= WALK_DISTANCE_M)
        ).alias("candidate")
    )
    return usage


def route_summary(candidates: pl.DataFrame, route_otp: pl.DataFrame, slope: float) -> pl.DataFrame:
    """Build per-route consolidation summary with estimated OTP gain (bus-only)."""
    # Count candidates per route (using route_name from CSV = route_id in DB)
    per_route = (
        candidates.group_by("route_name")
        .agg(
            pl.col("candidate").sum().alias("n_candidates"),
            pl.len().alias("n_stops_csv"),
        )
    )

    # CSV route_name (e.g. "69") corresponds to DB route_id
    merged = per_route.join(
        route_otp,
        left_on="route_name",
        right_on="route_id",
        how="inner",
    )

    # Only apply bus-only regression slope to bus routes; flag non-bus as extrapolation
    merged = merged.with_columns(
        (pl.col("stop_count") - pl.col("n_candidates")).alias("projected_stops"),
        pl.when(pl.col("mode") == "BUS")
        .then(-slope * pl.col("n_candidates"))
        .otherwise(None)
        .alias("est_otp_gain"),
    )

    return merged.sort("est_otp_gain", descending=True)


def make_charts(candidates: pl.DataFrame, summary: pl.DataFrame) -> None:
    """Generate bar chart of OTP gains and scatter map of candidate stops."""
    plt = setup_plotting()

    # --- Bar chart: top 20 routes by estimated OTP gain ---
    top = summary.filter(
        (pl.col("n_candidates") > 0) & pl.col("est_otp_gain").is_not_null()
    ).sort("est_otp_gain", descending=True).head(20)
    if len(top) == 0:
        # No early return here: the candidate map below is still worth drawing
        print("  No routes with candidates and estimated gains -- skipping bar chart.")
    else:
        fig, ax = plt.subplots(figsize=(12, 7))
        labels = top["route_name"].to_list()
        gains = [g * 100 for g in top["est_otp_gain"].to_list()]  # fraction -> percentage points
        n_cands = top["n_candidates"].to_list()
        bars = ax.barh(range(len(labels)), gains, color="#3b82f6", edgecolor="white")
        ax.set_yticks(range(len(labels)))
        ax.set_yticklabels([f"{lbl}  ({n} stops)" for lbl, n in zip(labels, n_cands)], fontsize=9)
        ax.invert_yaxis()
        ax.set_xlabel("Estimated OTP Gain (percentage points)")
        ax.set_title("Top 20 Routes: Estimated OTP Improvement from Stop Consolidation")
        for bar, val in zip(bars, gains):
            ax.text(bar.get_width() + 0.05, bar.get_y() + bar.get_height() / 2,
                    f"+{val:.1f} pp", va="center", fontsize=8)
        save_chart(fig, OUT / "otp_gain_by_route.png")

    # --- Scatter map of candidate stops ---
    cand_only = candidates.filter(pl.col("candidate"))
    non_cand = candidates.filter(~pl.col("candidate"))

    fig, ax = plt.subplots(figsize=(10, 10))
    ax.scatter(
        non_cand["lon"].to_list(), non_cand["lat"].to_list(),
        s=4, alpha=0.15, color="#94a3b8", label=f"Retained ({len(non_cand):,})", zorder=1,
    )
    sc = ax.scatter(
        cand_only["lon"].to_list(), cand_only["lat"].to_list(),
        s=12, c=cand_only["avg_daily_usage"].to_list(), cmap="YlOrRd_r",
        edgecolors="black", linewidths=0.3, alpha=0.8,
        label=f"Candidates ({len(cand_only):,})", zorder=2,
    )
    plt.colorbar(sc, ax=ax, label="Avg Daily Usage (ons+offs)", shrink=0.6)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Stop Consolidation Candidates (low usage + neighbor < 400 m)")
    ax.legend(loc="upper left", fontsize=9)
    save_chart(fig, OUT / "candidate_map.png")


@run_analysis(31, "Stop Consolidation Candidates")
def main() -> None:
    """Entry point: load data, find candidates, estimate OTP gains, chart."""

    with phase("Loading stop-level usage (pre-pandemic weekday)"):
        usage = load_stop_usage()
        print(f"  {len(usage):,} stop-route combinations")

    with phase("Loading route OTP and stop counts from DB"):
        route_otp = load_route_otp()
        print(f"  {len(route_otp)} routes with OTP data")

    with phase("Computing OTP ~ stop_count regression slope (bus-only)"):
        slope = get_otp_slope()
        print(f"  Slope = {slope:.6f} OTP per stop (i.e., {slope * 100:.3f} pp per stop)")

    with phase("Identifying consolidation candidates"):
        candidates = find_candidates(usage)
        n_low = candidates.filter(pl.col("low_usage")).shape[0]
        n_cand = candidates.filter(pl.col("candidate")).shape[0]
        print(f"  {n_low:,} low-usage stop-route pairs (< {USAGE_THRESHOLD} daily ons+offs)")
        print(f"  {n_cand:,} consolidation candidates (low usage + neighbor <= {WALK_DISTANCE_M} m)")

    with phase("Building route summary"):
        summary = route_summary(candidates, route_otp, slope)
        routes_with_cand = summary.filter(pl.col("n_candidates") > 0)
        print(f"  {len(routes_with_cand)} routes have at least one candidate")
        if len(routes_with_cand) > 0:
            total_cand = routes_with_cand["n_candidates"].sum()
            avg_gain = routes_with_cand["est_otp_gain"].mean() * 100
            max_gain = routes_with_cand["est_otp_gain"].max() * 100
            print(f"  Total candidates across routes: {total_cand}")
            print(f"  Avg estimated OTP gain: +{avg_gain:.1f} pp")
            print(f"  Max estimated OTP gain: +{max_gain:.1f} pp")

    with phase("Saving CSVs"):
        save_csv(candidates, OUT / "consolidation_candidates.csv")
        save_csv(summary, OUT / "route_consolidation_summary.csv")

    with phase("Generating charts"):
        make_charts(candidates, summary)


if __name__ == "__main__":
    main()

Sources

  • data/bus-stop-usage/wprdc_stop_data.csv (file) — Referenced via DATA_DIR path composition in the analysis script. Owner: local project; not specified. Freshness: snapshot file; refresh by rerunning its pipeline step. Caveat: may lag upstream source updates.
  • otp_monthly (table) — Primary analytical table used in this page's computations. Produced by Data Ingestion. Freshness: updated when the producing pipeline step is rerun. Caveat: coverage depends on upstream source availability and ETL assumptions.
  • route_stops (table) — Primary analytical table used in this page's computations. Produced by Data Ingestion. Freshness: updated when the producing pipeline step is rerun. Caveat: coverage depends on upstream source availability and ETL assumptions.
  • routes (table) — Primary analytical table used in this page's computations. Produced by Data Ingestion. Freshness: updated when the producing pipeline step is rerun. Caveat: coverage depends on upstream source availability and ETL assumptions.
  • numpy, polars, scipy (dependencies) — Runtime dependencies required for this page's analysis code. Owner: open-source Python ecosystem maintainers. Freshness: versions pinned by the project environment until dependency updates are applied. Caveat: library updates may change behavior or defaults.

Upstream sources (5, identical for otp_monthly, route_stops, and routes):
  • data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.