Analysis

22 - Passenger-Weighted Delay Burden

Ridership and External Factors

Coverage: 2017-01 to 2025-11 (from otp_monthly, ridership_monthly).

Built 2026-06-15 11:52 UTC · Commit e5cf673

Page Navigation

Analysis Navigation

Data Provenance

flowchart LR
  22_delay_burden(["22 - Passenger-Weighted Delay Burden"])
  t_otp_monthly[("otp_monthly")] --> 22_delay_burden
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_ridership_monthly[("ridership_monthly")] --> 22_delay_burden
  01_data_ingestion[["Data Ingestion"]] --> t_ridership_monthly
  t_routes[("routes")] --> 22_delay_burden
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  d1_22_delay_burden(("polars (lib)")) --> 22_delay_burden
  d2_22_delay_burden(("scipy (lib)")) --> 22_delay_burden
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 22_delay_burden page;
  class t_otp_monthly,t_ridership_monthly,t_routes table;
  class d1_22_delay_burden,d2_22_delay_burden dep;
  class u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: Passenger-Weighted Delay Burden

Summary

Over the Jan 2019 -- Oct 2024 overlap period, PRT accumulated 55.5 million late weekday rider-trips out of 179.2 million total (31% late rate). The top 10 routes by delay burden account for 26.6% of all late rider-trips, and ridership weighting substantially reshuffles which routes appear most problematic compared to a pure OTP ranking.

Key Numbers

55.5 million cumulative late weekday rider-trips (Jan 2019 -- Oct 2024)
179.2 million total weekday rider-trips; 31.0% system late rate
Top 10 routes account for 26.6% of all late rider-trips
Rank correlation between OTP rank and burden rank: Spearman r = 0.40 (p < 0.001) -- moderate, meaning ridership significantly reshuffles priorities
93 routes with paired OTP + ridership data

Top 10 Routes by Delay Burden

Burden Rank	Route	OTP Rank	Avg OTP	Cumulative Late Rider-Trips
1	61C - McKeesport-Homestead	3	59.0%	2,765,662
2	51 - Carrick	55	68.6%	2,360,226
3	71C - Point Breeze	8	60.9%	2,241,803
4	61A - North Braddock	7	60.6%	2,135,034
5	71B - Highland Park	10	61.9%	2,086,371

Route 51 (Carrick) is the standout example: it ranks only 55th by OTP (68.6%, near the system average) but 2nd by delay burden because it carries massive ridership. Conversely, Route 77 (Penn Hills) has the worst OTP (54.9%) but ranks only 18th by burden because fewer people ride it.

Observations

Ridership dramatically reshuffles priorities. The Spearman rank correlation between OTP rank and burden rank is only 0.40 -- OTP rank alone is a poor proxy for human impact.
The biggest upward shifts (more burden than OTP suggests) are high-ridership transit routes: P1 East Busway (+77 ranks), RED light rail (+72), BLUE light rail (+65). These routes have decent OTP (80%+) but carry so many riders that even their small late fractions generate substantial burden.
The biggest downward shifts (less burden than OTP suggests) are low-ridership flyers: 65 Squirrel Hill (-70), P69 Trafford Flyer (-66), P13 Mount Royal Flyer (-51). These have poor OTP but few riders, so the total human impact is small.
The delay burden trend shows a sharp drop during COVID (1.5M to 0.3M monthly late rider-trips), then a partial recovery to ~0.8-1.0M by 2024 -- still well below pre-COVID levels primarily because ridership hasn't recovered.
Pre-COVID, the system averaged ~1.3M late rider-trips per month. Post-COVID (2023-2024), it averages ~0.8M -- a 40% reduction driven almost entirely by ridership collapse, not OTP improvement.

Discussion

This analysis reframes the OTP problem from "which routes are most unreliable" to "where does unreliability affect the most people." The two questions give different answers. A policy intervention on Route 51 (the #2 burden route) would affect more riders than fixing Route 77 (the #1 worst OTP route), even though 51's OTP is 14 pp better. Similarly, even small OTP improvements on high-ridership rail/busway routes would reduce more late rider-trips than large OTP improvements on low-ridership flyers.

The system's total delay burden has paradoxically decreased post-COVID -- not because service improved, but because fewer people are riding. If ridership recovers without OTP improvements, the burden will return to or exceed pre-COVID levels.

Caveats

"Late rider-trips" is a derived metric: avg_riders * day_count * (1 - OTP). It does not measure actual delay duration -- a trip that is 1 minute late counts the same as one that is 30 minutes late.
Ridership data is weekday only; weekend burden is not captured.
OTP is a route-level monthly average; stop-level or trip-level variation is not reflected.
The metric assumes all riders on a route experience the same OTP, which is an approximation -- riders at different stops on the same route may have different experiences.

Output

image delay_burden_trend.png
system-wide monthly late rider-trips.

image rate_vs_burden.png
scatter comparing OTP rank with delay burden rank.

image top10_burden.png
bar chart of top 10 routes by late rider-trips.

No interactive outputs declared.

data delay_burden_monthly.csv

Monthly system and route-level late rider-trip totals over the analysis period.

Preview CSV

Expand to load preview.

data delay_burden_ranking.csv

routes ranked by total late rider-trips.

Preview CSV

Expand to load preview.

Methods

Methods: Passenger-Weighted Delay Burden

Question

Which routes impose the largest total delay burden on riders, accounting for both OTP and ridership volume?

Approach

Join monthly OTP with monthly weekday ridership by route and month.
Compute monthly late rider-trips per route: avg_riders * day_count * (1 - otp) where otp is on a 0--1 scale.
Rank routes by total and average monthly late rider-trips.
Compute system-wide monthly late rider-trips and plot the trend over time.
Identify the top 10 routes by cumulative late rider-trips -- these are where interventions would affect the most people.
Compare the "worst by rate" (lowest OTP) with "worst by burden" (most late rider-trips) to show how ridership weighting shifts priorities.

Data

Name	Description	Source
`otp_monthly`	route_id, month, otp	`prt.db` table
`ridership_monthly`	route_id, month, avg_riders, day_count; filtered to day_type='WEEKDAY'	`prt.db` table
`routes`	route_id, route_name	`prt.db` table

Notes: Join on route_id and month; overlap period only (Jan 2019 -- Oct 2024).

Output

output/delay_burden_ranking.csv -- routes ranked by total late rider-trips
output/delay_burden_trend.png -- system-wide monthly late rider-trips
output/rate_vs_burden.png -- scatter comparing OTP rank with delay burden rank
output/top10_burden.png -- bar chart of top 10 routes by late rider-trips

  Source Code
    
      
      """Analysis 22: Estimate late rider-trips per route per month to identify where the most total human impact occurs."""

import polars as pl

from prt_otp_analysis.common import OTP_GOOD_THRESHOLD, OTP_WARNING_THRESHOLD, analysis_dir, correlate, phase, query_to_polars, run_analysis, save_chart, save_csv, setup_plotting, weighted_mean

OUT = analysis_dir(__file__)


def load_data() -> pl.DataFrame:
    """Load paired OTP and weekday ridership with day counts."""
    df = query_to_polars("""
        SELECT o.route_id, rt.route_name, o.month, o.otp,
               r.avg_riders, r.day_count
        FROM otp_monthly o
        JOIN ridership_monthly r
            ON o.route_id = r.route_id AND o.month = r.month
            AND r.day_type = 'WEEKDAY'
        JOIN routes rt ON o.route_id = rt.route_id
    """)

    # Compute late rider-trips: riders * weekdays * (1 - OTP)
    df = df.with_columns(
        late_rider_trips=(pl.col("avg_riders") * pl.col("day_count") * (1.0 - pl.col("otp"))),
        total_rider_trips=(pl.col("avg_riders") * pl.col("day_count")),
    )

    return df


def route_ranking(df: pl.DataFrame) -> pl.DataFrame:
    """Rank routes by cumulative and average monthly late rider-trips."""
    ranking = (
        df.group_by("route_id", "route_name")
        .agg(
            total_late=pl.col("late_rider_trips").sum(),
            avg_monthly_late=pl.col("late_rider_trips").mean(),
            total_trips=pl.col("total_rider_trips").sum(),
            avg_otp=pl.col("otp").mean(),
            avg_riders=pl.col("avg_riders").mean(),
            n_months=pl.col("month").count(),
        )
        .with_columns(
            effective_otp=(1.0 - pl.col("total_late") / pl.col("total_trips")),
        )
        .sort("total_late", descending=True)
    )

    # Add ranks
    ranking = ranking.with_row_index("burden_rank", offset=1)
    otp_rank = (
        ranking.sort("avg_otp")
        .with_row_index("otp_rank", offset=1)
        .select("route_id", "otp_rank")
    )
    ranking = ranking.join(otp_rank, on="route_id")

    return ranking


def monthly_system(df: pl.DataFrame) -> pl.DataFrame:
    """Compute system-wide monthly late rider-trips."""
    return (
        df.group_by("month")
        .agg(
            system_late=pl.col("late_rider_trips").sum(),
            system_total=pl.col("total_rider_trips").sum(),
            system_otp=weighted_mean("otp", "avg_riders"),
            route_count=pl.col("route_id").n_unique(),
        )
        .sort("month")
    )


def make_trend_chart(monthly: pl.DataFrame) -> None:
    """System-wide monthly late rider-trips trend."""
    plt = setup_plotting()
    fig, ax = plt.subplots(figsize=(14, 6))

    months = monthly["month"].to_list()
    late = [v / 1000 for v in monthly["system_late"].to_list()]  # thousands
    x = range(len(months))
    tick_positions = [i for i, m in enumerate(months) if m.endswith("-01")]
    tick_labels = [months[i][:4] for i in tick_positions]

    ax.fill_between(x, late, alpha=0.3, color="#e11d48")
    ax.plot(x, late, color="#e11d48", linewidth=1.5)

    if "2020-03" in months:
        covid_idx = months.index("2020-03")
        ax.axvline(covid_idx, color="#ef4444", linestyle=":", alpha=0.7)
        ax.text(covid_idx + 0.5, max(late) * 0.95, "COVID",
                color="#ef4444", fontsize=8, va="top")

    ax.set_ylabel("Late Rider-Trips (thousands)")
    ax.set_xlabel("Month")
    ax.set_title("System-Wide Monthly Delay Burden (Weekday Late Rider-Trips)")
    ax.set_xticks(tick_positions)
    ax.set_xticklabels(tick_labels)

    save_chart(fig, OUT / "delay_burden_trend.png")


def make_top10_chart(ranking: pl.DataFrame) -> None:
    """Horizontal bar chart of top 10 routes by cumulative late rider-trips."""
    plt = setup_plotting()
    fig, ax = plt.subplots(figsize=(10, 6))

    top10 = ranking.head(10).sort("total_late")
    labels = [f"{r['route_id']} - {r['route_name']}" for r in top10.iter_rows(named=True)]
    values = [v / 1_000_000 for v in top10["total_late"].to_list()]  # millions
    otp_vals = top10["avg_otp"].to_list()

    colors = ["#e11d48" if o < OTP_WARNING_THRESHOLD else "#f59e0b" if o < OTP_GOOD_THRESHOLD else "#3b82f6" for o in otp_vals]
    bars = ax.barh(range(len(labels)), values, color=colors, edgecolor="white", alpha=0.8)

    # Annotate with OTP
    for i, (v, o) in enumerate(zip(values, otp_vals)):
        ax.text(v + max(values) * 0.01, i, f"OTP: {o:.0%}", va="center", fontsize=8)

    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels, fontsize=9)
    ax.set_xlabel("Cumulative Late Rider-Trips (millions, Jan 2019 - Oct 2024)")
    ax.set_title("Top 10 Routes by Delay Burden")

    save_chart(fig, OUT / "top10_burden.png")


def make_rate_vs_burden_chart(ranking: pl.DataFrame) -> None:
    """Scatter plot comparing OTP rank with delay burden rank."""
    plt = setup_plotting()
    fig, ax = plt.subplots(figsize=(8, 8))

    n = len(ranking)
    ax.scatter(
        ranking["otp_rank"].to_list(),
        ranking["burden_rank"].to_list(),
        s=30, alpha=0.6, color="#3b82f6", edgecolors="white", linewidths=0.5,
    )

    # Diagonal (perfect agreement)
    ax.plot([1, n], [1, n], color="#9ca3af", linestyle="--", linewidth=1, label="Perfect agreement")

    # Label routes that shifted most
    ranking_with_diff = ranking.with_columns(
        rank_shift=(pl.col("otp_rank").cast(pl.Int64) - pl.col("burden_rank").cast(pl.Int64)).abs(),
    )
    outliers = ranking_with_diff.sort("rank_shift", descending=True).head(8)
    for row in outliers.iter_rows(named=True):
        ax.annotate(
            row["route_id"],
            (row["otp_rank"], row["burden_rank"]),
            fontsize=7, alpha=0.8,
            xytext=(5, 5), textcoords="offset points",
        )

    # Spearman rank correlation
    rank_corr = correlate(ranking, "otp_rank", "burden_rank")
    ax.text(0.05, 0.95, f"Spearman r = {rank_corr['spearman_r']:.3f}\np = {rank_corr['spearman_p']:.4f}",
            transform=ax.transAxes, fontsize=9, va="top",
            bbox=dict(boxstyle="round", facecolor="white", alpha=0.8))

    ax.set_xlabel("OTP Rank (1 = worst OTP)")
    ax.set_ylabel("Burden Rank (1 = most late rider-trips)")
    ax.set_title("Rate vs Burden: How Ridership Shifts Priorities")
    ax.invert_xaxis()
    ax.invert_yaxis()
    ax.legend(loc="lower right", fontsize=8)

    save_chart(fig, OUT / "rate_vs_burden.png")


@run_analysis(22, "Passenger-Weighted Delay Burden")
def main() -> None:
    """Entry point: load, compute burden, rank, chart, and save."""

    with phase("Loading data"):
        df = load_data()
        n_routes = df["route_id"].n_unique()
        print(f"  {len(df):,} route-month observations ({n_routes} routes)")

    with phase("Ranking routes by delay burden"):
        ranking = route_ranking(df)

        print("\n  Top 10 routes by cumulative late rider-trips:")
        print(f"  {'Rank':>4} {'Route':<8} {'Name':<32} {'Late trips':>12} {'Avg OTP':>8} {'OTP Rank':>9}")
        for row in ranking.head(10).iter_rows(named=True):
            print(f"  {row['burden_rank']:>4} {row['route_id']:<8} {row['route_name']:<32} "
                  f"{row['total_late']:>12,.0f} {row['avg_otp']:>8.1%} {row['otp_rank']:>9}")

        # Routes that shift most between rankings
        ranking_annotated = ranking.with_columns(
            rank_shift=(pl.col("otp_rank").cast(pl.Int64) - pl.col("burden_rank").cast(pl.Int64)),
        )

        print("\n  Biggest rank shifts (burden rank - OTP rank):")
        print("  Positive = higher burden than OTP rank suggests (high-ridership route)")
        print("  Negative = lower burden than OTP rank suggests (low-ridership route)")
        shifted = ranking_annotated.sort("rank_shift", descending=True)
        print(f"\n  {'Route':<8} {'Name':<32} {'OTP Rank':>9} {'Burden Rank':>12} {'Shift':>6}")
        for row in shifted.head(5).iter_rows(named=True):
            print(f"  {row['route_id']:<8} {row['route_name']:<32} "
                  f"{row['otp_rank']:>9} {row['burden_rank']:>12} {row['rank_shift']:>+6}")
        print("  ...")
        for row in shifted.tail(5).iter_rows(named=True):
            print(f"  {row['route_id']:<8} {row['route_name']:<32} "
                  f"{row['otp_rank']:>9} {row['burden_rank']:>12} {row['rank_shift']:>+6}")

    with phase("Computing system-wide monthly trend"):
        monthly = monthly_system(df)
        total_late = monthly["system_late"].sum()
        total_trips = monthly["system_total"].sum()
        print(f"  Total late rider-trips (all time): {total_late:,.0f}")
        print(f"  Total rider-trips (all time):      {total_trips:,.0f}")
        print(f"  System late rate:                  {total_late / total_trips:.1%}")

        # Top 10 share of system burden
        top10_late = ranking.head(10)["total_late"].sum()
        print(f"\n  Top 10 routes account for {top10_late / total_late:.1%} of all late rider-trips")

    with phase("Saving CSVs"):
        save_csv(ranking, OUT / "delay_burden_ranking.csv")
        save_csv(monthly, OUT / "delay_burden_monthly.csv")

    with phase("Generating charts"):
        make_trend_chart(monthly)
        make_top10_chart(ranking)
        make_rate_vs_burden_chart(ranking)


if __name__ == "__main__":
    main()

    

    

Sources

Name	Type	Why It Matters	Owner	Freshness	Caveat
otp_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5) file data/routes_by_month.csv — Monthly route OTP source table in wide format. file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications. file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts. file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes. file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
ridership_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5) file data/routes_by_month.csv — Monthly route OTP source table in wide format. file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications. file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts. file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes. file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
routes	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5) file data/routes_by_month.csv — Monthly route OTP source table in wide format. file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications. file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts. file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes. file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
polars	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
scipy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.