Analysis

21 - COVID Ridership vs OTP Recovery

Ridership and External Factors

Coverage: 2017-01 to 2025-11 (from otp_monthly, ridership_monthly).

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Page Navigation

Analysis Navigation

Data Provenance

flowchart LR
  21_covid_ridership_recovery(["21 - COVID Ridership vs OTP Recovery"])
  t_otp_monthly[("otp_monthly")] --> 21_covid_ridership_recovery
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_ridership_monthly[("ridership_monthly")] --> 21_covid_ridership_recovery
  01_data_ingestion[["Data Ingestion"]] --> t_ridership_monthly
  t_routes[("routes")] --> 21_covid_ridership_recovery
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  d1_21_covid_ridership_recovery(("numpy (lib)")) --> 21_covid_ridership_recovery
  d2_21_covid_ridership_recovery(("polars (lib)")) --> 21_covid_ridership_recovery
  d3_21_covid_ridership_recovery(("scipy (lib)")) --> 21_covid_ridership_recovery
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 21_covid_ridership_recovery page;
  class t_otp_monthly,t_ridership_monthly,t_routes table;
  class d1_21_covid_ridership_recovery,d2_21_covid_ridership_recovery,d3_21_covid_ridership_recovery dep;
  class u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: COVID Ridership vs OTP Recovery

Summary

Zero of 93 routes have recovered to pre-COVID ridership levels (median -43%, all negative). Routes that lost more ridership tended to see better OTP recovery (r = -0.21, p = 0.047), weakly supporting the hypothesis that ridership recovery degrades OTP through crowding and longer dwell times.

Key Numbers

  • Ridership recovery: median -43.4%, mean -44.5%; 0/93 routes at or above pre-COVID
  • OTP recovery: 35/93 routes improved, 58/93 declined vs pre-COVID
  • Correlation (ridership change vs OTP change): Pearson r = -0.21, p = 0.047; Spearman r = -0.29, p = 0.005
  • Kruskal-Wallis (OTP delta by subtype): H = 2.89, p = 0.58 (not significant)
  • Pre-COVID baseline: Jan 2019 -- Feb 2020; recovery period: Jan 2023 -- Oct 2024
  • 93 routes with 6+ months in both periods

Observations

  • The scatter plot falls entirely in the left half (all ridership declines). The two populated quadrants are "both declined" (58 routes) and "riders down, OTP up" (35 routes). There are zero routes in the "both improved" or "riders up, OTP down" quadrants.
  • Flyers lost the most ridership (median -68%), consistent with the collapse of peak-hour commute demand. Despite this, flyers show the widest spread in OTP recovery (some improved substantially, others declined).
  • Busway routes had the best OTP recovery (median +3.3 pp), likely because their dedicated right-of-way insulates them from the traffic impacts that affect other routes.
  • Local bus routes (n=64) show the widest spread on both axes, reflecting the diversity of the category. Some locals improved OTP by 10 pp while others declined by 19 pp.
  • The routes that improved OTP the most (P7 +18 pp, P69 +10 pp, 41 +10 pp) all lost substantial ridership (-32% to -61%), consistent with the idea that fewer riders means shorter dwell times and better schedule adherence.

Discussion

The weak negative correlation between ridership recovery and OTP recovery is consistent with a crowding mechanism: routes that regained more riders saw their OTP degrade (or fail to improve), while routes that stayed emptier ran closer to schedule. However, the effect is weak (r = -0.21) and the correlation is borderline significant (p = 0.047), so it should be interpreted cautiously.

The most policy-relevant finding is the universal ridership collapse: every single route is still below pre-COVID levels, with a median loss of 43%. This dwarfs any OTP effects. The system is running fewer riders on roughly the same infrastructure, and OTP has still declined for most routes -- suggesting that the OTP decline is driven by operational factors (staffing, vehicle maintenance, traffic congestion) rather than demand-side crowding alone.

The absence of subtype differences (Kruskal-Wallis p = 0.58) confirms Analysis 14's finding: no route type has recovered OTP systematically better or worse than others.

Caveats

  • Regression to the mean: baseline OTP is negatively correlated with OTP delta (as found in Analysis 14 for the same OTP data), meaning routes with extreme pre-COVID OTP tend to regress toward the mean regardless of ridership changes. The headline correlation (r = -0.21, p = 0.047) may be partly artifactual due to RTM. The Spearman result (r = -0.29, p = 0.005) is more robust but the RTM caveat still applies.
  • The ridership data ends Oct 2024; more recent ridership trends are not captured.
  • "Recovery" is defined as a simple average comparison between two periods, not a trajectory analysis. A route could be on an upward trend that the period average does not fully reflect.
  • The correlation between ridership change and OTP change does not establish causation; both could be driven by a third factor (e.g., service cuts, schedule changes).
  • Flyer and busway subtypes have small sample sizes (n=17 and n=3), limiting subtype-level conclusions.
  • Ridership data is weekday only; weekend recovery patterns may differ.

Review History

Output

Methods

Methods: COVID Ridership vs OTP Recovery

Question

Did routes that recovered ridership fastest also recover OTP? Or does ridership recovery degrade OTP (e.g., through crowding and longer dwell times)?

Approach

  • Define pre-COVID baseline as Jan 2019 -- Feb 2020 for both ridership and OTP.
  • Define recovery period as Jan 2023 -- Oct 2024 (post-stabilization).
  • For each route, compute ridership recovery ratio (recovery avg / baseline avg) and OTP recovery delta (recovery avg - baseline avg).
  • Scatter plot ridership recovery vs OTP recovery, colored by mode/subtype.
  • Test correlation (Pearson and Spearman) between ridership recovery and OTP recovery.
  • Test for regression to the mean: compute correlation between baseline OTP and OTP delta. A significant negative correlation indicates RTM, which would bias the recovery correlation.
  • Stratify by route subtype (local, express, premium, rail) and test for group differences.

Data

Name Description Source
otp_monthly route_id, month, OTP prt.db table
ridership_monthly route_id, month, day_type='WEEKDAY', avg_riders prt.db table
routes route_id, mode for subtype classification prt.db table

Notes: Join on route_id and month; restrict to overlap period. Exclude routes missing from either dataset or with fewer than 6 months in either period.

Output

  • output/recovery_scatter.png -- ridership recovery ratio vs OTP delta
  • output/recovery_by_subtype.png -- box plots by route subtype
  • output/recovery_data.csv -- per-route recovery metrics

Source Code

"""Analysis 21: Compare ridership recovery trajectories with OTP recovery trajectories post-COVID."""

import numpy as np
import polars as pl
from scipy import stats

from prt_otp_analysis.common import (
    PRE_COVID_BASELINE_MONTH,
    analysis_dir,
    classify_bus_route,
    correlate,
    phase,
    query_to_polars,
    run_analysis,
    save_chart,
    save_csv,
    setup_plotting,
)

OUT = analysis_dir(__file__)

PRE_COVID_START = PRE_COVID_BASELINE_MONTH
PRE_COVID_END = "2020-02"
RECOVERY_START = "2023-01"
RECOVERY_END = "2024-10"
MIN_MONTHS = 6


def load_data() -> pl.DataFrame:
    """Compute pre-COVID and recovery-period averages for both OTP and ridership per route."""
    paired = query_to_polars("""
        SELECT o.route_id, r2.route_name, r2.mode, o.month, o.otp, r.avg_riders
        FROM otp_monthly o
        JOIN ridership_monthly r
            ON o.route_id = r.route_id AND o.month = r.month
            AND r.day_type = 'WEEKDAY'
        JOIN routes r2 ON o.route_id = r2.route_id
    """)

    # Pre-COVID baseline
    pre = (
        paired.filter(
            (pl.col("month") >= PRE_COVID_START) & (pl.col("month") <= PRE_COVID_END)
        )
        .group_by("route_id", "route_name", "mode")
        .agg(
            baseline_otp=pl.col("otp").mean(),
            baseline_riders=pl.col("avg_riders").mean(),
            baseline_months=pl.col("month").count(),
        )
        .filter(pl.col("baseline_months") >= MIN_MONTHS)
    )

    # Recovery period
    post = (
        paired.filter(
            (pl.col("month") >= RECOVERY_START) & (pl.col("month") <= RECOVERY_END)
        )
        .group_by("route_id")
        .agg(
            recovery_otp=pl.col("otp").mean(),
            recovery_riders=pl.col("avg_riders").mean(),
            recovery_months=pl.col("month").count(),
        )
        .filter(pl.col("recovery_months") >= MIN_MONTHS)
    )

    df = pre.join(post, on="route_id", how="inner")

    # Compute deltas and ratios
    df = df.with_columns(
        otp_delta=(pl.col("recovery_otp") - pl.col("baseline_otp")),
        ridership_pct_change=(
            (pl.col("recovery_riders") - pl.col("baseline_riders")) / pl.col("baseline_riders")
        ),
    )

    # Add subtype
    df = df.with_columns(
        pl.when(pl.col("mode") == "BUS")
        .then(pl.col("route_id").map_elements(classify_bus_route, return_dtype=pl.Utf8))
        .otherwise(pl.col("mode").str.to_lowercase())
        .alias("subtype")
    )

    return df.sort("route_id")


def analyze(df: pl.DataFrame) -> dict:
    """Test correlation between ridership recovery and OTP recovery."""
    results = {"n_routes": len(df)}

    otp_d = df["otp_delta"].to_numpy()
    rider_r = df["ridership_pct_change"].to_numpy()

    # Pearson + Spearman
    corr = correlate(df, "ridership_pct_change", "otp_delta")
    results["pearson_r"] = corr["pearson_r"]
    results["pearson_p"] = corr["pearson_p"]
    results["spearman_r"] = corr["spearman_r"]
    results["spearman_p"] = corr["spearman_p"]

    # Regression to the mean test: corr(baseline_otp, otp_delta)
    rtm_corr = correlate(df, "baseline_otp", "otp_delta")
    results["rtm_r"] = rtm_corr["pearson_r"]
    results["rtm_p"] = rtm_corr["pearson_p"]

    # Summary counts
    results["n_both_improved"] = int(((otp_d > 0) & (rider_r > 0)).sum())
    results["n_both_declined"] = int(((otp_d < 0) & (rider_r < 0)).sum())
    results["n_otp_up_riders_down"] = int(((otp_d > 0) & (rider_r < 0)).sum())
    results["n_otp_down_riders_up"] = int(((otp_d < 0) & (rider_r > 0)).sum())

    # Kruskal-Wallis across subtypes for OTP delta
    subtypes = sorted(df["subtype"].unique().to_list())
    groups = []
    for st in subtypes:
        vals = df.filter(pl.col("subtype") == st)["otp_delta"].to_list()
        if len(vals) >= 3:
            groups.append(vals)
    if len(groups) >= 2:
        h, p = stats.kruskal(*groups)
        results["kruskal_h"] = h
        results["kruskal_p"] = p

    return results


def make_scatter(df: pl.DataFrame, results: dict) -> None:
    """Scatter plot: ridership recovery vs OTP recovery, colored by subtype."""
    plt = setup_plotting()
    fig, ax = plt.subplots(figsize=(10, 8))

    subtype_colors = {
        "local": "#3b82f6",
        "limited": "#22c55e",
        "express": "#f59e0b",
        "busway": "#8b5cf6",
        "flyer": "#06b6d4",
        "rail": "#ef4444",
        "unknown": "#9ca3af",
    }

    for st in sorted(df["subtype"].unique().to_list()):
        sdf = df.filter(pl.col("subtype") == st)
        ax.scatter(
            sdf["ridership_pct_change"].to_list(),
            sdf["otp_delta"].to_list(),
            s=40, alpha=0.7,
            color=subtype_colors.get(st, "#9ca3af"),
            edgecolors="white", linewidths=0.5,
            label=f"{st} (n={len(sdf)})",
        )

    # Reference lines
    ax.axhline(0, color="black", linewidth=0.5, alpha=0.5)
    ax.axvline(0, color="black", linewidth=0.5, alpha=0.5)

    # Regression line
    x = df["ridership_pct_change"].to_numpy()
    y = df["otp_delta"].to_numpy()
    slope, intercept = np.polyfit(x, y, 1)
    x_line = np.array([x.min(), x.max()])
    ax.plot(x_line, slope * x_line + intercept, color="#1e40af", linewidth=1.5,
            linestyle="--", label=f"r={results['pearson_r']:.3f}, p={results['pearson_p']:.3f}")

    # Quadrant labels
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    ax.text(xlim[1] * 0.7, ylim[1] * 0.85, f"Both improved\n(n={results['n_both_improved']})",
            ha="center", fontsize=8, color="#16a34a", alpha=0.8)
    ax.text(xlim[0] * 0.7, ylim[0] * 0.85, f"Both declined\n(n={results['n_both_declined']})",
            ha="center", fontsize=8, color="#dc2626", alpha=0.8)
    ax.text(xlim[0] * 0.7, ylim[1] * 0.85, f"Riders down,\nOTP up (n={results['n_otp_up_riders_down']})",
            ha="center", fontsize=8, color="#9ca3af", alpha=0.8)
    ax.text(xlim[1] * 0.7, ylim[0] * 0.85, f"Riders up,\nOTP down (n={results['n_otp_down_riders_up']})",
            ha="center", fontsize=8, color="#9ca3af", alpha=0.8)

    ax.set_xlabel("Ridership Change (recovery / baseline - 1)")
    ax.set_ylabel("OTP Change (recovery - baseline)")
    ax.set_title("COVID Recovery: Ridership vs OTP by Route")
    ax.legend(loc="upper left", fontsize=8)

    save_chart(fig, OUT / "recovery_scatter.png")


def make_subtype_chart(df: pl.DataFrame) -> None:
    """Side-by-side box plots: ridership recovery and OTP recovery by subtype."""
    plt = setup_plotting()
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

    subtypes = sorted(df["subtype"].unique().to_list())
    colors = ["#3b82f6", "#22c55e", "#f59e0b", "#ef4444", "#8b5cf6", "#06b6d4", "#9ca3af"]

    # OTP recovery by subtype
    otp_data = []
    labels = []
    for st in subtypes:
        vals = df.filter(pl.col("subtype") == st)["otp_delta"].to_list()
        if len(vals) >= 3:
            otp_data.append(vals)
            labels.append(f"{st}\n(n={len(vals)})")

    bp1 = ax1.boxplot(otp_data, tick_labels=labels, patch_artist=True)
    for patch, color in zip(bp1["boxes"], colors):
        patch.set_facecolor(color)
        patch.set_alpha(0.6)
    ax1.axhline(0, color="#dc2626", linewidth=1, linestyle="--", alpha=0.5)
    ax1.set_ylabel("OTP Change (recovery - baseline)")
    ax1.set_title("OTP Recovery by Subtype")

    # Ridership recovery by subtype
    rider_data = []
    labels2 = []
    for st in subtypes:
        vals = df.filter(pl.col("subtype") == st)["ridership_pct_change"].to_list()
        if len(vals) >= 3:
            rider_data.append(vals)
            labels2.append(f"{st}\n(n={len(vals)})")

    bp2 = ax2.boxplot(rider_data, tick_labels=labels2, patch_artist=True)
    for patch, color in zip(bp2["boxes"], colors):
        patch.set_facecolor(color)
        patch.set_alpha(0.6)
    ax2.axhline(0, color="#dc2626", linewidth=1, linestyle="--", alpha=0.5)
    ax2.set_ylabel("Ridership Change (recovery / baseline - 1)")
    ax2.set_title("Ridership Recovery by Subtype")

    save_chart(fig, OUT / "recovery_by_subtype.png")


@run_analysis(21, "COVID Ridership vs OTP Recovery")
def main() -> None:
    """Entry point: load, analyze, chart, and save."""

    with phase("Loading data"):
        df = load_data()
        print(f"  {len(df)} routes with both pre-COVID and recovery data")
        print(f"  Pre-COVID: {PRE_COVID_START} to {PRE_COVID_END}")
        print(f"  Recovery:  {RECOVERY_START} to {RECOVERY_END}")

    with phase("Analyzing"):
        results = analyze(df)

        print(f"\n  Quadrant breakdown:")
        print(f"    Both improved:           {results['n_both_improved']}")
        print(f"    Both declined:           {results['n_both_declined']}")
        print(f"    OTP up, riders down:     {results['n_otp_up_riders_down']}")
        print(f"    OTP down, riders up:     {results['n_otp_down_riders_up']}")

        print(f"\n  Ridership recovery vs OTP recovery:")
        print(f"    Pearson:  r = {results['pearson_r']:.4f}, p = {results['pearson_p']:.4f}")
        print(f"    Spearman: r = {results['spearman_r']:.4f}, p = {results['spearman_p']:.4f}")

        print(f"\n  Regression to the mean test (baseline OTP vs OTP delta):")
        print(f"    Pearson:  r = {results['rtm_r']:.4f}, p = {results['rtm_p']:.4f}")
        if results['rtm_p'] < 0.05:
            print("    => RTM detected: routes with extreme baseline OTP regress toward the mean.")
            print("       The ridership-OTP correlation may be partly artifactual.")

        if "kruskal_h" in results:
            print(f"\n  Kruskal-Wallis (OTP delta by subtype):")
            print(f"    H = {results['kruskal_h']:.3f}, p = {results['kruskal_p']:.4f}")

        # System-wide ridership recovery
        median_rider = df["ridership_pct_change"].median()
        mean_rider = df["ridership_pct_change"].mean()
        print(f"\n  System ridership recovery:")
        print(f"    Median: {median_rider:+.1%}")
        print(f"    Mean:   {mean_rider:+.1%}")
        n_rider_recovered = df.filter(pl.col("ridership_pct_change") >= 0).height
        print(f"    Routes at/above pre-COVID: {n_rider_recovered}/{len(df)}")

        # Extreme routes
        print("\n  Most ridership recovery with OTP decline:")
        worst = (
            df.filter((pl.col("ridership_pct_change") > 0) & (pl.col("otp_delta") < -0.05))
            .sort("otp_delta")
            .head(5)
        )
        for row in worst.iter_rows(named=True):
            print(f"    {row['route_id']:>5s} - {row['route_name']:<30s}  "
                  f"riders {row['ridership_pct_change']:+.0%}, OTP {row['otp_delta']:+.1%}")

        print("\n  Most ridership loss with OTP improvement:")
        best = (
            df.filter((pl.col("ridership_pct_change") < -0.1) & (pl.col("otp_delta") > 0))
            .sort("otp_delta", descending=True)
            .head(5)
        )
        for row in best.iter_rows(named=True):
            print(f"    {row['route_id']:>5s} - {row['route_name']:<30s}  "
                  f"riders {row['ridership_pct_change']:+.0%}, OTP {row['otp_delta']:+.1%}")

    with phase("Saving CSV"):
        save_csv(df, OUT / "recovery_data.csv")

    with phase("Generating charts"):
        make_scatter(df, results)
        make_subtype_chart(df)


if __name__ == "__main__":
    main()

Sources

NameTypeWhy It MattersOwnerFreshnessCaveat
otp_monthly table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
ridership_monthly table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
routes table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
numpy dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.
polars dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.
scipy dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.