Analysis

14: COVID Recovery Trajectories

Route and Service Drivers

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Page Navigation

Analysis Navigation

Data Provenance

flowchart LR
  14_covid_recovery(["14: COVID Recovery Trajectories"])
  t_otp_monthly[("otp_monthly")] --> 14_covid_recovery
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_route_stops[("route_stops")] --> 14_covid_recovery
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_routes[("routes")] --> 14_covid_recovery
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  d1_14_covid_recovery(("numpy (lib)")) --> 14_covid_recovery
  d2_14_covid_recovery(("polars (lib)")) --> 14_covid_recovery
  d3_14_covid_recovery(("scipy (lib)")) --> 14_covid_recovery
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 14_covid_recovery page;
  class t_otp_monthly,t_route_stops,t_routes table;
  class d1_14_covid_recovery,d2_14_covid_recovery,d3_14_covid_recovery dep;
  class u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: COVID Recovery Trajectories

Summary

The system-wide OTP decline since COVID is unevenly distributed. Of 92 routes with data in both periods, 43 improved and 49 declined, with a median delta of -0.9 pp and a mean delta of -2.1 pp. However, a significant regression-to-the-mean effect (r = -0.25, p = 0.02) means that much of the apparent divergence between "improved" and "declined" routes is a statistical artifact: routes with extreme baselines naturally regress toward the mean.

Key Numbers

  • 92 routes with data in both pre-COVID (2019-01 to 2020-02) and current (2024-12 to 2025-11) periods
  • 43 improved, 49 declined
  • Median recovery delta: -0.9 pp
  • Mean recovery delta: -2.1 pp
  • Regression-to-the-mean: r = -0.25 (p = 0.02) -- routes with high baselines tended to decline; routes with low baselines tended to improve
  • Kruskal-Wallis test across subtypes: H = 5.5, p = 0.24 -- no significant difference between route types
  • Stop count vs recovery (bus): r = -0.11, p = 0.32 -- not significant

Most Improved / Most Declined

Most improved:

Route Baseline Current Delta
P7 - McKeesport Flyer 58.7% 75.8% +17.1 pp
G2 - West Busway 75.4% 88.4% +13.0 pp
21 - Coraopolis 67.8% 80.2% +12.4 pp

Most declined:

Route Baseline Current Delta
71B - Highland Park 63.0% 41.9% -21.1 pp
58 - Greenfield 70.4% 49.8% -20.6 pp
65 - Squirrel Hill 65.5% 46.5% -19.0 pp

Observations

  • The regression-to-the-mean test is significant (r = -0.25, p = 0.02): routes that started with below-average OTP tended to improve, and routes that started above-average tended to decline. This does not mean the recovery differences are entirely artifactual, but it means the extreme cases (P7 improving +17 pp from a 58.7% baseline, or Route 6 declining -17.8 pp from an 80.5% baseline) are partially explained by statistical regression rather than operational changes.
  • The Kruskal-Wallis test across subtypes is not significant (p = 0.24), meaning there is no statistically defensible evidence that premium routes recovered better than local routes as a group. The observation that the top improvers are flyers/busway routes may reflect cherry-picking the extremes rather than a systematic pattern.
  • That said, the most-declined routes are genuinely concentrated in Pittsburgh's eastern neighborhoods (Highland Park, Greenfield, Squirrel Hill), and these declines are large enough to be concerning regardless of the RTM effect.

Implication

The recovery picture is more nuanced than "premium routes improved, local routes declined." RTM explains a substantial fraction of the divergence. The policy-relevant finding is narrower: specific local bus routes in the eastern corridor have deteriorated badly (15-21 pp below pre-COVID levels), and this decline exceeds what RTM alone would predict.

Caveats

  • Regression to the mean is not the only explanation. Operational changes (schedule modifications, staffing shifts) may have genuinely affected some routes more than others.
  • The current period is the trailing 12 months of available data (2024-12 to 2025-11), which may not represent a stable equilibrium.
  • The pre-COVID baseline (2019-01 to 2020-02) includes months of varying performance; a longer baseline would be more stable.

Review History

Output

Methods

Methods: COVID Recovery Trajectories

Question

PRT system OTP declined from ~69% pre-COVID to ~62% currently (Analysis 01). But did all routes decline equally, or did some recover while others cratered? What route characteristics predict recovery?

Approach

  • Define pre-COVID baseline as average OTP during 2019-01 through 2020-02 (14 months before COVID disruption).
  • Define current period as the trailing 12 months of data.
  • For each route with data in both periods, compute:
    • Recovery delta = current OTP - baseline OTP (positive = improved, negative = declined).
    • Recovery ratio = current OTP / baseline OTP.
  • Exclude routes with fewer than 6 months in either period.
  • Characterize recovery by mode, bus subtype, stop count, and geographic span.
  • Identify the most-recovered and most-declined routes.
  • Test whether mode, stop count, or bus subtype predicts recovery.

Data

Name Description Source
otp_monthly Monthly OTP per route prt.db table
routes Mode classification prt.db table
route_stops Stop count per route prt.db table

Output

  • output/covid_recovery.csv -- per-route baseline, current, delta, and characteristics
  • output/recovery_distribution.png -- histogram of recovery deltas
  • output/recovery_by_mode.png -- box plot of recovery delta by mode/subtype

Source Code

"""COVID recovery analysis: which routes recovered and which didn't?"""

import polars as pl
from scipy import stats

from prt_otp_analysis.common import (
    PRE_COVID_BASELINE_MONTH,
    analysis_dir,
    classify_bus_route,
    correlate,
    phase,
    query_to_polars,
    run_analysis,
    save_chart,
    save_csv,
    setup_plotting,
)

OUT = analysis_dir(__file__)

PRE_COVID_START = PRE_COVID_BASELINE_MONTH
PRE_COVID_END = "2020-02"
MIN_MONTHS = 6


def load_data() -> tuple[pl.DataFrame, str, str]:
    """Compute pre-COVID baseline and current OTP for each route."""
    otp = query_to_polars("""
        SELECT o.route_id, r.route_name, r.mode, o.month, o.otp
        FROM otp_monthly o
        JOIN routes r ON o.route_id = r.route_id
    """)
    stop_counts = query_to_polars("""
        SELECT route_id, COUNT(DISTINCT stop_id) AS stop_count
        FROM route_stops
        GROUP BY route_id
    """)

    # Find the last 12 months in the dataset
    all_months = sorted(otp["month"].unique().to_list())
    current_months = all_months[-12:]
    current_start = current_months[0]
    current_end = current_months[-1]

    # Pre-COVID baseline
    baseline = (
        otp.filter((pl.col("month") >= PRE_COVID_START) & (pl.col("month") <= PRE_COVID_END))
        .group_by("route_id", "route_name", "mode")
        .agg(
            pl.col("otp").mean().alias("baseline_otp"),
            pl.col("otp").len().alias("baseline_months"),
        )
        .filter(pl.col("baseline_months") >= MIN_MONTHS)
    )

    # Current period
    current = (
        otp.filter(pl.col("month") >= current_start)
        .group_by("route_id")
        .agg(
            pl.col("otp").mean().alias("current_otp"),
            pl.col("otp").len().alias("current_months"),
        )
        .filter(pl.col("current_months") >= MIN_MONTHS)
    )

    df = baseline.join(current, on="route_id", how="inner")
    df = df.join(stop_counts, on="route_id", how="left")
    df = df.with_columns(
        (pl.col("current_otp") - pl.col("baseline_otp")).alias("recovery_delta"),
        (pl.col("current_otp") / pl.col("baseline_otp")).alias("recovery_ratio"),
    )
    # Add bus subtype
    df = df.with_columns(
        pl.when(pl.col("mode") == "BUS")
        .then(pl.col("route_id").map_elements(classify_bus_route, return_dtype=pl.Utf8))
        .otherwise(pl.col("mode").str.to_lowercase())
        .alias("subtype")
    )
    return df.sort("recovery_delta"), current_start, current_end


def analyze(df: pl.DataFrame) -> dict:
    """Compute summary statistics on recovery, including regression-to-the-mean test."""
    results = {}
    results["n_routes"] = len(df)
    results["median_delta"] = df["recovery_delta"].median()
    results["mean_delta"] = df["recovery_delta"].mean()
    results["n_improved"] = len(df.filter(pl.col("recovery_delta") > 0))
    results["n_declined"] = len(df.filter(pl.col("recovery_delta") < 0))

    # Correlation: stop count vs recovery delta (bus only)
    bus = df.filter(pl.col("mode") == "BUS").drop_nulls("stop_count")
    if len(bus) > 5:
        corr = correlate(bus, "stop_count", "recovery_delta")
        results["stops_recovery_r"] = corr["pearson_r"]
        results["stops_recovery_p"] = corr["pearson_p"]

    # Regression-to-the-mean test: baseline vs delta
    rtm_corr = correlate(df, "baseline_otp", "recovery_delta")
    results["rtm_r"] = rtm_corr["pearson_r"]
    results["rtm_p"] = rtm_corr["pearson_p"]

    # Kruskal-Wallis test across subtypes (non-parametric, handles unequal groups)
    subtypes = sorted(df["subtype"].unique().to_list())
    groups = []
    for st in subtypes:
        vals = df.filter(pl.col("subtype") == st)["recovery_delta"].to_list()
        if len(vals) >= 3:
            groups.append(vals)
    if len(groups) >= 2:
        h, p = stats.kruskal(*groups)
        results["kruskal_h"] = h
        results["kruskal_p"] = p

    return results


def make_charts(df: pl.DataFrame, results: dict) -> None:
    """Generate recovery distribution, subtype box plot, and RTM scatter."""
    plt = setup_plotting()

    # Recovery distribution histogram
    fig, ax = plt.subplots(figsize=(10, 6))
    deltas = df["recovery_delta"].to_list()
    ax.hist(deltas, bins=25, color="#3b82f6", edgecolor="white", alpha=0.8)
    ax.axvline(0, color="#dc2626", linewidth=1.5, linestyle="--", label="No change")
    ax.axvline(results["median_delta"], color="#16a34a", linewidth=1.5, linestyle="--",
               label=f"Median ({results['median_delta']:+.1%})")
    ax.set_xlabel("OTP Change (Current - Pre-COVID)")
    ax.set_ylabel("Number of Routes")
    ax.set_title("Distribution of OTP Recovery from Pre-COVID Baseline")
    ax.legend()
    save_chart(fig, OUT / "recovery_distribution.png")

    # Recovery by subtype box plot
    fig, ax = plt.subplots(figsize=(10, 6))
    subtypes = sorted(df["subtype"].unique().to_list())
    box_data = []
    box_labels = []
    for st in subtypes:
        vals = df.filter(pl.col("subtype") == st)["recovery_delta"].to_list()
        if len(vals) >= 3:
            box_data.append(vals)
            box_labels.append(f"{st}\n(n={len(vals)})")

    bp = ax.boxplot(box_data, tick_labels=box_labels, patch_artist=True)
    colors = ["#3b82f6", "#22c55e", "#f59e0b", "#ef4444", "#8b5cf6", "#06b6d4"]
    for patch, color in zip(bp["boxes"], colors):
        patch.set_facecolor(color)
        patch.set_alpha(0.6)
    ax.axhline(0, color="#dc2626", linewidth=1, linestyle="--", alpha=0.5)
    ax.set_ylabel("OTP Change (Current - Pre-COVID)")
    ax.set_title("COVID Recovery by Route Type")
    save_chart(fig, OUT / "recovery_by_mode.png")

    # Regression-to-the-mean scatter
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(df["baseline_otp"].to_list(), df["recovery_delta"].to_list(),
               s=30, alpha=0.6, color="#3b82f6", edgecolors="white", linewidths=0.5)
    ax.axhline(0, color="#dc2626", linewidth=1, linestyle="--", alpha=0.5)
    import numpy as np
    x = np.array(df["baseline_otp"].to_list())
    y = np.array(df["recovery_delta"].to_list())
    slope, intercept = np.polyfit(x, y, 1)
    x_line = np.array([x.min(), x.max()])
    ax.plot(x_line, slope * x_line + intercept, color="#1e40af", linewidth=1.5, linestyle="--",
            label=f"r={results['rtm_r']:.3f}, p={results['rtm_p']:.3f}")
    ax.set_xlabel("Pre-COVID Baseline OTP")
    ax.set_ylabel("OTP Change (Current - Baseline)")
    ax.set_title("Regression to the Mean Test")
    ax.legend()
    save_chart(fig, OUT / "regression_to_mean.png")


@run_analysis(14, "COVID Recovery Trajectories")
def main() -> None:
    """Entry point: load data, analyze, chart, and save."""
    with phase("Loading data"):
        df, current_start, current_end = load_data()
        print(f"  {len(df)} routes with both pre-COVID and current data")
        print(f"  Pre-COVID period: {PRE_COVID_START} to {PRE_COVID_END}")
        print(f"  Current period:   {current_start} to {current_end}")

    with phase("Analyzing recovery"):
        results = analyze(df)
        print(f"  Improved: {results['n_improved']} routes")
        print(f"  Declined: {results['n_declined']} routes")
        print(f"  Median delta: {results['median_delta']:+.1%}")
        print(f"  Mean delta:   {results['mean_delta']:+.1%}")
        if "stops_recovery_r" in results:
            print(f"  Stop count vs recovery (bus): r = {results['stops_recovery_r']:.4f} "
                  f"(p = {results['stops_recovery_p']:.4f})")
        print(f"\n  Regression-to-the-mean test:")
        print(f"    Baseline vs delta: r = {results['rtm_r']:.4f} (p = {results['rtm_p']:.4f})")
        if results['rtm_p'] < 0.05:
            print(f"    WARNING: Significant RTM detected -- routes with extreme baselines regress toward the mean.")
        else:
            print(f"    No significant RTM detected.")
        if "kruskal_h" in results:
            print(f"  Kruskal-Wallis test across subtypes: H = {results['kruskal_h']:.3f}, "
                  f"p = {results['kruskal_p']:.4f}")

        # Top/bottom routes
        print("\nMost improved:")
        top = df.sort("recovery_delta", descending=True).head(5)
        for row in top.iter_rows(named=True):
            print(f"  {row['route_id']:>5s} - {row['route_name']:<30s} "
                  f"Baseline={row['baseline_otp']:.1%} -> Current={row['current_otp']:.1%} "
                  f"({row['recovery_delta']:+.1%})")

        print("\nMost declined:")
        bottom = df.sort("recovery_delta").head(5)
        for row in bottom.iter_rows(named=True):
            print(f"  {row['route_id']:>5s} - {row['route_name']:<30s} "
                  f"Baseline={row['baseline_otp']:.1%} -> Current={row['current_otp']:.1%} "
                  f"({row['recovery_delta']:+.1%})")

        save_csv(df, OUT / "covid_recovery.csv")

    with phase("Generating charts"):
        make_charts(df, results)


if __name__ == "__main__":
    main()

Sources

NameTypeWhy It MattersOwnerFreshnessCaveat
otp_monthly table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
route_stops table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
routes table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
numpy dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.
polars dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.
scipy dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.