Analysis

19 - Ridership-Weighted OTP

Route and Service Drivers

Coverage: 2017-01 to 2025-11 (from otp_monthly, ridership_monthly).

Built 2026-04-03 20:09 UTC · Commit 7c56b9a


Data Provenance

flowchart LR
  19_ridership_weighted_otp(["19 - Ridership-Weighted OTP"])
  t_otp_monthly[("otp_monthly")] --> 19_ridership_weighted_otp
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_ridership_monthly[("ridership_monthly")] --> 19_ridership_weighted_otp
  01_data_ingestion[["Data Ingestion"]] --> t_ridership_monthly
  t_route_stops[("route_stops")] --> 19_ridership_weighted_otp
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  d1_19_ridership_weighted_otp(("polars (lib)")) --> 19_ridership_weighted_otp
  d2_19_ridership_weighted_otp(("scipy (lib)")) --> 19_ridership_weighted_otp
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 19_ridership_weighted_otp page;
  class t_otp_monthly,t_ridership_monthly,t_route_stops table;
  class d1_19_ridership_weighted_otp,d2_19_ridership_weighted_otp dep;
  class u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: Ridership-Weighted OTP

Summary

Ridership-weighted system OTP (69.4% mean across months) runs 1.6 percentage points higher than trip-weighted OTP (67.8%), and the difference is statistically significant (paired t = -18.1, p < 0.001, where t is negative because the test takes trip-weighted minus ridership-weighted; Wilcoxon W = 1, p < 0.001). In other words, the average PRT weekday rider experiences slightly better on-time performance than the trip-weighted system average suggests.
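The script calls `scipy.stats.ttest_rel(trip, rider)`, which tests the differences `trip - rider`. A negative t therefore means the ridership-weighted series is the higher one. The sketch below recomputes both statistics with only the standard library, on made-up monthly values rather than the analysis data. `paired_t` and `signed_rank_w` are hypothetical helper names. `signed_rank_w` follows SciPy's default two-sided convention as we understand it: zero differences are dropped, tied ranks are averaged, and W is the smaller rank sum. Only the statistics are computed here; p-values would still need SciPy.

```python
import math
import statistics


def paired_t(a: list[float], b: list[float]) -> float:
    """Paired t statistic for the differences a[i] - b[i]."""
    d = [x - y for x, y in zip(a, b)]
    # t = mean(d) / (sd(d) / sqrt(n)), using the sample (n - 1) standard deviation
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def signed_rank_w(a: list[float], b: list[float]) -> float:
    """Two-sided Wilcoxon signed-rank statistic: the smaller of the two rank sums."""
    d = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences and give each the average rank
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    r_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    r_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return min(r_plus, r_minus)


# Made-up monthly OTP values, for illustration only
trip = [0.670, 0.675, 0.680, 0.665, 0.690, 0.672]
rider = [0.684, 0.690, 0.693, 0.683, 0.689, 0.688]

print(round(paired_t(trip, rider), 2))  # negative, because rider > trip in most months
print(signed_rank_w(trip, rider))  # 1.0: only the smallest gap (month 5) has trip > rider
```

As the toy data shows, one way to get W = 1 is for exactly one month, the one with the smallest absolute gap, to go the other way.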

Key Numbers

Unweighted OTP (all routes equal): 69.9% mean, 70.3% median
Trip-weighted OTP (scheduled frequency): 67.8% mean, 67.8% median
Ridership-weighted OTP (avg daily riders): 69.4% mean, 69.2% median
Ridership vs trip-weighted gap: +1.6 pp (p < 0.001, n = 70 months)
93 routes with 12+ months of paired OTP + ridership data
Overlap period: Jan 2019 -- Oct 2024 (70 months)

Interpretation

Both weighted series fall below the unweighted average, meaning both scheduled trips and actual riders are concentrated somewhat on worse-performing routes. However, trips are more heavily concentrated on the worst routes than riders are. High-frequency routes tend to have many stops and poor OTP (Analysis 07), but ridership does not scale in proportion to their frequency. Some high-frequency routes carry a smaller share of riders than their share of scheduled trips. As a result, the average rider's experience is worse than the average route's OTP, but not as bad as the trip-weighted number implies.

This does not mean high-ridership routes have better OTP. It means ridership is distributed more evenly across the OTP spectrum than trip frequency is.
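A toy calculation shows how this ordering can arise. The three routes and their numbers below are invented for illustration and are not PRT figures. Route A has poor OTP and most of the scheduled trips, but a smaller share of riders than of trips. That alone puts the trip-weighted average below the ridership-weighted one, with both below the unweighted mean.

```python
# Hypothetical routes: name -> (OTP, scheduled trips, avg weekday riders).
# Made-up numbers chosen only to illustrate the mechanism; not PRT data.
routes = {
    "A": (0.60, 400, 4000),  # frequent, poor OTP, fewer riders per trip
    "B": (0.75, 100, 3000),
    "C": (0.80, 100, 1000),
}


def wmean(pairs: list[tuple[float, float]]) -> float:
    """Weighted mean of (value, weight) pairs."""
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


unweighted = sum(otp for otp, _, _ in routes.values()) / len(routes)
trip_weighted = wmean([(otp, trips) for otp, trips, _ in routes.values()])
rider_weighted = wmean([(otp, riders) for otp, _, riders in routes.values()])

# Route A has 400/600 (about 67%) of trips but only 4000/8000 (50%) of riders,
# so trip weighting pulls the average further toward its poor OTP.
print(f"unweighted={unweighted:.3f} trip={trip_weighted:.3f} rider={rider_weighted:.3f}")
# → unweighted=0.717 trip=0.658 rider=0.681
```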

Observations

The gap between trip-weighted and ridership-weighted OTP is not constant: it was near zero during COVID (2020), widened to ~3 pp in late 2022, and has held around 1--2 pp since 2023. This likely reflects post-COVID redistribution of riders across routes, since the trip weights are a static snapshot and only the ridership weights change over time.
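A per-year view of the gap can be computed from the monthly comparison file. The sketch below reads an inline stand-in for `weighting_comparison.csv`. It uses the column names the script writes, but the values are invented, and `yearly_gap_pp` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Stand-in for output/weighting_comparison.csv: real column names, invented values
SAMPLE = """month,trip_weighted_otp,ridership_weighted_otp
2020-04,0.810,0.812
2020-05,0.805,0.804
2022-10,0.640,0.670
2022-11,0.635,0.666
2023-06,0.660,0.675
2023-07,0.662,0.680
"""


def yearly_gap_pp(text: str) -> dict[str, float]:
    """Mean (ridership-weighted minus trip-weighted) OTP gap per calendar year, in pp."""
    gaps = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        year = row["month"][:4]  # months are "YYYY-MM" strings
        gap = float(row["ridership_weighted_otp"]) - float(row["trip_weighted_otp"])
        gaps[year].append(gap * 100)  # fraction -> percentage points
    return {year: mean(vals) for year, vals in sorted(gaps.items())}


for year, gap in yearly_gap_pp(SAMPLE).items():
    print(f"{year}: {gap:+.1f} pp")
```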
All three series show the same overall trend: COVID spike, steady decline through late 2022, partial stabilization in 2023--2024.

Caveats

Ridership data is weekday average only; the analysis does not capture weekend rider experience.
route_stops.trips_7d is a current snapshot, not a monthly time series -- trip-weighted OTP uses the same weights for all months.
Routes missing from the ridership dataset (RLSH, SWL) are excluded from all three series for comparability, so the unweighted series here may differ slightly from Analysis 01.
The ridership CSV ends Oct 2024 while OTP data extends to Nov 2025; the analysis is restricted to the overlap period.
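The overlap window and its 70-month length follow from simple month arithmetic. The sketch uses the dates stated on this page: earliest month 2017-01, OTP through Nov 2025, ridership through Oct 2024, and overlap starting Jan 2019. `month_index` and `overlap` are hypothetical helpers.

```python
from datetime import date


def month_index(d: date) -> int:
    """Months since year 0, so month differences are plain integer subtraction."""
    return d.year * 12 + (d.month - 1)


def overlap(a_start: date, a_end: date, b_start: date, b_end: date):
    """Overlap of two inclusive month ranges, or None if they are disjoint."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return (start, end) if start <= end else None


# Month ranges as first-of-month dates. Which table starts in 2017 and which in
# 2019 is an assumption; swapping the two start dates gives the same overlap.
otp_range = (date(2017, 1, 1), date(2025, 11, 1))
ridership_range = (date(2019, 1, 1), date(2024, 10, 1))

start, end = overlap(*otp_range, *ridership_range)
n_months = month_index(end) - month_index(start) + 1  # inclusive of both ends
print(start, end, n_months)  # → 2019-01-01 2024-10-01 70
```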

Output

ridership_weighted_otp_trend.png -- three-series time plot.

summary_stats.csv -- mean, median, std, min, and max for each weighting scheme.

weighting_comparison.csv -- monthly values for all three series, plus route count and total weekday riders per month.

Methods

Methods: Ridership-Weighted OTP

Question

How does the average rider's on-time experience differ from the average route's OTP? Does weighting by actual ridership instead of scheduled trip frequency change the system-wide OTP picture?

Approach

Join monthly OTP data with monthly average weekday ridership by route and month.
Compute three monthly system OTP series:
1. Unweighted: simple mean of all route OTPs (all routes equal).
2. Trip-weighted: sum(otp_i * trips_7d_i) / sum(trips_7d_i) using static scheduled trip counts from route_stops (same weight every month).
3. Ridership-weighted: sum(otp_i * avg_riders_i) / sum(avg_riders_i) using that route's average daily weekday ridership for the same month (weight varies month-to-month).
Plot all three series over time to visualize divergence.
Compute summary statistics (mean, median, standard deviation, min, max) for each weighting scheme.
Test whether the ridership-weighted series differs significantly from the trip-weighted series (paired t-test and Wilcoxon signed-rank test).
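The three formulas above can be sketched in plain Python. This is a stand-in for the script's Polars `group_by("month")` aggregation, not the project's code. `safe_weighted_mean` and `monthly_series` are hypothetical names. Returning `None` for a zero total weight is an assumption about what the project's `weighted_mean(..., safe=True)` guards against.

```python
from collections import defaultdict
from typing import Optional


def safe_weighted_mean(pairs: list[tuple[float, float]]) -> Optional[float]:
    """Weighted mean of (value, weight) pairs; None when the weights sum to zero.

    Hypothetical stand-in for the project's weighted_mean(..., safe=True).
    """
    total = sum(w for _, w in pairs)
    if total == 0:
        return None
    return sum(v * w for v, w in pairs) / total


def monthly_series(rows: list[dict]) -> list[dict]:
    """Compute the three system OTP series, one record per month.

    Each row needs: month, otp, trips_7d (static per route), avg_riders (per month).
    """
    by_month = defaultdict(list)
    for row in rows:
        by_month[row["month"]].append(row)

    series = []
    for month in sorted(by_month):  # "YYYY-MM" strings sort chronologically
        rs = by_month[month]
        series.append({
            "month": month,
            # 1. Unweighted: every route counts equally
            "unweighted_otp": sum(r["otp"] for r in rs) / len(rs),
            # 2. Trip-weighted: a route has the same trips_7d weight in every month
            "trip_weighted_otp": safe_weighted_mean([(r["otp"], r["trips_7d"]) for r in rs]),
            # 3. Ridership-weighted: that route's weekday riders in this month
            "ridership_weighted_otp": safe_weighted_mean([(r["otp"], r["avg_riders"]) for r in rs]),
        })
    return series


# Two hypothetical routes over two months (invented numbers)
rows = [
    {"month": "2023-01", "route_id": "X", "otp": 0.60, "trips_7d": 300, "avg_riders": 2000},
    {"month": "2023-01", "route_id": "Y", "otp": 0.80, "trips_7d": 100, "avg_riders": 2000},
    {"month": "2023-02", "route_id": "X", "otp": 0.62, "trips_7d": 300, "avg_riders": 1500},
    {"month": "2023-02", "route_id": "Y", "otp": 0.78, "trips_7d": 100, "avg_riders": 2500},
]
for rec in monthly_series(rows):
    print(rec)
```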

Data

Name	Description	Source
`otp_monthly`	Route, month, OTP	`prt.db` table
`ridership_monthly`	Route, month, day_type='WEEKDAY', avg_riders	`prt.db` table (loaded by Data Ingestion from the local CSV in `data/average-ridership/`)
`route_stops`	For trip-weighted baseline (trips_7d)	`prt.db` table

Notes: Join on route code and month; restrict to overlap period (Jan 2019 -- Oct 2024). Exclude routes with fewer than 12 months of data.
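The join and the 12-month filter in these notes can be sketched with plain dictionaries. The sketch mirrors the script's SQL inner join, `COALESCE(..., 0)` and `MIN_MONTHS` filter, but it is not the project's code. The dict-shaped inputs and `join_and_filter` are hypothetical.

```python
from collections import Counter

MIN_MONTHS = 12


def join_and_filter(otp: dict, riders: dict, trips: dict, min_months: int = MIN_MONTHS) -> list[dict]:
    """Pair OTP with weekday ridership and keep routes with enough paired months.

    otp:    {(route_id, month): OTP}
    riders: {(route_id, month): average weekday riders}
    trips:  {route_id: trips_7d}, one static weight per route
    """
    paired = [
        {
            "route_id": route,
            "month": month,
            "otp": value,
            "avg_riders": riders[(route, month)],
            # Like COALESCE(trips_7d, 0): routes missing from route_stops get weight 0
            "trips_7d": trips.get(route, 0),
        }
        for (route, month), value in otp.items()
        # Inner join: route-months without ridership (outside the overlap period,
        # or routes absent from the ridership data) are dropped
        if (route, month) in riders
    ]
    months_per_route = Counter(row["route_id"] for row in paired)
    return [row for row in paired if months_per_route[row["route_id"]] >= min_months]


# Hypothetical inputs: P has 12 paired months, Q only 5, Z has no ridership at all
months = [f"2019-{m:02d}" for m in range(1, 13)]
otp = {(route, month): 0.70 for route in ("P", "Q", "Z") for month in months}
riders = {**{("P", m): 1000 for m in months}, **{("Q", m): 500 for m in months[:5]}}
trips = {"P": 200}

kept = join_and_filter(otp, riders, trips)
print(len(kept), {row["route_id"] for row in kept})  # → 12 {'P'}
```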

Output

output/ridership_weighted_otp_trend.png -- three-series time plot
output/weighting_comparison.csv -- monthly values for all three series, plus route count and total weekday riders
output/summary_stats.csv -- mean, median, std, min, and max for each weighting scheme

Source Code

"""Analysis 19: Compare system OTP under three weighting schemes -- unweighted, trip-weighted, and ridership-weighted."""

import polars as pl
from scipy import stats

from prt_otp_analysis.common import analysis_dir, phase, query_to_polars, run_analysis, save_chart, save_csv, setup_plotting, weighted_mean

OUT = analysis_dir(__file__)

MIN_MONTHS = 12


def load_data() -> pl.DataFrame:
    """Load OTP joined with weekday ridership and trip weights, restricted to the overlap period."""
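    # The inner join keeps only route-months present in both tables, which restricts
    # the data to the overlap period and drops routes with no ridership data.
    # trips_7d is summed over route_stops rows to get one static weight per route; if
    # route_stops holds one row per stop, this sum counts stop-level trip visits rather
    # than distinct trips. Routes absent from route_stops get weight 0.
    df = query_to_polars("""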
        SELECT o.route_id, o.month, o.otp,
               r.avg_riders,
               COALESCE(rs.trips_7d, 0) AS trips_7d
        FROM otp_monthly o
        JOIN ridership_monthly r
            ON o.route_id = r.route_id AND o.month = r.month
            AND r.day_type = 'WEEKDAY'
        LEFT JOIN (
            SELECT route_id, SUM(trips_7d) AS trips_7d
            FROM route_stops
            GROUP BY route_id
        ) rs ON o.route_id = rs.route_id
    """)

    # Filter to routes with at least MIN_MONTHS of paired data
    route_counts = df.group_by("route_id").agg(pl.col("month").count().alias("n"))
    keep = route_counts.filter(pl.col("n") >= MIN_MONTHS)["route_id"].to_list()
    df = df.filter(pl.col("route_id").is_in(keep))

    return df


def compute_monthly(df: pl.DataFrame) -> pl.DataFrame:
    """Compute three monthly OTP series: unweighted, trip-weighted, ridership-weighted."""
    monthly = (
        df.group_by("month")
        .agg(
            unweighted_otp=pl.col("otp").mean(),
            trip_weighted_otp=weighted_mean("otp", "trips_7d", safe=True),
            ridership_weighted_otp=weighted_mean("otp", "avg_riders"),
            route_count=pl.col("route_id").n_unique(),
            total_riders=pl.col("avg_riders").sum(),
        )
        .sort("month")
    )
    return monthly


def compute_summary(monthly: pl.DataFrame) -> pl.DataFrame:
    """Compute summary statistics for each weighting scheme."""
    rows = []
    for col in ["unweighted_otp", "trip_weighted_otp", "ridership_weighted_otp"]:
        s = monthly[col]
        rows.append({
            "weighting": col.replace("_otp", ""),
            "mean": s.mean(),
            "median": s.median(),
            "std": s.std(),
            "min": s.min(),
            "max": s.max(),
        })
    return pl.DataFrame(rows)


def statistical_test(monthly: pl.DataFrame) -> dict:
    """Test whether ridership-weighted OTP differs from trip-weighted OTP."""
    trip = monthly["trip_weighted_otp"].to_numpy()
    rider = monthly["ridership_weighted_otp"].to_numpy()

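    # Paired t-test. ttest_rel(a, b) tests the differences a - b, so t < 0 here
    # means ridership-weighted OTP is higher than trip-weighted OTP.
    t_stat, t_p = stats.ttest_rel(trip, rider)

    # Wilcoxon signed-rank (non-parametric). For the default two-sided test the
    # statistic is the smaller of the positive and negative rank sums.
    w_stat, w_p = stats.wilcoxon(trip, rider)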

    mean_diff = (rider - trip).mean()

    return {
        "mean_difference": mean_diff,
        "paired_t_stat": t_stat,
        "paired_t_p": t_p,
        "wilcoxon_stat": w_stat,
        "wilcoxon_p": w_p,
        "n_months": len(trip),
    }


def make_chart(monthly: pl.DataFrame) -> None:
    """Plot three OTP series over time."""
    plt = setup_plotting()

    months = monthly["month"].to_list()
    x = range(len(months))
    tick_positions = [i for i, m in enumerate(months) if m.endswith("-01")]
    tick_labels = [months[i][:4] for i in tick_positions]

    fig, ax = plt.subplots(figsize=(14, 6))

    ax.plot(x, monthly["unweighted_otp"].to_list(),
            color="#9ca3af", linewidth=1, linestyle="--", label="Unweighted (all routes equal)")
    ax.plot(x, monthly["trip_weighted_otp"].to_list(),
            color="#2563eb", linewidth=1.5, label="Trip-weighted (scheduled frequency)")
    ax.plot(x, monthly["ridership_weighted_otp"].to_list(),
            color="#e11d48", linewidth=1.5, label="Ridership-weighted (avg daily riders)")

    # Shade gap between trip-weighted and ridership-weighted
    trip = monthly["trip_weighted_otp"].to_list()
    rider = monthly["ridership_weighted_otp"].to_list()
    ax.fill_between(x, trip, rider, alpha=0.15, color="#e11d48", label="Trip vs ridership gap")

    # COVID marker
    if "2020-03" in months:
        covid_idx = months.index("2020-03")
        ax.axvline(covid_idx, color="#ef4444", linestyle=":", alpha=0.7)
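        # Blended transform: x in data coordinates, y in axes coordinates, so the label
        # sits near the top edge even though set_ylim() is called afterwards.
        ax.text(covid_idx + 0.5, 0.98, "COVID", transform=ax.get_xaxis_transform(),
                color="#ef4444", fontsize=8, va="top")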

    ax.set_ylabel("On-Time Performance")
    ax.set_xlabel("Month")
    ax.set_title("PRT System OTP: Three Weighting Schemes (2019\u20132024)")
    ax.set_xticks(tick_positions)
    ax.set_xticklabels(tick_labels)
    ax.legend(loc="lower left", fontsize=8)
    ax.set_ylim(0.5, 0.85)

    save_chart(fig, OUT / "ridership_weighted_otp_trend.png")


@run_analysis(19, "Ridership-Weighted OTP")
def main() -> None:
    """Entry point: load data, compute weighted OTP series, test, chart, and save."""
    with phase("Loading data"):
        df = load_data()
        n_routes = df["route_id"].n_unique()
        print(f"  {len(df):,} route-month observations ({n_routes} routes)")

    with phase("Computing monthly OTP series"):
        monthly = compute_monthly(df)
        print(f"  {len(monthly)} months computed")

    with phase("Computing summary statistics"):
        summary = compute_summary(monthly)
        for row in summary.iter_rows(named=True):
            print(f"  {row['weighting']:30s}  mean={row['mean']:.3%}  "
                  f"median={row['median']:.3%}  std={row['std']:.3%}")

    with phase("Statistical test (ridership-weighted vs trip-weighted)"):
        test = statistical_test(monthly)
        print(f"  Mean difference: {test['mean_difference']:+.4%}")
        print(f"  Paired t-test:   t={test['paired_t_stat']:.3f}, p={test['paired_t_p']:.4f}")
        print(f"  Wilcoxon test:   W={test['wilcoxon_stat']:.0f}, p={test['wilcoxon_p']:.4f}")
        print(f"  N months:        {test['n_months']}")

    with phase("Saving CSVs"):
        save_csv(monthly, OUT / "weighting_comparison.csv")
        save_csv(summary, OUT / "summary_stats.csv")

    with phase("Generating chart"):
        make_chart(monthly)


if __name__ == "__main__":
    main()

Sources

Name	Type	Why It Matters	Owner	Freshness	Caveat
otp_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5) file data/routes_by_month.csv — Monthly route OTP source table in wide format. file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications. file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts. file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes. file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
ridership_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
route_stops	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
polars	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
scipy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.