Analysis

30 - Service Level vs OTP Longitudinal

Equity and Strategic Planning

Coverage: 2016-11 to 2025-11 (from otp_monthly, scheduled_trips_monthly).

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Data Provenance

flowchart LR
  30_service_level_otp_longitudinal(["30 - Service Level vs OTP Longitudinal"])
  t_otp_monthly[("otp_monthly")] --> 30_service_level_otp_longitudinal
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_routes[("routes")] --> 30_service_level_otp_longitudinal
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  t_scheduled_trips_monthly[("scheduled_trips_monthly")] --> 30_service_level_otp_longitudinal
  02_scheduled_trips[["Scheduled Trips ETL"]] --> t_scheduled_trips_monthly
  u1_02_scheduled_trips[/"data/wprdc-schedule/schedule_monthly_agg.csv"/] --> 02_scheduled_trips
  u2_02_scheduled_trips[/"data/wprdc-schedule/paac_pick_lookup.csv"/] --> 02_scheduled_trips
  u3_02_scheduled_trips{"WPRDC Schedule Monthly Aggregate"} --> 02_scheduled_trips
  u4_02_scheduled_trips{"WPRDC Pick Lookup"} --> 02_scheduled_trips
  d1_30_service_level_otp_longitudinal(("numpy (lib)")) --> 30_service_level_otp_longitudinal
  d2_30_service_level_otp_longitudinal(("polars (lib)")) --> 30_service_level_otp_longitudinal
  d3_30_service_level_otp_longitudinal(("scipy (lib)")) --> 30_service_level_otp_longitudinal
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 30_service_level_otp_longitudinal page;
  class t_otp_monthly,t_routes,t_scheduled_trips_monthly table;
  class d1_30_service_level_otp_longitudinal,d2_30_service_level_otp_longitudinal,d3_30_service_level_otp_longitudinal dep;
  class u1_01_data_ingestion,u1_02_scheduled_trips,u2_01_data_ingestion,u2_02_scheduled_trips,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class u3_02_scheduled_trips,u4_02_scheduled_trips api;
  class 01_data_ingestion,02_scheduled_trips pipeline;

Findings

Findings: Service Level vs OTP Longitudinal

Summary

After detrending, within-route month-over-month changes in scheduled trip frequency show no statistically significant relationship with changes in OTP. The null result holds for all routes, for bus routes only, and in both the pre- and post-COVID subperiods (the pre-COVID slope is marginally negative, p=0.07).

Key Numbers

  • 2,374 delta observations from 93 routes over 27 months (Jan 2019 -- Mar 2021)
  • All routes: slope=0.00002, Pearson r=0.018 (p=0.39), Spearman rho=-0.030 (p=0.15)
  • Bus only (n=2,288): slope=0.00002, Pearson r=0.023 (p=0.26), Spearman rho=-0.031 (p=0.13)
  • Pre-COVID (n=1,195): slope=-0.004, r=-0.052 (p=0.07) -- marginally negative
  • Post-COVID (n=1,179): slope=0.00002, r=0.030 (p=0.31) -- null

Observations

  • The overall effect is essentially zero: a change in daily trip count explains less than 0.1% of the variance in detrended OTP changes.
  • This confirms Analysis 10's cross-sectional null finding with a stronger longitudinal design that controls for all time-invariant route characteristics (geography, length, mode).
  • The pre-COVID period shows a marginally negative slope (r=-0.052, p=0.07). This could mean that adding trips slightly degraded OTP while the system was running near capacity. However, the estimate is not significant at the conventional 0.05 level, and a Bonferroni correction for the several subgroup tests would push it further from significance.
  • The post-COVID period shows no relationship at all, consistent with reduced ridership creating slack in the system.
  • The Spearman correlations are consistently negative but not significant. This hints that any relationship is slightly negative (more service, slightly worse on-time performance). The Pearson correlations for the same samples are slightly positive, however, so even the sign is unstable, and the effect is too small to detect reliably in this sample.
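The variance-explained and significance statements above can be cross-checked from the reported correlations and sample sizes alone. The sketch below is minimal and assumes the standard t-test for a Pearson correlation. It also uses a normal approximation to the t distribution, which is adequate when n is in the thousands. Treating the four Pearson tests as one Bonferroni family is an illustrative assumption; the analysis code does not apply any correction.

```python
import math

# (Pearson r, n) pairs copied from the Key Numbers list.
REPORTED = {
    "all_routes": (0.018, 2374),
    "bus_only": (0.023, 2288),
    "pre_covid": (-0.052, 1195),
    "post_covid": (0.030, 1179),
}


def t_from_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value from a normal approximation to Student's t.

    Assumption: with df in the thousands the t and normal tails nearly coincide.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


def summarize(reported: dict, alpha: float = 0.05) -> dict:
    """r^2, approximate p, and significance with and without Bonferroni."""
    k = len(reported)  # assumption: the four Pearson tests form one family
    out = {}
    for name, (r, n) in reported.items():
        p = approx_two_sided_p(t_from_r(r, n))
        out[name] = {
            "r_squared": r * r,
            "p_approx": p,
            "significant": p < alpha,
            "significant_bonferroni": p < alpha / k,
        }
    return out


if __name__ == "__main__":
    for name, row in summarize(REPORTED).items():
        print(f"{name}: r^2={row['r_squared']:.5f}, p~{row['p_approx']:.3f}")
```

Running it gives p of about 0.38 for all routes and about 0.07 for the pre-COVID subperiod. Both match the reported values up to the rounding of r.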

Discussion

This null result is the most methodologically rigorous frequency-OTP test in the project. Analysis 10 was cross-sectional (comparing different routes at one point in time) and confounded by route characteristics -- long routes have both more trips and worse OTP for structural reasons. This analysis controls for all time-invariant route features by using within-route changes over time, and it detrends OTP changes to remove system-wide monthly shocks. Within the 27-month window the result is clear: trip frequency changes do not predict OTP changes within routes.
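The within-route design can be checked on a small invented panel. First-differencing cancels each route's constant OTP level, and subtracting each month's cross-route mean cancels system-wide shocks.

One detail matters: detrend() in the source code demeans delta_otp only. By the Frisch-Waugh-Lovell theorem, the month-effects slope comes from demeaning delta_trips by month as well. When trip changes share a common monthly component, such as a system-wide cut, regressing detrended delta_otp on raw delta_trips shrinks the slope toward zero.

The sketch below is minimal and pure Python. All data and helper names are hypothetical.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Simple OLS slope with intercept, like scipy.stats.linregress(x, y).slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def month_demean(values, months):
    """Subtract each month's cross-route mean (the detrend() step)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, m in zip(values, months):
        sums[m] += v
        counts[m] += 1
    return [v - sums[m] / counts[m] for v, m in zip(values, months)]


def toy_deltas(trips, route_level, month_shock, beta):
    """OTP = route level + month shock + beta * trips, first-differenced within route."""
    months, d_trips, d_otp = [], [], []
    for route, series in trips.items():
        otp = [route_level[route] + month_shock[m] + beta * t for m, t in enumerate(series)]
        for m in range(1, len(series)):
            months.append(m)
            d_trips.append(series[m] - series[m - 1])
            d_otp.append(otp[m] - otp[m - 1])  # route_level cancels here
    return months, d_trips, d_otp


# Invented panel: three routes, five months, and a system-wide cut in month 2.
TRIPS = {
    "A": [100, 102, 80, 81, 83],
    "B": [50, 49, 30, 34, 33],
    "C": [70, 73, 52, 50, 55],
}
ROUTE_LEVEL = {"A": 60.0, "B": 70.0, "C": 80.0}
MONTH_SHOCK = [0.0, -5.0, 3.0, 1.0, -2.0]
BETA = -0.5

months, d_trips, d_otp = toy_deltas(TRIPS, ROUTE_LEVEL, MONTH_SHOCK, BETA)
d_otp_detrended = month_demean(d_otp, months)

# Detrending only the outcome (as detrend() does) attenuates the slope ...
slope_outcome_only = ols_slope(d_trips, d_otp_detrended)
# ... while demeaning both variables by month recovers BETA.
slope_both = ols_slope(month_demean(d_trips, months), d_otp_detrended)
```

With these invented numbers the outcome-only slope comes out near -0.02 against a true -0.5, because most of the trip variation is the shared month-2 cut. This does not by itself overturn the null result. It does mean the reported slope and correlation could understate a within-month relationship when trip changes move together across routes.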

The marginally negative pre-COVID slope (r=-0.052, p=0.07) is the one signal worth noting. Before COVID, the system was operating near capacity, and adding trips to an already-constrained route may have slightly degraded schedule adherence -- each additional trip competes for the same road space, layover time, and driver availability. After COVID reduced demand, this constraint relaxed and the effect disappeared. This is consistent with a capacity-constrained model where frequency only matters at the margin when the system is near saturation, and is irrelevant otherwise.

The policy implication reinforces what emerged across Analyses 10, 26, and 29: scheduled service frequency is not a lever for OTP improvement. Routes don't get more on-time by running fewer trips, and they don't get less on-time by running more. The ~50% of OTP variance explained by the multivariate model (Analysis 18) comes from structural features (stop count, route length, mode), and the remaining ~50% likely reflects operational factors (schedule padding, driver availability, real-time traffic variability) that are orthogonal to how many trips are scheduled.

The 27-month window is a genuine limitation. A longer panel with more schedule variation -- particularly one that captures the post-2021 period when PRT may have restructured service -- could provide more statistical power. Extending the WPRDC data or obtaining historical GTFS archives would strengthen this null finding or reveal effects that 27 months cannot detect.
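The power limitation can be made concrete by asking how large a correlation each sample could detect. The sketch below is minimal and uses the standard Fisher z approximation for a two-sided test at alpha = 0.05 with 80% power. It treats every delta as an independent observation, which repeated route-month deltas are not, so the figures are approximate. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_r(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest |r| a two-sided test detects with the given power (Fisher z approximation).

    Assumes n independent observations; repeated deltas from the same route are
    not fully independent, so treat the result as approximate.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3))


if __name__ == "__main__":
    for label, n in [("all routes", 2374), ("pre-COVID", 1195), ("post-COVID", 1179)]:
        print(f"{label}: n={n}, min detectable |r| ~ {min_detectable_r(n):.3f}")
```

The minimum detectable |r| is about 0.06 for the full sample and about 0.08 for each COVID subperiod. The pre-COVID r = -0.052 therefore sits below the size this design detects reliably.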

Caveats

  • The panel covers only 27 months, limiting statistical power for detecting small effects.
  • Month-over-month trip changes are often zero (same pick period), reducing the effective variation in the independent variable.
  • OTP is measured monthly while schedule changes can occur mid-month, introducing measurement noise.
  • System-wide detrending removes the COVID shock but may also remove genuine correlated service changes (e.g., if all routes cut service and all improved OTP simultaneously).
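The last caveat can be made concrete with one invented month in which every route makes the same trip change and gets the same OTP change. Subtracting the monthly mean wipes out the OTP change and leaves no route-specific variation in trips. The coincident cut-and-improve pattern therefore cannot register as a within-route effect. The sketch below is minimal, and its data are hypothetical.

```python
from collections import defaultdict


def month_demean(values: list[float], months: list[str]) -> list[float]:
    """Subtract each month's cross-route mean, as detrend() does for delta_otp."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for v, m in zip(values, months):
        sums[m] += v
        counts[m] += 1
    return [v - sums[m] / counts[m] for v, m in zip(values, months)]


# Invented month: every route cuts 10 daily trips and every route's OTP rises 0.04.
months = ["2020-04"] * 4
delta_trips = [-10.0, -10.0, -10.0, -10.0]
delta_otp = [0.04, 0.04, 0.04, 0.04]

detrended_otp = month_demean(delta_otp, months)
within_month_trips = month_demean(delta_trips, months)

if __name__ == "__main__":
    print(detrended_otp)       # all (numerically) zero: the joint improvement is gone
    print(within_month_trips)  # all zero: no route-specific trip variation left
```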

Methods

Methods: Service Level vs OTP Longitudinal

Question

Within the same route over time, does increasing or decreasing scheduled trip frequency predict OTP changes? This is a within-route panel design that controls for all time-invariant route characteristics (length, geography, mode).

Approach

  • Join scheduled_trips_monthly (WEEKDAY) with otp_monthly on (route_id, month) for the 27-month overlap (Jan 2019 -- Mar 2021).
  • Compute month-over-month changes within each route: delta_trips and delta_otp.
  • Detrend by subtracting each month's mean delta_otp across all routes. This removes system-wide shocks (such as the COVID onset) so that route-specific effects remain. Only delta_otp is detrended; delta_trips enters the regression as is.
  • Estimate pooled OLS of detrended delta_otp on delta_trips (scipy.stats.linregress). Differencing within each route already removes time-invariant route effects; the code does not additionally demean by route.
  • Also compute Pearson and Spearman correlations between delta_trips and detrended delta_otp.
  • Stratify by mode (bus-only) and by period (pre-COVID vs. post-COVID onset, split at March 2020).
  • Scatter plot: delta_trips vs. detrended delta_otp, colored by period, with the pooled OLS regression line.
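Route demeaning, a route fixed-effects step on top of differencing, would subtract each route's mean delta_trips and mean delta_otp before fitting. In a first-differenced panel this absorbs route-specific linear trends in OTP. The source code below fits pooled OLS with scipy.stats.linregress and contains no route-demeaning step. The sketch below is minimal and pure Python, and uses invented toy deltas in arbitrary units. The helper names are hypothetical.

```python
from collections import defaultdict


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of a simple OLS fit with intercept (pooled estimator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by(values: list[float], groups: list[str]) -> list[float]:
    """Subtract each group's own mean (route demeaning)."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(x: list[float], y: list[float], routes: list[str]) -> float:
    """Route fixed-effects (within) slope: OLS on route-demeaned x and y."""
    xd = demean_by(x, routes)
    yd = demean_by(y, routes)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


# Invented toy deltas: each route has its own OTP drift (a route-specific trend
# in levels), and the true effect of one extra daily trip is BETA.
BETA = 0.3
DRIFT = {"A": 0.5, "B": -0.8}
routes = ["A"] * 4 + ["B"] * 4
delta_trips = [1.0, -2.0, 0.0, 3.0, 2.0, 2.0, -1.0, -3.0]
delta_otp = [DRIFT[r] + BETA * x for r, x in zip(routes, delta_trips)]

if __name__ == "__main__":
    print("pooled slope:", ols_slope(delta_trips, delta_otp))            # biased by the drifts
    print("within slope:", within_slope(delta_trips, delta_otp, routes))  # recovers BETA
```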

Data

Name | Description | Source
scheduled_trips_monthly | WEEKDAY daily trip counts per route per month (Jan 2019 -- Mar 2021) | prt.db table
otp_monthly | Monthly OTP per route | prt.db table
routes | Mode classification for bus-only stratification | prt.db table
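The 27-month window comes from the INNER JOIN in load_panel(). Only route-months present in both otp_monthly and the WEEKDAY rows of scheduled_trips_monthly survive. Below is a toy reproduction with Python's built-in sqlite3 module and invented rows. It assumes month is stored as 'YYYY-MM' text; the source code's string comparisons such as month < '2020-03' suggest this, but the schema is not shown.

```python
import sqlite3


def overlap_rows() -> list[tuple]:
    """Run the load_panel() join (minus the routes lookup) on invented toy rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE otp_monthly (route_id TEXT, month TEXT, otp REAL);
        CREATE TABLE scheduled_trips_monthly (
            route_id TEXT, month TEXT, day_type TEXT, daily_trips REAL
        );
        -- OTP starts a month before the schedule data ...
        INSERT INTO otp_monthly VALUES
            ('1', '2018-12', 0.70), ('1', '2019-01', 0.72), ('1', '2019-02', 0.71);
        -- ... and the schedule data runs a month past the OTP data.
        INSERT INTO scheduled_trips_monthly VALUES
            ('1', '2019-01', 'WEEKDAY', 120), ('1', '2019-01', 'SATURDAY', 80),
            ('1', '2019-02', 'WEEKDAY', 118), ('1', '2019-03', 'WEEKDAY', 118);
    """)
    rows = con.execute("""
        SELECT o.route_id, o.month, o.otp, st.daily_trips
        FROM otp_monthly o
        INNER JOIN scheduled_trips_monthly st
            ON o.route_id = st.route_id
            AND o.month = st.month
            AND st.day_type = 'WEEKDAY'
        ORDER BY o.route_id, o.month
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in overlap_rows():
        print(row)
```

Only 2019-01 and 2019-02 come back. 2018-12 has no schedule row, 2019-03 has no OTP row, and the SATURDAY row is excluded by the day_type condition in the ON clause.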

Output

  • output/service_level_panel.csv -- route-month panel with delta_trips, delta_otp, detrended delta_otp
  • output/service_level_scatter.png -- scatter of delta_trips vs detrended delta_otp with regression line
  • output/service_level_summary.csv -- regression and correlation results
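make_chart() below draws the OLS line without an uncertainty band. The sketch below is minimal and computes a pointwise 95% confidence band for the fitted mean using the textbook standard error of the mean response. It substitutes a normal critical value of 1.96 for the t value, which assumes n in the thousands. The helper names are hypothetical.

```python
import math


def ols_fit(x: list[float], y: list[float]) -> dict:
    """Simple OLS with intercept plus the pieces needed for a confidence band."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance
    return {"slope": slope, "intercept": intercept, "s2": s2, "mx": mx, "sxx": sxx, "n": n}


def confidence_band(x: list[float], y: list[float], grid: list[float], z: float = 1.96):
    """Pointwise (lower, fit, upper) for the mean response at each grid value.

    z = 1.96 is a normal approximation to the t critical value (assumption: large n).
    """
    f = ols_fit(x, y)
    band = []
    for x0 in grid:
        fit = f["intercept"] + f["slope"] * x0
        se = math.sqrt(f["s2"] * (1.0 / f["n"] + (x0 - f["mx"]) ** 2 / f["sxx"]))
        band.append((fit - z * se, fit, fit + z * se))
    return band
```

The band is narrowest at the mean of delta_trips and widens toward the extremes. The lower and upper values could be drawn inside make_chart() with matplotlib's ax.fill_between(x_line, lower, upper, alpha=0.2).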

Source Code

"""Analysis 30: Within-route panel -- does changing trip frequency predict OTP changes?"""

import numpy as np
import polars as pl
from scipy import stats

from prt_otp_analysis.common import analysis_dir, correlate, phase, query_to_polars, run_analysis, save_chart, save_csv, setup_plotting

OUT = analysis_dir(__file__)


def load_panel() -> pl.DataFrame:
    """Load the route-month panel: OTP joined with scheduled trips for the overlap period."""
    return query_to_polars("""
        SELECT o.route_id, o.month, o.otp,
               st.daily_trips,
               r.mode
        FROM otp_monthly o
        INNER JOIN scheduled_trips_monthly st
            ON o.route_id = st.route_id
            AND o.month = st.month
            AND st.day_type = 'WEEKDAY'
        LEFT JOIN routes r ON o.route_id = r.route_id
        ORDER BY o.route_id, o.month
    """)


def compute_deltas(df: pl.DataFrame) -> pl.DataFrame:
    """Compute month-over-month changes within each route."""
    df = df.sort(["route_id", "month"])

    df = df.with_columns(
        prev_otp=pl.col("otp").shift(1).over("route_id"),
        prev_trips=pl.col("daily_trips").shift(1).over("route_id"),
        prev_month=pl.col("month").shift(1).over("route_id"),
    )

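    # Drop each route's first observed month (no earlier row to difference against).
    # Note: prev_month is not checked, so if a route skips a month the delta
    # spans more than one calendar month.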
    df = df.filter(pl.col("prev_otp").is_not_null())

    df = df.with_columns(
        delta_otp=pl.col("otp") - pl.col("prev_otp"),
        delta_trips=pl.col("daily_trips") - pl.col("prev_trips"),
    )

    return df


def detrend(df: pl.DataFrame) -> pl.DataFrame:
    """Remove system-wide monthly trend from delta_otp."""
    monthly_mean = (
        df.group_by("month")
        .agg(system_delta_otp=pl.col("delta_otp").mean())
    )
    df = df.join(monthly_mean, on="month")
    df = df.with_columns(
        detrended_delta_otp=pl.col("delta_otp") - pl.col("system_delta_otp"),
    )
    return df


def run_regression(x: list[float], y: list[float]) -> dict:
    """OLS regression of y on x, returning slope, intercept, r, p, se."""
    result = stats.linregress(x, y)
    return {
        "slope": result.slope,
        "intercept": result.intercept,
        "r": result.rvalue,
        "p": result.pvalue,
        "se": result.stderr,
        "n": len(x),
    }


def make_chart(df: pl.DataFrame, reg: dict) -> None:
    """Scatter: delta_trips vs detrended delta_otp with regression line."""
    plt = setup_plotting()
    fig, ax = plt.subplots(figsize=(10, 7))

    x = df["delta_trips"].to_numpy()
    y = df["detrended_delta_otp"].to_numpy()

    # Color by pre/post COVID
    pre_covid = df.filter(pl.col("month") < "2020-03")
    post_covid = df.filter(pl.col("month") >= "2020-03")

    ax.scatter(
        pre_covid["delta_trips"].to_list(),
        pre_covid["detrended_delta_otp"].to_list(),
        alpha=0.25, s=15, color="#2563eb", label=f"Pre-COVID (n={len(pre_covid)})",
    )
    ax.scatter(
        post_covid["delta_trips"].to_list(),
        post_covid["detrended_delta_otp"].to_list(),
        alpha=0.25, s=15, color="#e11d48", label=f"Post-COVID (n={len(post_covid)})",
    )

    # Regression line
    x_line = np.linspace(x.min(), x.max(), 100)
    y_line = reg["slope"] * x_line + reg["intercept"]
    ax.plot(x_line, y_line, color="black", linewidth=1.5,
            label=f"OLS: slope={reg['slope']:.5f}, p={reg['p']:.3f}")

    ax.axhline(0, color="gray", linewidth=0.5)
    ax.axvline(0, color="gray", linewidth=0.5)
    ax.set_xlabel("Month-over-Month Change in Daily Trips")
    ax.set_ylabel("Detrended Month-over-Month Change in OTP")
    ax.set_title("Service Level vs OTP: Within-Route Longitudinal Panel")
    ax.legend(fontsize=9)

    save_chart(fig, OUT / "service_level_scatter.png")


@run_analysis(30, "Service Level vs OTP Longitudinal")
def main() -> None:
    """Entry point: build panel, compute deltas, regress, and chart."""

    with phase("Loading panel data"):
        panel = load_panel()
        print(f"  {len(panel):,} route-month observations")
        print(f"  {panel['route_id'].n_unique()} routes, months: {panel['month'].min()} to {panel['month'].max()}")

    with phase("Computing month-over-month deltas"):
        df = compute_deltas(panel)
        print(f"  {len(df):,} delta observations (after dropping first month per route)")

    with phase("Detrending (removing system-wide monthly mean delta)"):
        df = detrend(df)

        # Save panel
        panel_df = df.select(
            "route_id", "month", "mode", "otp", "daily_trips",
            "delta_otp", "delta_trips", "detrended_delta_otp",
        )
        save_csv(panel_df, OUT / "service_level_panel.csv")

    with phase("Analyzing"):
        # --- All routes ---
        print("\n--- All routes ---")
        x_all = df["delta_trips"].to_list()
        y_all = df["detrended_delta_otp"].to_list()

        reg_all = run_regression(x_all, y_all)
        print(f"  OLS: slope={reg_all['slope']:.5f} (SE={reg_all['se']:.5f}), r={reg_all['r']:.3f}, p={reg_all['p']:.4f}, n={reg_all['n']}")

        corr_all = correlate(df, "delta_trips", "detrended_delta_otp")
        r_s, p_s = corr_all["spearman_r"], corr_all["spearman_p"]
        print(f"  Spearman: rho={r_s:.3f}, p={p_s:.4f}")

        # --- Bus only ---
        bus = df.filter(pl.col("mode") == "BUS")
        print(f"\n--- Bus only (n={len(bus)}) ---")
        x_bus = bus["delta_trips"].to_list()
        y_bus = bus["detrended_delta_otp"].to_list()

        reg_bus = run_regression(x_bus, y_bus)
        print(f"  OLS: slope={reg_bus['slope']:.5f} (SE={reg_bus['se']:.5f}), r={reg_bus['r']:.3f}, p={reg_bus['p']:.4f}, n={reg_bus['n']}")

        corr_bus = correlate(bus, "delta_trips", "detrended_delta_otp")
        r_sb, p_sb = corr_bus["spearman_r"], corr_bus["spearman_p"]
        print(f"  Spearman: rho={r_sb:.3f}, p={p_sb:.4f}")

        # --- Pre vs post COVID ---
        pre = df.filter(pl.col("month") < "2020-03")
        post = df.filter(pl.col("month") >= "2020-03")
        print(f"\n--- Pre-COVID (n={len(pre)}) ---")
        if len(pre) > 10:
            reg_pre = run_regression(pre["delta_trips"].to_list(), pre["detrended_delta_otp"].to_list())
            print(f"  OLS: slope={reg_pre['slope']:.5f}, r={reg_pre['r']:.3f}, p={reg_pre['p']:.4f}")

        print(f"\n--- Post-COVID (n={len(post)}) ---")
        if len(post) > 10:
            reg_post = run_regression(post["delta_trips"].to_list(), post["detrended_delta_otp"].to_list())
            print(f"  OLS: slope={reg_post['slope']:.5f}, r={reg_post['r']:.3f}, p={reg_post['p']:.4f}")

        # Summary CSV
        summary = pl.DataFrame([
            {"group": "all", "n": reg_all["n"], "slope": reg_all["slope"], "se": reg_all["se"],
             "pearson_r": reg_all["r"], "pearson_p": reg_all["p"], "spearman_rho": r_s, "spearman_p": p_s},
            {"group": "bus_only", "n": reg_bus["n"], "slope": reg_bus["slope"], "se": reg_bus["se"],
             "pearson_r": reg_bus["r"], "pearson_p": reg_bus["p"], "spearman_rho": r_sb, "spearman_p": p_sb},
        ])
        save_csv(summary, OUT / "service_level_summary.csv")

    with phase("Generating chart"):
        make_chart(df, reg_all)


if __name__ == "__main__":
    main()
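compute_deltas() computes prev_month but never uses it. If a route is missing a month in the joined panel, its "month-over-month" delta therefore spans two or more months. The sketch below is a minimal pure-Python gap check with hypothetical helper names. It assumes month values are 'YYYY-MM' strings, as the string comparisons in the source code suggest. In polars the same test could be written as a filter comparing a month index with its value shifted over route_id.

```python
def month_index(month: str) -> int:
    """'YYYY-MM' (or 'YYYY-MM-DD') -> consecutive integer month count."""
    year, mon = int(month[:4]), int(month[5:7])
    return year * 12 + (mon - 1)


def consecutive_deltas(rows):
    """rows: iterable of (route_id, month, otp, daily_trips), in any order.

    Returns (route_id, month, delta_otp, delta_trips) only where the previous
    row for the same route is exactly one calendar month earlier.
    """
    out = []
    prev = {}
    for route_id, month, otp, trips in sorted(rows, key=lambda r: (r[0], r[1])):
        p = prev.get(route_id)
        if p is not None and month_index(month) - month_index(p[0]) == 1:
            out.append((route_id, month, otp - p[1], trips - p[2]))
        prev[route_id] = (month, otp, trips)
    return out
```

Applied to a route observed in 2019-01, 2019-02, 2019-04 and 2019-05, this keeps the 2019-02 and 2019-05 deltas and skips 2019-04, whose predecessor is two months back.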

Sources

Name | Type | Why It Matters | Owner | Freshness | Caveat
otp_monthly | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv -- Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv -- Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv -- Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv -- Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv -- Average ridership by route and month.
routes | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv -- Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv -- Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv -- Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv -- Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv -- Average ridership by route and month.
scheduled_trips_monthly | table | Primary analytical table used in this page's computations. | Produced by Scheduled Trips ETL. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (4)
  • file data/wprdc-schedule/schedule_monthly_agg.csv -- Monthly route/day-type schedule aggregates (cached copy when available).
  • file data/wprdc-schedule/paac_pick_lookup.csv -- Pick period lookup metadata (cached copy when available).
  • api WPRDC Schedule Monthly Aggregate -- Public dataset of route-level monthly schedule aggregates.
  • api WPRDC Pick Lookup -- Public lookup for pick period date ranges.
numpy | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.
polars | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.
scipy | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.