Analysis

10: Trip Frequency vs OTP

Route and Service Drivers

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Data Provenance

flowchart LR
  10_frequency_vs_otp(["10: Trip Frequency vs OTP"])
  t_otp_monthly[("otp_monthly")] --> 10_frequency_vs_otp
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_route_stops[("route_stops")] --> 10_frequency_vs_otp
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_routes[("routes")] --> 10_frequency_vs_otp
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  d1_10_frequency_vs_otp(("polars (lib)")) --> 10_frequency_vs_otp
  d2_10_frequency_vs_otp(("scipy (lib)")) --> 10_frequency_vs_otp
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 10_frequency_vs_otp page;
  class t_otp_monthly,t_route_stops,t_routes table;
  class d1_10_frequency_vs_otp,d2_10_frequency_vs_otp dep;
  class u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: Trip Frequency vs OTP

Summary

There is no meaningful correlation between peak weekday trip frequency and on-time performance (OTP). The previous finding (r = -0.39) was an artifact of using SUM(trips_wd) across stops, which conflated frequency with route length. After correcting to MAX(trips_wd) (the peak frequency at any single stop), the correlation vanishes.

Key Numbers

  • All routes: Pearson r = 0.03 (p = 0.81, n = 92) -- essentially zero
  • Bus only: Pearson r = -0.06 (p = 0.55, n = 89)
  • Bus only: Spearman r = -0.11 (p = 0.29)

Methodology Note

The original analysis summed trips_wd across all stops on a route. Because trips_wd is recorded per stop, a route with 50 trips per day and 100 stops produces a sum of ~5,000, while a route with 50 trips per day and 20 stops produces ~1,000. This made the metric a proxy for frequency × stop count rather than frequency alone. Using MAX(trips_wd) isolates the trip frequency at the busiest stop, which is a better measure of how often the route actually runs.
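
The arithmetic above can be reproduced with a small in-memory SQLite table. This is a minimal sketch on made-up data, not the project's query: the route IDs LONG and SHORT, the stop_id column, and the assumption that every stop sees all 50 trips are hypothetical.

```python
import sqlite3

# Hypothetical toy data: two routes that both run 50 weekday trips,
# one serving 100 stops and one serving 20 stops.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE route_stops (route_id TEXT, stop_id INTEGER, trips_wd INTEGER)"
)
toy_rows = [("LONG", s, 50) for s in range(100)] + [("SHORT", s, 50) for s in range(20)]
conn.executemany("INSERT INTO route_stops VALUES (?, ?, ?)", toy_rows)

# Compare the old metric (SUM) with the corrected one (MAX) per route.
summary = conn.execute("""
    SELECT route_id,
           COUNT(*)      AS stop_count,
           SUM(trips_wd) AS sum_trips_wd,
           MAX(trips_wd) AS max_trips_wd
    FROM route_stops
    WHERE trips_wd IS NOT NULL
    GROUP BY route_id
    ORDER BY route_id
""").fetchall()

for route_id, stop_count, sum_trips, max_trips in summary:
    print(f"{route_id}: stops={stop_count} SUM={sum_trips} MAX={max_trips}")
```

Both toy routes report MAX = 50, while SUM scales with stop count (5000 vs 1000), which is the confound described above.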

Observations

  • These data give no sign that running more trips, by itself, degrades OTP. The previously reported correlation was driven by confounding frequency with route length (stop count).
  • This result is consistent with Analysis 07's finding that stop count is the real structural predictor -- once route complexity is removed from the frequency metric, the effect disappears.
  • Some of the highest-frequency routes (P1 East Busway, RAIL lines) actually have excellent OTP, because they combine high frequency with few stops and dedicated right-of-way.

Implication

The cross-route data suggest that PRT should not expect OTP penalties from increasing service frequency on existing routes: running more trips does not by itself appear to strain schedule adherence. The more promising lever for improving OTP is route design (stop count, right-of-way), not service volume.

Caveats

  • trips_wd in route_stops represents current weekday frequency, not historical. Frequency may have changed over the analysis period.
  • MAX(trips_wd) captures the peak stop, which for short-turn routes may overstate the frequency experienced by riders at outer stops.
  • Routes with fewer than 12 months of OTP data are excluded (1 route dropped vs prior version).
  • Three correlation tests were run (Pearson all-routes, Pearson bus-only, Spearman bus-only) without multiple-comparison correction. Since all three are non-significant (smallest p = 0.29), correction would not change any conclusion.
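
The last caveat can be checked with a few lines of arithmetic. This sketch assumes a Bonferroni correction and a 0.05 significance threshold; the source names neither, so both are illustrative choices. The p-values are the ones listed under Key Numbers.

```python
# p-values as reported under Key Numbers.
p_values = {
    "pearson_all": 0.81,
    "pearson_bus": 0.55,
    "spearman_bus": 0.29,
}
ALPHA = 0.05  # assumed threshold; not stated in the source

m = len(p_values)
# Bonferroni (assumed method): multiply each p-value by the number of tests, cap at 1.
adjusted = {name: min(1.0, p * m) for name, p in p_values.items()}
any_significant = any(p < ALPHA for p in adjusted.values())

for name, p in adjusted.items():
    print(f"{name}: raw p = {p_values[name]:.2f}, Bonferroni p = {p:.2f}")
print("any significant after correction:", any_significant)
```

Because a correction of this kind can only raise p-values, tests that are non-significant before adjustment remain non-significant after it.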

Review History

  • 2026-02-11: RED-TEAM-REPORTS/2026-02-11-analyses-01-05-07-11.md -- 6 issues (1 significant). Updated METHODS.md to reflect MAX(trips_wd) instead of SUM(trips_wd); documented all three correlation tests; added minimum-month filter (HAVING COUNT(*) >= 12); added NULL filter for trips_wd; replaced manual regression with scipy.stats.linregress; noted multiple-test caveat.

Methods

Methods: Trip Frequency vs OTP

Question

Is there a correlation between how often a route runs (trip frequency) and its on-time performance? High-frequency routes may suffer from schedule adherence issues like bunching.

Approach

  • Compute maximum weekday trips per route from route_stops (MAX(trips_wd) across all stops, used as a peak frequency proxy). Stops with trips_wd IS NULL are excluded.
  • Compute average OTP per route from otp_monthly, requiring at least 12 months of data (HAVING COUNT(*) >= 12).
  • Scatter plot of trip frequency vs average OTP, colored by mode.
  • Compute Pearson correlation (all routes), Pearson correlation (bus-only), and Spearman rank correlation (bus-only).
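
The two aggregation steps and the join between them can be sketched against an in-memory SQLite database. Table and column names follow the ones used on this page; the month and stop_id columns, the "BUS" mode label, and all rows are made up for illustration, and the real prt.db schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE routes (route_id TEXT PRIMARY KEY, route_name TEXT, mode TEXT);
    CREATE TABLE otp_monthly (route_id TEXT, month TEXT, otp REAL);
    CREATE TABLE route_stops (route_id TEXT, stop_id INTEGER, trips_wd INTEGER);
""")

# Hypothetical rows: route A has 12 months of OTP data, route B only 6.
conn.executemany(
    "INSERT INTO routes VALUES (?, ?, ?)",
    [("A", "Route A", "BUS"), ("B", "Route B", "BUS")],
)
conn.executemany(
    "INSERT INTO otp_monthly VALUES (?, ?, ?)",
    [("A", f"2024-{m:02d}", 0.75) for m in range(1, 13)]
    + [("B", f"2024-{m:02d}", 0.50) for m in range(1, 7)],
)
# One stop on route A has no trip count and is ignored by the NULL filter.
conn.executemany(
    "INSERT INTO route_stops VALUES (?, ?, ?)",
    [("A", 1, 40), ("A", 2, 55), ("A", 3, None), ("B", 1, 30), ("B", 2, 30)],
)

rows = conn.execute("""
    WITH frequency AS (
        SELECT route_id, MAX(trips_wd) AS max_trips_wd
        FROM route_stops
        WHERE trips_wd IS NOT NULL
        GROUP BY route_id
    ),
    avg_otp AS (
        SELECT o.route_id, r.mode, AVG(o.otp) AS avg_otp, COUNT(*) AS months
        FROM otp_monthly o
        JOIN routes r ON o.route_id = r.route_id
        GROUP BY o.route_id, r.mode
        HAVING COUNT(*) >= 12
    )
    SELECT a.route_id, a.mode, a.avg_otp, a.months, f.max_trips_wd
    FROM avg_otp a
    JOIN frequency f ON a.route_id = f.route_id
    ORDER BY a.route_id
""").fetchall()

for row in rows:
    print(row)
```

Only route A survives: route B is dropped by the 12-month minimum, and route A's peak frequency comes from its busiest non-NULL stop.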

Data

Name | Description | Source
otp_monthly | Monthly OTP per route | prt.db table
route_stops | Trip counts (trips_wd, trips_7d) | prt.db table
routes | Mode classification | prt.db table
Output

  • output/frequency_otp.csv -- per-route frequency and OTP summary
  • output/frequency_vs_otp.png -- scatter plot with correlation

Source Code

"""Correlation analysis of weekday trip frequency versus on-time performance."""

import polars as pl

from prt_otp_analysis.common import (
    analysis_dir,
    correlate_by_mode,
    mode_scatter,
    phase,
    query_to_polars,
    run_analysis,
    save_chart,
    save_csv,
    setup_plotting,
)

OUT = analysis_dir(__file__)


def load_data() -> pl.DataFrame:
    """Load per-route peak trip frequency, average OTP, and mode."""
    frequency = query_to_polars("""
        SELECT route_id, MAX(trips_wd) AS max_trips_wd
        FROM route_stops
        WHERE trips_wd IS NOT NULL
        GROUP BY route_id
    """)
    avg_otp = query_to_polars("""
        SELECT o.route_id, r.route_name, r.mode,
               AVG(o.otp) AS avg_otp, COUNT(*) AS months
        FROM otp_monthly o
        JOIN routes r ON o.route_id = r.route_id
        GROUP BY o.route_id, r.route_name, r.mode
        HAVING COUNT(*) >= 12
    """)
    return avg_otp.join(frequency, on="route_id", how="inner")


def analyze(df: pl.DataFrame) -> tuple[pl.DataFrame, dict]:
    """Compute Pearson and Spearman correlations, overall and bus-only."""
    return df, correlate_by_mode(df, "max_trips_wd", "avg_otp")


def make_chart(df: pl.DataFrame) -> None:
    """Generate scatter plot of trip frequency vs OTP with bus-only trendline."""
    plt = setup_plotting()
    fig, ax = plt.subplots(figsize=(10, 7))
    mode_scatter(ax, df, "max_trips_wd", "avg_otp")
    ax.set_xlabel("Peak Weekday Trips (max across stops)")
    ax.set_ylabel("Average OTP")
    ax.set_title("Trip Frequency vs On-Time Performance by Route")
    ax.legend(fontsize=9)
    ax.set_ylim(0, 1)
    save_chart(fig, OUT / "frequency_vs_otp.png")


@run_analysis(10, "Trip Frequency vs OTP")
def main() -> None:
    """Entry point: load data, analyze, chart, and save."""
    with phase("Loading data"):
        df = load_data()
        print(f"  {len(df)} routes with both frequency and OTP data")

    with phase("Analyzing"):
        df, results = analyze(df)
        print(f"  All routes:  Pearson r = {results['all_pearson_r']:.4f} (p = {results['all_pearson_p']:.4f})")
        print(f"  Bus only:    Pearson r = {results['bus_pearson_r']:.4f} (p = {results['bus_pearson_p']:.4f})")
        print(f"               Spearman r = {results['bus_spearman_r']:.4f} (p = {results['bus_spearman_p']:.4f})")
        print(f"               n = {results['bus_n']} bus routes")

    with phase("Saving CSV"):
        save_csv(df, OUT / "frequency_otp.csv")

    with phase("Generating chart"):
        make_chart(df)


if __name__ == "__main__":
    main()

Sources

Name | Type | Why It Matters | Owner | Freshness | Caveat
otp_monthly | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
route_stops | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
routes | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
polars | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.
scipy | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.

Upstream sources (5), listed identically for otp_monthly, route_stops, and routes:
  • file data/routes_by_month.csv -- Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv -- Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv -- Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv -- Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv -- Average ridership by route and month.