Analysis

06: Seasonal Patterns

Core OTP Patterns

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Page Navigation

Analysis Navigation

Data Provenance

flowchart LR
  06_seasonal_patterns(["06: Seasonal Patterns"])
  t_otp_monthly[("otp_monthly")] --> 06_seasonal_patterns
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_route_stops[("route_stops")] --> 06_seasonal_patterns
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_routes[("routes")] --> 06_seasonal_patterns
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  d1_06_seasonal_patterns(("numpy (lib)")) --> 06_seasonal_patterns
  d2_06_seasonal_patterns(("polars (lib)")) --> 06_seasonal_patterns
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 06_seasonal_patterns page;
  class t_otp_monthly,t_route_stops,t_routes table;
  class d1_06_seasonal_patterns,d2_06_seasonal_patterns dep;
  class u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: Seasonal Patterns

Summary

PRT shows a consistent seasonal cycle with winter months outperforming summer/fall. The system-wide seasonal swing is about 6.8 percentage points after detrending. 93 routes had sufficient data (3+ years) for route-level seasonal analysis.

System-Wide Seasonal Profile

Computed from a balanced panel of 93 routes (present in all 12 months-of-year) over complete calendar years (2019--2024).

Month Weighted OTP Deviation from Trend
January 70.8% +2.8 pp (Best)
February 68.9% +1.0 pp
March 69.9% +2.1 pp
April 68.1% +0.3 pp
May 67.4% -0.4 pp
June 66.7% -1.1 pp
July 68.1% -0.2 pp
August 67.2% -1.1 pp
September 64.4% -3.9 pp (Worst)
October 66.9% -1.3 pp
November 68.4% +0.2 pp
December 69.6% +1.4 pp

Methodology Note

Seasonal profiles are computed by first removing a 12-month centered rolling mean (trend), then averaging the detrended residuals by month-of-year. This ensures that long-term trends (e.g., the COVID dip) do not distort the seasonal pattern. Route-level analysis requires at least 3 years of data to ensure each month-of-year is represented multiple times.

Red-team corrections applied (2026-02-10):

  • Restricted to complete calendar years (2019-01 through 2024-12) to ensure every month-of-year has equal year coverage. Previously, December was missing 2025 data (the worst-performing year), inflating its seasonal average.
  • Used a balanced panel of routes present in all 12 months-of-year for the system-wide profile. This excluded 5 routes: 3 winter-only routes with very high OTP (37 Castle Shannon 81%, 42 Potomac 83%, RLSH Red Line Shuttle 98%) and 2 with other gaps (53 Homestead Park, SWL). Their inclusion previously inflated winter averages.
  • Note: trips_7d weighting is a static snapshot and may not reflect historical service levels. The seasonal pattern holds in both weighted and unweighted averages, so the core finding is robust to weighting choice.

Most Seasonally Affected Routes (detrended)

Route Seasonal Amplitude
15 - Charles 15.8%
O5 - Thompson Run Flyer via 279 15.2%
P2 - East Busway Short 14.8%

Observations

  • The winter advantage is somewhat counterintuitive -- one might expect snow and ice to worsen OTP. Possible explanations: lower ridership reduces dwell time; fewer construction detours; school breaks reduce congestion.
  • September is consistently the worst month (-3.9 pp from trend), possibly due to the return of school-year traffic and late-summer construction.
  • After applying the balanced panel filter and complete-years restriction, the seasonal pattern persists with minimal change in magnitude, confirming it is a genuine operational pattern rather than a data artifact.
  • Most routes have a seasonal amplitude under 15%. The heatmap of the top 20 routes shows no single month dominates as "worst" across all routes -- the pattern varies, though fall months are generally weaker.

Caveats

  • Seasonal decomposition uses a centered moving-average method to remove trend, not a formal STL decomposition. Results are directionally correct but assume additive seasonality.
  • Only 6 years of complete data (2019--2024) means each month-of-year average is based on ~6 detrended observations, limiting statistical power.
  • The trips_7d weighting reflects a single point-in-time snapshot of service frequency, not historical values. Route frequencies likely changed over the study period (especially during COVID).

Review History

Output

Methods

Methods: Seasonal Patterns

Question

Do PRT routes exhibit consistent seasonal OTP patterns? Are summer or winter months systematically better or worse?

Approach

  • Restrict to complete calendar years (2019--2024) to ensure every month-of-year has equal year coverage.
  • Use a balanced panel of routes present in all 12 months-of-year for the system-wide profile, preventing compositional bias from seasonal or short-lived routes.
  • Compute a system-wide seasonal profile by averaging across balanced-panel routes (trip-weighted).
  • Measure seasonal amplitude: max(month-of-year avg) - min(month-of-year avg) per route.
  • Decompose into trend + seasonal + residual using a moving-average approach:
    1. Trend = 12-month centered rolling mean
    2. Seasonal = month-of-year mean of (OTP - trend)
    3. Residual = OTP - trend - seasonal
  • Rank routes by seasonal amplitude to identify those most affected by seasons.
  • Route-level analysis requires at least 3 years of data (36 months) for reliable seasonal estimates.

Data

Name Description Source
otp_monthly Monthly OTP per route prt.db table
route_stops Trip counts for weighting prt.db table
routes Route metadata prt.db table

Output

  • output/seasonal_patterns.csv -- month-of-year seasonal profile per route
  • output/seasonal_patterns.png -- seasonal decomposition charts

Source Code

"""Seasonal decomposition of OTP into trend, seasonal, and residual components."""

import polars as pl

from prt_otp_analysis.common import analysis_dir, phase, query_to_polars, run_analysis, save_chart, save_csv, setup_plotting, weighted_mean

OUT = analysis_dir(__file__)

# OTP color-scale bounds for the seasonal heatmap.
OTP_HEATMAP_VMIN = 0.3
OTP_HEATMAP_VMAX = 1.0

MONTH_LABELS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MIN_YEARS = 3  # minimum years of data for route-level seasonal amplitude
# Restrict to complete calendar years so every month-of-year has the same
# number of years of data (avoids bias from partial years at the boundaries).
COMPLETE_YEAR_START = "2019-01"
COMPLETE_YEAR_END = "2024-12"


def load_data() -> pl.DataFrame:
    """Load OTP data with trip weights, restricted to complete calendar years."""
    return query_to_polars(f"""
        SELECT o.route_id, o.month, o.otp, r.route_name, r.mode,
               COALESCE(rs_agg.trips_7d, 0) AS trips_7d
        FROM otp_monthly o
        JOIN routes r ON o.route_id = r.route_id
        LEFT JOIN (
            SELECT route_id, SUM(trips_7d) AS trips_7d
            FROM route_stops
            GROUP BY route_id
        ) rs_agg ON o.route_id = rs_agg.route_id
        WHERE o.month >= '{COMPLETE_YEAR_START}' AND o.month <= '{COMPLETE_YEAR_END}'
    """)


def analyze(df: pl.DataFrame) -> tuple[pl.DataFrame, pl.DataFrame, pl.DataFrame]:
    """Compute detrended seasonal profiles and amplitudes."""
    # Extract month-of-year
    df = df.with_columns(
        month_num=pl.col("month").str.slice(5, 2).cast(pl.Int32),
    )

    # --- Balanced panel: only routes present in all 12 months-of-year ---
    # This prevents compositional bias (e.g., winter-only routes inflating
    # winter averages).
    months_per_route = (
        df.group_by("route_id")
        .agg(n_months_of_year=pl.col("month_num").n_unique())
    )
    balanced_routes = months_per_route.filter(
        pl.col("n_months_of_year") == 12
    )["route_id"].to_list()
    n_total = df["route_id"].n_unique()
    n_balanced = len(balanced_routes)
    print(f"  Balanced panel: {n_balanced} of {n_total} routes present in all 12 months-of-year")

    df_balanced = df.filter(pl.col("route_id").is_in(balanced_routes))

    # --- System-wide seasonal profile (trip-weighted, detrended) ---
    # Uses only balanced-panel routes so route composition is constant across months.
    # Note: trips_7d is a static snapshot and may not reflect historical service levels.
    # Step 1: trip-weighted system OTP per month
    system_monthly = (
        df_balanced.group_by("month")
        .agg(
            weighted_otp=weighted_mean("otp", "trips_7d", safe=True),
        )
        .sort("month")
    )

    # Step 2: 12-month centered rolling mean as trend (approximate center via shift)
    system_monthly = system_monthly.with_columns(
        trend=pl.col("weighted_otp").rolling_mean(window_size=12, min_samples=6).shift(-6),
        month_num=pl.col("month").str.slice(5, 2).cast(pl.Int32),
    )

    # Step 3: detrend and average by month-of-year
    system_monthly = system_monthly.with_columns(
        detrended=pl.col("weighted_otp") - pl.col("trend"),
    )
    system_seasonal = (
        system_monthly.filter(pl.col("detrended").is_not_null())
        .group_by("month_num")
        .agg(
            deviation=pl.col("detrended").mean(),
            weighted_otp=pl.col("weighted_otp").mean(),
        )
        .sort("month_num")
    )

    # --- Per-route: detrend, then compute seasonal amplitude ---
    min_months = MIN_YEARS * 12
    route_counts = df.group_by("route_id").agg(n_months=pl.col("month").n_unique())
    eligible = route_counts.filter(pl.col("n_months") >= min_months)["route_id"].to_list()
    df_elig = df.filter(pl.col("route_id").is_in(eligible)).sort(["route_id", "month"])

    # Per-route trend and detrending
    df_elig = df_elig.with_columns(
        trend=pl.col("otp")
        .rolling_mean(window_size=12, min_samples=6)
        .shift(-6)
        .over("route_id"),
    )
    df_elig = df_elig.with_columns(
        detrended=pl.col("otp") - pl.col("trend"),
    )

    # Raw month-of-year averages (for heatmap display)
    route_seasonal = (
        df_elig.group_by(["route_id", "route_name", "month_num"])
        .agg(avg_otp=pl.col("otp").mean())
        .sort(["route_id", "month_num"])
    )

    # Amplitude from detrended values
    detrended_by_month = (
        df_elig.filter(pl.col("detrended").is_not_null())
        .group_by(["route_id", "route_name", "month_num"])
        .agg(avg_detrended=pl.col("detrended").mean())
    )
    route_amplitude = (
        detrended_by_month.group_by(["route_id", "route_name"])
        .agg(
            seasonal_amplitude=pl.col("avg_detrended").max() - pl.col("avg_detrended").min(),
            best_month=pl.col("month_num").sort_by("avg_detrended", descending=True).first(),
            worst_month=pl.col("month_num").sort_by("avg_detrended").first(),
        )
        .sort("seasonal_amplitude", descending=True)
    )

    return system_seasonal, route_seasonal, route_amplitude


def make_chart(
    system_seasonal: pl.DataFrame,
    route_seasonal: pl.DataFrame,
    route_amplitude: pl.DataFrame,
) -> None:
    """Generate seasonal pattern charts."""
    plt = setup_plotting()
    fig, axes = plt.subplots(2, 2, figsize=(16, 10))

    # Top-left: System-wide seasonal profile (bar chart)
    ax = axes[0, 0]
    months = system_seasonal["month_num"].to_list()
    devs = system_seasonal["deviation"].to_list()
    colors = ["#22c55e" if d >= 0 else "#ef4444" for d in devs]
    ax.bar(months, devs, color=colors, alpha=0.8)
    ax.set_xticks(range(1, 13))
    ax.set_xticklabels(MONTH_LABELS)
    ax.set_ylabel("OTP Deviation from Trend")
    ax.set_title("System-Wide Seasonal Profile (detrended)")
    ax.axhline(0, color="black", linewidth=0.5)

    # Top-right: Top 10 routes by seasonal amplitude
    ax = axes[0, 1]
    top10 = route_amplitude.head(10)
    labels = [f"{r} - {n}" for r, n in zip(top10["route_id"].to_list(), top10["route_name"].to_list())]
    values = top10["seasonal_amplitude"].to_list()
    y_pos = range(len(labels))
    ax.barh(y_pos, values, color="#7c3aed", alpha=0.8)
    ax.set_yticks(y_pos)
    ax.set_yticklabels(labels, fontsize=8)
    ax.set_xlabel("Seasonal Amplitude (detrended, max - min)")
    ax.set_title(f"Routes Most Affected by Season ({MIN_YEARS}+ years data)")
    ax.invert_yaxis()

    # Bottom-left: Heatmap of seasonal OTP for top-amplitude routes
    ax = axes[1, 0]
    top_routes = route_amplitude.head(20)["route_id"].to_list()
    heatmap_data = (
        route_seasonal.filter(pl.col("route_id").is_in(top_routes))
        .pivot(on="month_num", index=["route_id", "route_name"], values="avg_otp")
        .sort("route_id")
    )
    # Build matrix
    route_labels = [f"{r} - {n}" for r, n in zip(
        heatmap_data["route_id"].to_list(), heatmap_data["route_name"].to_list()
    )]
    month_cols = sorted([c for c in heatmap_data.columns if c not in ("route_id", "route_name")], key=int)
    matrix = []
    for col in month_cols:
        matrix.append(heatmap_data[col].to_list())
    import numpy as np
    matrix_arr = np.array(matrix, dtype=float).T  # routes x months

    im = ax.imshow(matrix_arr, aspect="auto", cmap="RdYlGn", vmin=OTP_HEATMAP_VMIN, vmax=OTP_HEATMAP_VMAX)
    ax.set_xticks(range(len(month_cols)))
    ax.set_xticklabels([MONTH_LABELS[int(c) - 1] for c in month_cols], fontsize=8)
    ax.set_yticks(range(len(route_labels)))
    ax.set_yticklabels(route_labels, fontsize=6)
    ax.set_title("Seasonal OTP Heatmap (top 20 by amplitude)")
    fig.colorbar(im, ax=ax, label="Average OTP", shrink=0.8)

    # Bottom-right: Overall best/worst months (system-wide distribution)
    ax = axes[1, 1]
    system_otps = system_seasonal["weighted_otp"].to_list()
    ax.bar(range(1, 13), system_otps, color="#3b82f6", alpha=0.8)
    ax.set_xticks(range(1, 13))
    ax.set_xticklabels(MONTH_LABELS)
    ax.set_ylabel("Weighted Average OTP")
    ax.set_title("System OTP by Month of Year")
    ax.set_ylim(0, 1)

    fig.suptitle("Seasonal Patterns in PRT On-Time Performance", fontsize=13)
    save_chart(fig, OUT / "seasonal_patterns.png")


@run_analysis(6, "Seasonal Patterns")
def main() -> None:
    """Entry point: load data, analyze seasonal patterns, chart, and save."""
    with phase("Loading data"):
        df = load_data()
        print(f"  {len(df):,} OTP observations loaded")

    with phase("Analyzing"):
        system_seasonal, route_seasonal, route_amplitude = analyze(df)

        # Summary
        best_sys = system_seasonal.sort("weighted_otp", descending=True).head(1)
        worst_sys = system_seasonal.sort("weighted_otp").head(1)
        print(f"  Best system month:  {MONTH_LABELS[best_sys['month_num'][0] - 1]} ({best_sys['weighted_otp'][0]:.1%})")
        print(f"  Worst system month: {MONTH_LABELS[worst_sys['month_num'][0] - 1]} ({worst_sys['weighted_otp'][0]:.1%})")

        n_eligible = route_amplitude.height
        print(f"  {n_eligible} routes with {MIN_YEARS}+ years of data for seasonal ranking")

        top3 = route_amplitude.head(3)
        print("\n  Most seasonally affected routes (detrended):")
        for row in top3.iter_rows(named=True):
            print(f"    {row['route_id']} - {row['route_name']}: amplitude = {row['seasonal_amplitude']:.3f}")

    with phase("Saving CSV"):
        save_csv(route_amplitude, OUT / "seasonal_patterns.csv")

    with phase("Generating chart"):
        make_chart(system_seasonal, route_seasonal, route_amplitude)


if __name__ == "__main__":
    main()

Sources

NameTypeWhy It MattersOwnerFreshnessCaveat
otp_monthly table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
route_stops table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
routes table Primary analytical table used in this page's computations. Produced by Data Ingestion. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (5)
  • file data/routes_by_month.csv — Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv — Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv — Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv — Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv — Average ridership by route and month.
numpy dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.
polars dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.