Analysis

23 - Garage-Level Performance

Ridership and External Factors

Coverage: 2017-01 to 2025-11 (from otp_monthly, ridership_monthly).

Built 2026-06-15 11:52 UTC · Commit e5cf673

Data Provenance

flowchart LR
  23_garage_performance(["23 - Garage-Level Performance"])
  t_otp_monthly[("otp_monthly")] --> 23_garage_performance
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_ridership_monthly[("ridership_monthly")] --> 23_garage_performance
  01_data_ingestion[["Data Ingestion"]] --> t_ridership_monthly
  t_route_stops[("route_stops")] --> 23_garage_performance
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_routes[("routes")] --> 23_garage_performance
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  t_stops[("stops")] --> 23_garage_performance
  01_data_ingestion[["Data Ingestion"]] --> t_stops
  d1_23_garage_performance(("numpy (lib)")) --> 23_garage_performance
  d2_23_garage_performance(("polars (lib)")) --> 23_garage_performance
  d3_23_garage_performance(("scipy (lib)")) --> 23_garage_performance
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 23_garage_performance page;
  class t_otp_monthly,t_ridership_monthly,t_route_stops,t_routes,t_stops table;
  class d1_23_garage_performance,d2_23_garage_performance,d3_23_garage_performance dep;
  class u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: Garage-Level Performance

Summary

PRT garages differ significantly in route-level OTP, and the difference survives controlling for route structure. An OLS model with stop count and geographic span as controls shows that garage dummies add significant explanatory power (F = 4.55, p = 0.005). Collier garage routes run +5.4 pp above East Liberty routes after controlling for stop count and span (p < 0.001).

Key Numbers

Garage	Routes	Mean OTP	Ridership-Wtd OTP	Avg Daily Riders
South Hills Village	3	85.4%	85.3%	11,588
Collier	18	73.6%	73.9%	14,226
Ross	22	70.7%	69.5%	20,141
West Mifflin	19	67.9%	65.8%	32,388
East Liberty	31	67.1%	67.5%	42,259
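
The table reports both an unweighted mean (every route counts equally) and a ridership-weighted mean (each route counts in proportion to its average riders). The `weighted_mean` helper used by the analysis is imported from a shared module that is not shown on this page, so the following is only a minimal sketch of the arithmetic, using made-up route values and hypothetical function names.

```python
def unweighted_mean(values):
    """Plain average: every route counts equally."""
    return sum(values) / len(values)


def ridership_weighted_mean(values, weights):
    """Average in which each route counts in proportion to its riders."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical garage with two routes (values are illustrative, not PRT data).
otp = [0.80, 0.60]      # route-level OTP
riders = [1000, 3000]   # average daily riders per route

plain = unweighted_mean(otp)                     # (0.80 + 0.60) / 2
weighted = ridership_weighted_mean(otp, riders)  # (800 + 1800) / 4000
print(f"unweighted = {plain:.3f}, weighted = {weighted:.3f}")
```

In this toy case the weighted figure is lower because the busier route is the less punctual one. A garage whose ridership-weighted OTP sits below its unweighted mean (West Mifflin, 65.8% vs. 67.9%) shows the same pattern.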

Kruskal-Wallis (all routes): H = 24.4, p = 0.0001 (5 garages)
Kruskal-Wallis (bus only): H = 20.0, p = 0.0002 (4 garages)
93 routes with 12+ months of paired data
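
For readers unfamiliar with the test, the H statistic is computed from ranks rather than raw OTP values. The sketch below is a pure-Python illustration on toy data with hypothetical function names. It omits the tie correction that `scipy.stats.kruskal` applies, so it is not a substitute for the call used in the analysis code.

```python
def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    acc = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        acc += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * acc - 3 * (n_total + 1)


# Toy route-level OTP values for three hypothetical garages.
toy_groups = [[0.61, 0.64, 0.66], [0.68, 0.70, 0.71], [0.74, 0.76, 0.79]]
h_stat = kruskal_h(toy_groups)
print(f"H = {h_stat:.2f}")
```

The toy groups are perfectly separated, which is why H comes out large relative to its two degrees of freedom.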

Controlled OLS Model (bus only, n = 89, reference = East Liberty)

Model	R²	Adj R²
Base (stop_count + span)	0.313	0.297
Full (+ garage dummies)	0.410	0.374

F-test for garage dummies: F = 4.55, p = 0.005 -- garages are significant after controls.
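
The F statistic and adjusted R² values can be checked against the reported (rounded) R² figures. The sketch below assumes n = 89, two structural predictors in the base model, and three garage dummies (Collier, Ross, West Mifflin) in the full model. The function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for a model with k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def nested_f(r2_base, r2_full, n, k_base, k_full):
    """F statistic for the extra predictors in a nested OLS comparison."""
    q = k_full - k_base        # number of added predictors
    df_resid = n - k_full - 1  # residual degrees of freedom, full model
    f = ((r2_full - r2_base) / q) / ((1 - r2_full) / df_resid)
    return f, q, df_resid


n = 89
f_stat, df_num, df_den = nested_f(0.313, 0.410, n, k_base=2, k_full=5)
adj_base = adjusted_r2(0.313, n, k=2)
adj_full = adjusted_r2(0.410, n, k=5)

print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
print(f"Adj R²: base = {adj_base:.3f}, full = {adj_full:.3f}")
```

Both models share the same total sum of squares, so this R²-based form is algebraically the same as a residual-sum-of-squares comparison.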

Feature	Coefficient	p-value
stop_count	-0.00039	0.001
span_km	-0.0022	0.014
garage_Collier	+0.054	<0.001
garage_Ross	+0.029	0.040
garage_West_Mifflin	+0.014	0.358

Observations

Collier routes outperform East Liberty by +5.4 pp even after controlling for stop count and span (p < 0.001). This is a substantial and statistically robust effect.
Ross routes outperform East Liberty by +2.9 pp after controls (p = 0.04), a marginally significant effect.
West Mifflin does not differ significantly from East Liberty after controls (+1.4 pp, p = 0.36). The raw gap in mean OTP between these two garages is similarly small (67.9% vs. 67.1%, 0.8 pp), so the two are hard to separate with or without structural controls.
Adding garage dummies increases R² from 0.313 to 0.410 (+9.7 pp), a meaningful improvement beyond what route structure alone explains.
The monthly trend chart shows all garages move together with system-wide trends (COVID spike, 2022 trough), but the relative ordering is stable: Collier consistently above Ross, which is consistently above East Liberty and West Mifflin.

Discussion

The controlled analysis overturns the initial interpretation that garage differences were purely a composition effect. Collier's advantage is real and operationally meaningful: after accounting for the fact that it operates shorter routes with fewer stops, Collier routes still outperform East Liberty routes by 5.4 pp.

Operational feedback from PRT-experienced observers identifies several likely explanations for the Collier advantage:

Lower corridor congestion. Collier serves the western suburbs, which have less traffic congestion than the eastern corridors served by East Liberty and West Mifflin. The AADT-based congestion analysis (Analysis 27) found 24-hour traffic volume non-significant, but that measure is too coarse to capture the peak-hour congestion differences that matter for bus operations.
Shorter downtown routing. Some Collier routes have very short segments in downtown Pittsburgh, reducing exposure to the congested street grid where delays accumulate. East Liberty routes tend to traverse longer downtown segments.
Route distribution. Garages do not cover equal areas or operate equal numbers of routes. Collier's route portfolio may be inherently more favorable for OTP beyond what stop count and span capture.

These factors suggest the garage effect is primarily a corridor-level congestion proxy rather than evidence of operational differences in garage management. The controlled model accounts for route length and stop count but not for the traffic environment each route operates in -- and the available traffic data (AADT) is too coarse to fill this gap.

West Mifflin's poor raw performance, by contrast, is largely explained by route structure: it operates the long eastern-corridor routes, and after controlling for that, it is statistically indistinguishable from East Liberty.

The R² increase from 0.31 to 0.41 suggests that garage assignment captures roughly 10 percentage points of OTP variance beyond what stop count and span explain. Given the corridor-congestion explanation, this increment likely represents traffic environment variance rather than garage-specific operational practices.

Caveats

The current_garage field is a snapshot; historical garage assignments are not available. If routes were reassigned between garages, the analysis would not capture that (though the data shows no garage changes for any route).
The controlled model uses only stop count and span as structural controls. Adding further controls (e.g., traffic density, road type, schedule slack) could reduce or eliminate the garage effect.
The F-test assumes normally distributed residuals; the Kruskal-Wallis test (non-parametric) is more robust but does not support covariates.
South Hills Village (n=3 rail routes) and Incline (excluded, no OTP data) are too small for meaningful comparison and are excluded from the controlled model.

Review History

2026-02-27: RED-TEAM-REPORTS/2026-02-27-analyses-19-25.md — 1 significant issue. Added route-garage uniqueness guard: most-common garage per route used for grouping, with assertion to catch future data with reassignments.
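
A guard of this kind can be sketched as follows. This is an illustrative example with hypothetical function names and made-up rows, not the project's implementation. It counts distinct garages per route before any collapsing to one row per route, which is the step that lets such a check see a reassignment.

```python
from collections import defaultdict


def garages_per_route(rows):
    """Map each route_id to the set of garages seen for it."""
    seen = defaultdict(set)
    for route_id, garage in rows:
        seen[route_id].add(garage)
    return seen


def assert_single_garage(rows):
    """Raise if any route appears under more than one garage."""
    reassigned = {
        route: sorted(garages)
        for route, garages in garages_per_route(rows).items()
        if len(garages) > 1
    }
    if reassigned:
        raise ValueError(f"routes with more than one garage: {reassigned}")


# Hypothetical (route_id, current_garage) pairs, one per route-month.
clean = [("R1", "East Liberty"), ("R1", "East Liberty"), ("R2", "Collier")]
assert_single_garage(clean)  # passes silently

mixed = clean + [("R2", "Ross")]
try:
    assert_single_garage(mixed)
    caught = False
except ValueError as exc:
    caught = True
    print(exc)
```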

Output

image garage_boxplot.png
OTP distribution by garage.

image garage_otp_trend.png
Monthly OTP by garage.

No interactive outputs declared.

data garage_monthly.csv

Monthly OTP and ridership aggregates by operating garage.

data garage_route_detail.csv

Route-level OTP, ridership, and feature records used in garage comparison models.

data garage_summary.csv

Garage-level summary statistics.

Methods

Methods: Garage-Level Performance

Question

Do PRT garages (Ross, Collier, East Liberty, West Mifflin) differ systematically in the OTP and ridership of routes they operate?

Approach

Join ridership data (which includes current_garage) with OTP data by route and month.
Compute garage-level aggregate OTP (ridership-weighted and unweighted) and total ridership per month.
Test for differences across garages using Kruskal-Wallis on route-level average OTP, grouped by garage.
Plot garage-level OTP trends over time to see if garages diverge or move together.
Control for route composition (garages may differ because they operate different types of routes, not because of operational quality) by comparing within-mode (bus-only) results.
Fit a controlled OLS model (bus-only): base model with stop_count and span_km, then full model adding garage dummy variables (East Liberty as reference, being the largest). Use an F-test on the nested models to determine if garage dummies add significant explanatory power beyond route structure.
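
The dummy coding in the last step can be illustrated in isolation. In this minimal sketch (hypothetical function name, toy input), the reference garage gets no column of its own, so each garage coefficient is read as a difference from East Liberty.

```python
def dummy_encode(garages, reference):
    """Return dummy column names and 0/1 rows, omitting the reference level."""
    levels = sorted(set(garages) - {reference})
    names = [f"garage_{g.replace(' ', '_')}" for g in levels]
    rows = [[1.0 if g == level else 0.0 for level in levels] for g in garages]
    return names, rows


# One toy route per garage.
garages = ["East Liberty", "Collier", "West Mifflin", "Ross"]
names, rows = dummy_encode(garages, reference="East Liberty")

print(names)
for garage, row in zip(garages, rows):
    print(f"{garage:<14} {row}")
```

The East Liberty row is all zeros, so the intercept and structural terms alone describe it. Including a column for every garage alongside an intercept would make the design matrix rank-deficient.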

Data

Name	Description	Source
`otp_monthly`	route_id, month, otp	`prt.db` table
`ridership_monthly`	route_id, month, current_garage, avg_riders; filtered to day_type='WEEKDAY'	`prt.db` table
`routes`	route_id, mode for bus-only stratification	`prt.db` table
`route_stops`	route_id, stop_id for stop counts	`prt.db` table
`stops`	stop_id, lat, lon for geographic span computation	`prt.db` table

Notes: Join on route_id and month; overlap period only (Jan 2019 -- Oct 2024). Exclude routes with fewer than 12 months of paired data or NULL garage.

Output

output/garage_otp_trend.png -- monthly OTP by garage
output/garage_boxplot.png -- OTP distribution by garage
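output/garage_summary.csv -- garage-level summary statistics
output/garage_route_detail.csv -- route-level OTP, ridership, and feature records used in garage comparison models
output/garage_monthly.csv -- monthly OTP and ridership aggregates by operating garage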

Source Code

"""Analysis 23: Compare OTP and ridership across PRT garages to surface operational differences."""

import math

import numpy as np
import polars as pl
from scipy import stats

from prt_otp_analysis.common import analysis_dir, phase, query_to_polars, run_analysis, save_chart, save_csv, setup_plotting, weighted_mean

OUT = analysis_dir(__file__)

MIN_MONTHS = 12


def load_data() -> pl.DataFrame:
    """Load OTP joined with garage assignment and ridership."""
    df = query_to_polars("""
        SELECT o.route_id, o.month, o.otp,
               r.avg_riders, r.current_garage,
               rt.mode
        FROM otp_monthly o
        JOIN ridership_monthly r
            ON o.route_id = r.route_id AND o.month = r.month
            AND r.day_type = 'WEEKDAY'
        JOIN routes rt ON o.route_id = rt.route_id
        WHERE r.current_garage IS NOT NULL
    """)

    # Filter to routes with enough data
    route_counts = df.group_by("route_id").agg(pl.col("month").count().alias("n"))
    keep = route_counts.filter(pl.col("n") >= MIN_MONTHS)["route_id"].to_list()
    df = df.filter(pl.col("route_id").is_in(keep))

    return df


def route_level_summary(df: pl.DataFrame) -> pl.DataFrame:
    """Compute route-level average OTP and ridership with garage assignment."""
    # Take the most common garage per route to handle potential reassignments
    garage_per_route = (
        df.group_by("route_id", "current_garage")
        .agg(n=pl.len())
        .sort("n", descending=True)
        .group_by("route_id")
        .first()
        .select("route_id", "current_garage")
    )
    mode_per_route = df.select("route_id", "mode").unique(subset=["route_id"])

    route_agg = (
        df.group_by("route_id")
        .agg(
            avg_otp=pl.col("otp").mean(),
            avg_riders=pl.col("avg_riders").mean(),
            n_months=pl.col("month").count(),
        )
    )
    result = route_agg.join(garage_per_route, on="route_id").join(mode_per_route, on="route_id")
    assert result["route_id"].n_unique() == len(result), (
        "route_level_summary produced duplicate route_ids — a route may have changed garages"
    )
    return result.sort("current_garage", "avg_otp")


def garage_summary(route_df: pl.DataFrame) -> pl.DataFrame:
    """Compute garage-level summary statistics."""
    return (
        route_df.group_by("current_garage")
        .agg(
            n_routes=pl.col("route_id").count(),
            mean_otp=pl.col("avg_otp").mean(),
            median_otp=pl.col("avg_otp").median(),
            std_otp=pl.col("avg_otp").std(),
            min_otp=pl.col("avg_otp").min(),
            max_otp=pl.col("avg_otp").max(),
            total_riders=pl.col("avg_riders").sum(),
            weighted_otp=weighted_mean("avg_otp", "avg_riders"),
        )
        .sort("mean_otp", descending=True)
    )


def monthly_by_garage(df: pl.DataFrame) -> pl.DataFrame:
    """Compute monthly ridership-weighted OTP per garage."""
    return (
        df.group_by("current_garage", "month")
        .agg(
            weighted_otp=weighted_mean("otp", "avg_riders"),
            unweighted_otp=pl.col("otp").mean(),
            total_riders=pl.col("avg_riders").sum(),
            n_routes=pl.col("route_id").n_unique(),
        )
        .sort("current_garage", "month")
    )


def statistical_tests(route_df: pl.DataFrame) -> dict:
    """Run Kruskal-Wallis tests across garages, for all routes and for bus-only routes."""
    results = {}

    # All routes
    garages = sorted(route_df["current_garage"].unique().to_list())
    groups = []
    garage_names = []
    for g in garages:
        vals = route_df.filter(pl.col("current_garage") == g)["avg_otp"].to_list()
        if len(vals) >= 3:
            groups.append(vals)
            garage_names.append(g)

    if len(groups) >= 2:
        h, p = stats.kruskal(*groups)
        results["kruskal_h_all"] = h
        results["kruskal_p_all"] = p
        results["garages_tested_all"] = garage_names

    # Bus-only
    bus_df = route_df.filter(pl.col("mode") == "BUS")
    bus_groups = []
    bus_names = []
    for g in garages:
        vals = bus_df.filter(pl.col("current_garage") == g)["avg_otp"].to_list()
        if len(vals) >= 3:
            bus_groups.append(vals)
            bus_names.append(g)

    if len(bus_groups) >= 2:
        h, p = stats.kruskal(*bus_groups)
        results["kruskal_h_bus"] = h
        results["kruskal_p_bus"] = p
        results["garages_tested_bus"] = bus_names

    return results


def make_trend_chart(monthly: pl.DataFrame) -> None:
    """Plot monthly ridership-weighted OTP by garage."""
    plt = setup_plotting()
    fig, ax = plt.subplots(figsize=(14, 6))

    garage_colors = {
        "Ross": "#2563eb",
        "Collier": "#16a34a",
        "East Liberty": "#e11d48",
        "West Mifflin": "#f59e0b",
        "South Hills Village": "#8b5cf6",
        "Incline": "#9ca3af",
    }

    # Build x-axis from all months
    all_months = sorted(monthly["month"].unique().to_list())
    month_to_x = {m: i for i, m in enumerate(all_months)}
    tick_positions = [i for i, m in enumerate(all_months) if m.endswith("-01")]
    tick_labels = [all_months[i][:4] for i in tick_positions]

    for garage in sorted(monthly["current_garage"].unique().to_list()):
        gdf = monthly.filter(pl.col("current_garage") == garage).sort("month")
        months = gdf["month"].to_list()
        x = [month_to_x[m] for m in months]
        y = gdf["weighted_otp"].to_list()
        color = garage_colors.get(garage, "#9ca3af")
        ax.plot(x, y, color=color, linewidth=1.5, label=garage, alpha=0.8)

    if "2020-03" in all_months:
        covid_idx = month_to_x["2020-03"]
        ax.axvline(covid_idx, color="#ef4444", linestyle=":", alpha=0.5)

    ax.set_ylabel("Ridership-Weighted OTP")
    ax.set_xlabel("Month")
    ax.set_title("Monthly OTP by Garage (Ridership-Weighted)")
    ax.set_xticks(tick_positions)
    ax.set_xticklabels(tick_labels)
    ax.legend(loc="lower left", fontsize=8)
    ax.set_ylim(0.45, 0.90)

    save_chart(fig, OUT / "garage_otp_trend.png")


def make_boxplot(route_df: pl.DataFrame) -> None:
    """Box plot of route-level OTP by garage, all routes and bus-only."""
    plt = setup_plotting()
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

    garage_colors = ["#2563eb", "#16a34a", "#e11d48", "#f59e0b", "#8b5cf6", "#9ca3af"]

    for ax, title, filt in [
        (ax1, "All Routes", route_df),
        (ax2, "Bus Only", route_df.filter(pl.col("mode") == "BUS")),
    ]:
        garages = sorted(filt["current_garage"].unique().to_list())
        data = []
        labels = []
        for g in garages:
            vals = filt.filter(pl.col("current_garage") == g)["avg_otp"].to_list()
            if len(vals) >= 2:
                data.append(vals)
                labels.append(f"{g}\n(n={len(vals)})")

        bp = ax.boxplot(data, tick_labels=labels, patch_artist=True)
        for patch, color in zip(bp["boxes"], garage_colors):
            patch.set_facecolor(color)
            patch.set_alpha(0.6)
        ax.set_ylabel("Route-Level Average OTP")
        ax.set_title(title)
        ax.tick_params(axis="x", labelsize=8)

    fig.suptitle("OTP Distribution by Garage", fontsize=13, y=1.02)
    save_chart(fig, OUT / "garage_boxplot.png")


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Return the great-circle distance in km between two lat/lon points."""
    R = 6371.0
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def compute_span(lats: list[float], lons: list[float]) -> float:
    """Return the max pairwise haversine distance (km) among a set of points."""
    max_dist = 0.0
    n = len(lats)
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(lats[i], lons[i], lats[j], lons[j])
            if d > max_dist:
                max_dist = d
    return max_dist


def fit_ols(y: np.ndarray, X_raw: np.ndarray, feature_names: list[str]) -> dict:
    """Fit OLS regression and return results dict."""
    n, k = X_raw.shape
    X = np.column_stack([np.ones(n), X_raw])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    residuals = y - y_hat
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r_squared = 1 - ss_res / ss_tot
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    mse = ss_res / (n - k - 1)
    XtX_inv = np.linalg.pinv(X.T @ X)
    se = np.sqrt(np.diag(XtX_inv) * mse)
    t_vals = beta / se
    # Two-sided p-values; the survival function avoids precision loss when cdf is near 1
    p_vals = [2 * stats.t.sf(abs(t), df=n - k - 1) for t in t_vals]
    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "n": n, "k": k,
        "features": ["intercept"] + feature_names,
        "coefficients": beta.tolist(),
        "std_errors": se.tolist(),
        "p_values": p_vals,
    }


def load_structural_features() -> pl.DataFrame:
    """Load route-level structural features (stop count, span) for the controlled model."""
    stop_counts = query_to_polars("""
        SELECT route_id, COUNT(DISTINCT stop_id) AS stop_count
        FROM route_stops GROUP BY route_id
    """)
    stops_by_route = query_to_polars("""
        SELECT rs.route_id, s.lat, s.lon
        FROM route_stops rs
        JOIN stops s ON rs.stop_id = s.stop_id
        WHERE s.lat IS NOT NULL AND s.lon IS NOT NULL
    """)
    spans = []
    for route_id in stops_by_route["route_id"].unique().sort().to_list():
        subset = stops_by_route.filter(pl.col("route_id") == route_id)
        span_km = compute_span(subset["lat"].to_list(), subset["lon"].to_list())
        spans.append({"route_id": route_id, "span_km": span_km})
    span_df = pl.DataFrame(spans)

    return stop_counts.join(span_df, on="route_id", how="inner")


def controlled_garage_test(route_df: pl.DataFrame) -> dict:
    """Fit bus-only OLS models with and without garage dummies, controlling for stop count and span."""
    struct = load_structural_features()
    df = route_df.join(struct, on="route_id", how="inner")

    # Bus-only analysis (drop rail and South Hills Village)
    bus_df = df.filter(pl.col("mode") == "BUS").drop_nulls(subset=["stop_count", "span_km"])

    # Reference garage = East Liberty (largest, excluded from dummies)
    bus_garages = sorted(set(bus_df["current_garage"].to_list()) - {"East Liberty"})
    for g in bus_garages:
        col_name = f"garage_{g.replace(' ', '_')}"
        bus_df = bus_df.with_columns(
            pl.when(pl.col("current_garage") == g).then(1.0).otherwise(0.0).alias(col_name),
        )

    y = bus_df["avg_otp"].to_numpy()

    # Model 1: structural only (stop_count, span_km)
    base_features = ["stop_count", "span_km"]
    X_base = np.column_stack([bus_df[f].to_numpy().astype(float) for f in base_features])
    base_results = fit_ols(y, X_base, base_features)

    # Model 2: structural + garage dummies
    garage_cols = [f"garage_{g.replace(' ', '_')}" for g in bus_garages]
    full_features = base_features + garage_cols
    X_full = np.column_stack([bus_df[f].to_numpy().astype(float) for f in full_features])
    full_results = fit_ols(y, X_full, full_features)

    # F-test for garage dummies (nested model comparison)
    n = len(y)
    k_base = len(base_features)
    k_full = len(full_features)
    k_diff = k_full - k_base
    ss_res_base = np.sum((y - np.mean(y)) ** 2) * (1 - base_results["r_squared"])
    ss_res_full = np.sum((y - np.mean(y)) ** 2) * (1 - full_results["r_squared"])
    f_stat = ((ss_res_base - ss_res_full) / k_diff) / (ss_res_full / (n - k_full - 1))
    # Upper-tail probability via the survival function (equivalent to 1 - cdf)
    f_p = stats.f.sf(f_stat, k_diff, n - k_full - 1)

    return {
        "base_results": base_results,
        "full_results": full_results,
        "f_stat": f_stat,
        "f_p": f_p,
        "k_diff": k_diff,
        "n_bus": n,
        "bus_garages": bus_garages,
        "garage_cols": garage_cols,
    }


@run_analysis(23, "Garage-Level Performance")
def main() -> None:
    """Entry point: load, summarize, test, chart, and save."""

    with phase("Loading data"):
        df = load_data()
        n_routes = df["route_id"].n_unique()
        print(f"  {len(df):,} route-month observations ({n_routes} routes)")

    with phase("Computing route-level summaries"):
        route_df = route_level_summary(df)
        gsummary = garage_summary(route_df)

        print("\n  Garage summary:")
        print(f"  {'Garage':<22} {'Routes':>6} {'Mean OTP':>9} {'Wt OTP':>9} {'Riders':>10}")
        for row in gsummary.iter_rows(named=True):
            print(f"  {row['current_garage']:<22} {row['n_routes']:>6} "
                  f"{row['mean_otp']:>9.1%} {row['weighted_otp']:>9.1%} "
                  f"{row['total_riders']:>10,.0f}")

    with phase("Statistical tests"):
        test_results = statistical_tests(route_df)

        if "kruskal_h_all" in test_results:
            print(f"  Kruskal-Wallis (all routes): H = {test_results['kruskal_h_all']:.3f}, "
                  f"p = {test_results['kruskal_p_all']:.4f}")
            print(f"    Garages tested: {test_results['garages_tested_all']}")

        if "kruskal_h_bus" in test_results:
            print(f"  Kruskal-Wallis (bus only):   H = {test_results['kruskal_h_bus']:.3f}, "
                  f"p = {test_results['kruskal_p_bus']:.4f}")
            print(f"    Garages tested: {test_results['garages_tested_bus']}")

    with phase("Controlled analysis (OLS with structural controls)"):
        ctrl = controlled_garage_test(route_df)
        base = ctrl["base_results"]
        full = ctrl["full_results"]
        print(f"\n  Bus-only models (n = {ctrl['n_bus']}, reference garage = East Liberty):")
        print(f"  Base model (stop_count + span): R² = {base['r_squared']:.4f}, "
              f"Adj R² = {base['adj_r_squared']:.4f}")
        print(f"  Full model (+ garage dummies):  R² = {full['r_squared']:.4f}, "
              f"Adj R² = {full['adj_r_squared']:.4f}")
        print(f"  R² change: +{full['r_squared'] - base['r_squared']:.4f}")
        print(f"\n  F-test for garage dummies: F = {ctrl['f_stat']:.3f}, "
              f"p = {ctrl['f_p']:.4f} (df = {ctrl['k_diff']}, {ctrl['n_bus'] - full['k'] - 1})")
        if ctrl['f_p'] < 0.05:
            print("  => Garage dummies ARE significant after controlling for route structure.")
        else:
            print("  => Garage dummies are NOT significant after controlling for route structure.")

        # Print garage coefficients
        print(f"\n  {'Feature':<25s} {'Coeff':>10s} {'Std Err':>10s} {'p-value':>10s}")
        print(f"  {'-'*55}")
        for i, feat in enumerate(full["features"]):
            coeff = full["coefficients"][i]
            se = full["std_errors"][i]
            p = full["p_values"][i]
            sig = "***" if p < 0.001 else "**" if p < 0.01 else "*" if p < 0.05 else ""
            print(f"  {feat:<25s} {coeff:>10.6f} {se:>10.6f} {p:>10.4f} {sig}")

    with phase("Computing monthly trends"):
        monthly = monthly_by_garage(df)

    with phase("Saving CSVs"):
        save_csv(gsummary, OUT / "garage_summary.csv")
        save_csv(route_df, OUT / "garage_route_detail.csv")
        save_csv(monthly, OUT / "garage_monthly.csv")

    with phase("Generating charts"):
        make_trend_chart(monthly)
        make_boxplot(route_df)


if __name__ == "__main__":
    main()

    

    

Sources

Name	Type	Why It Matters	Owner	Freshness	Caveat
otp_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
ridership_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
route_stops	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
routes	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
stops	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
numpy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
polars	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
scipy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.

Upstream sources (5), listed identically for each of the five tables above:
file data/routes_by_month.csv -- Monthly route OTP source table in wide format.
file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv -- Current route metadata and mode classifications.
file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv -- Current stop-to-route coverage and trip counts.
file data/PRT_Stop_Reference_Lookup_Table.csv -- Historical stop reference file with geography attributes.
file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv -- Average ridership by route and month.