Analysis

41 - Operating Cost Drivers

Equity and Strategic Planning

Coverage: Coverage window unavailable for this page.

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Page Navigation

Analysis Navigation

Data Provenance

flowchart LR
  41_operating_cost_drivers(["41 - Operating Cost Drivers"])
  t_ntd_annual_service[("ntd_annual_service")] --> 41_operating_cost_drivers
  06_ntd_service[["NTD Annual Service ETL"]] --> t_ntd_annual_service
  u1_06_ntd_service[/"data/ntd-annual-service/2023_TS2.2_Service_Data.xlsx"/] --> 06_ntd_service
  t_ntd_fleet_age[("ntd_fleet_age")] --> 41_operating_cost_drivers
  d1_41_operating_cost_drivers(("polars (lib)")) --> 41_operating_cost_drivers
  d2_41_operating_cost_drivers(("numpy (lib)")) --> 41_operating_cost_drivers
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 41_operating_cost_drivers page;
  class t_ntd_annual_service,t_ntd_fleet_age table;
  class d1_41_operating_cost_drivers,d2_41_operating_cost_drivers dep;
  class u1_06_ntd_service file;
  class 06_ntd_service pipeline;

Findings

Findings: Operating Cost Drivers

Key findings

  1. PRT's cost premium is concentrated in vehicle maintenance. At $3.34 per trip, Pittsburgh's vehicle maintenance cost is the highest among all 8 peers — 47% above the peer average ($2.27). The other three cost categories (vehicle operations, non-vehicle maintenance, general admin) are mid-pack or below average.

  2. General administration is PRT's leanest category. At $2.23 per trip, PRT's gen admin cost is the second-lowest among peers (only Buffalo is lower at $1.60). Baltimore ($4.07), Denver ($3.79), and St. Louis ($3.69) spend far more on admin per trip.

  3. Bus fleet age does not explain the maintenance premium. A scatter plot of bus average age vs vehicle maintenance cost per trip shows a negative correlation (r = -0.52) — the opposite of what we'd expect if older buses drove higher costs. Pittsburgh's bus fleet (avg 7.1 years) is younger than Minneapolis (10.5) and Buffalo (9.1), yet its maintenance costs are far higher. The negative correlation likely reflects that cities with older buses (Minneapolis, Buffalo) are smaller systems with lower cost-of-labor.

  4. PRT's light rail fleet is the 3rd oldest among peers at 32.4 years. Only Cleveland (43.0) and Buffalo (40.0) have older light rail vehicles. Denver (16.6) and Minneapolis (10.8) have much newer fleets. The aging light rail fleet likely contributes to PRT's elevated vehicle maintenance costs — rail vehicle maintenance is more expensive per unit than bus maintenance, and older rail vehicles require more intensive overhaul.

  5. Vehicle maintenance costs spiked in 2020-2021 and stayed elevated. The cost trends show PRT's vehicle maintenance per trip jumped sharply when ridership cratered (fixed maintenance costs spread across fewer trips), but unlike most peers, it has not recovered. This suggests PRT's maintenance costs are largely fixed rather than variable with ridership.

  6. All per-trip costs spiked during COVID due to the denominator effect. When ridership drops 40% but expenses don't drop proportionally, per-trip costs mechanically increase. The key question is which costs recovered as ridership partially stabilized — and for PRT, vehicle maintenance did not.

Limitations

  • Per-trip normalization amplifies fixed costs. Agencies with large fixed infrastructure (rail, maintenance facilities) will show higher per-trip costs when ridership drops, even if absolute spending is unchanged.
  • NTD cost categories are broad. "Vehicle maintenance" includes both bus and rail maintenance. We cannot separate the light rail maintenance premium from bus maintenance in this data.
  • Fleet age is a proxy. Average age doesn't capture fleet condition, maintenance practices, or rebuild history. Two fleets of the same age can have very different maintenance needs.
  • 8 peers is a small sample for correlation analysis. The r = -0.52 is suggestive but not statistically robust with n = 8.

Validation

  • Data source verified. Cost sub-categories from NTD TS2.2 OpExp sheets (VO, VM, NVM, GA). Fleet age from NTD Socrata API dataset 6abt-uhgq.
  • Aggregates sanity-checked. OpExp sub-categories sum exactly to OpExp Total for all peers and years.
  • Direction of effects checked. Per-trip costs spiked in 2020-2021 (consistent with ridership crash) and partially recovered (consistent with partial ridership recovery). No anomalies.
  • Surprising result investigated. The negative bus-age-vs-maintenance correlation is counterintuitive but likely reflects a confound: smaller cities (Buffalo, Minneapolis) have older fleets AND lower labor costs. With n = 8 we cannot control for this.

Output

Methods

Methods: Operating Cost Drivers

Question

Where does PRT's operating cost premium come from relative to peers, and does fleet age explain the vehicle maintenance gap?

Approach

  • Pull OpExp sub-categories (VO, VM, NVM, GA) and UPT from ntd_annual_service for 8 peers, 2019 and 2024.
  • Normalize all costs to per-trip amounts (opexp_xx / upt) to control for agency size.
  • Pull fleet age data from ntd_fleet_age for 2024 (bus and light rail vehicle types).
  • Scatter-plot bus average age vs vehicle maintenance cost per trip to test whether older fleets explain higher maintenance costs.
  • Show cost-per-trip breakdown as stacked bars and time trends (2019–2024).

Data

  • ntd_annual_service: columns ntd_id, year, upt, opexp_vo, opexp_vm, opexp_nvm, opexp_ga.
  • ntd_fleet_age: columns ntd_id, year, vehicle_type, avg_age, total_vehicles.
  • Filtered to the 8 peer city NTD IDs defined in PEERS.

Output

  • output/cost_breakdown.png — stacked bar chart of per-trip cost by category for each peer (2024).
  • output/cost_change.png — grouped bars showing 2019 vs 2024 per-trip cost by category for Pittsburgh.
  • output/fleet_age_vs_maintenance.png — scatter plot correlating bus fleet age with vehicle maintenance cost per trip.
  • output/fleet_age_comparison.png — bar chart of average bus and light rail fleet age per peer.
  • output/cost_trends.png — 2×2 line chart panel showing per-trip cost trajectories by category (2019–2024).
  • output/cost_breakdown.csv — full per-trip cost data and fleet age metrics for all peers.

Source Code

"""Decompose PRT's operating cost premium and correlate with fleet age."""

import numpy as np
import polars as pl

from prt_otp_analysis.common import (
    PEERS,
    analysis_dir,
    get_db,
    phase,
    run_analysis,
    save_chart,
    save_csv,
    setup_plotting,
)

OUT = analysis_dir(__file__)

PRT_NTD_ID = 30022
BASELINE_YEAR = 2019
COMPARE_YEAR = 2024

COLORS = {
    "Pittsburgh": "#E24A33",
    "Baltimore": "#4878CF",
    "Cleveland": "#6ACC65",
    "Denver": "#D65F5F",
    "St. Louis": "#B47CC7",
    "Buffalo": "#C4AD66",
    "Portland": "#77BEDB",
    "Minneapolis": "#FFB347",
}

# Cost category display names and colors
COST_CATS = {
    "opexp_vo": ("Vehicle Ops", "#4878CF"),
    "opexp_vm": ("Vehicle Maint", "#E24A33"),
    "opexp_nvm": ("Non-Veh Maint", "#6ACC65"),
    "opexp_ga": ("Gen Admin", "#B47CC7"),
}


def load_cost_data(conn) -> pl.DataFrame:
    """Load OpExp sub-categories and UPT for peers across all years."""
    id_list = ",".join(str(i) for i in PEERS)
    rows = conn.execute(f"""
        SELECT ntd_id, year, upt, opexp, opexp_vo, opexp_vm, opexp_nvm, opexp_ga
        FROM ntd_annual_service
        WHERE ntd_id IN ({id_list})
          AND year BETWEEN {BASELINE_YEAR} AND {COMPARE_YEAR}
    """).fetchall()
    peer_map = pl.DataFrame({
        "ntd_id": list(PEERS.keys()),
        "city": list(PEERS.values()),
    })
    df = pl.DataFrame([dict(r) for r in rows])
    df = df.join(peer_map, on="ntd_id", how="left")

    # Compute per-trip costs
    for cat in COST_CATS:
        df = df.with_columns((pl.col(cat) / pl.col("upt")).alias(f"{cat}_pt"))
    df = df.with_columns((pl.col("opexp") / pl.col("upt")).alias("cost_per_trip"))
    return df.sort("city", "year")


def load_fleet_age(conn) -> pl.DataFrame:
    """Load 2024 fleet age data for peer cities."""
    id_list = ",".join(str(i) for i in PEERS)
    rows = conn.execute(f"""
        SELECT ntd_id, vehicle_type, total_vehicles, avg_age, avg_miles
        FROM ntd_fleet_age
        WHERE ntd_id IN ({id_list})
          AND year = {COMPARE_YEAR}
    """).fetchall()
    peer_map = pl.DataFrame({
        "ntd_id": list(PEERS.keys()),
        "city": list(PEERS.values()),
    })
    df = pl.DataFrame([dict(r) for r in rows])
    return df.join(peer_map, on="ntd_id", how="left")


def chart_cost_breakdown(plt, cost_df: pl.DataFrame) -> None:
    """Stacked bar: per-trip cost breakdown by category (2024)."""
    df_2024 = cost_df.filter(pl.col("year") == COMPARE_YEAR).sort("cost_per_trip", descending=True)
    cities = df_2024["city"].to_list()
    x = np.arange(len(cities))

    fig, ax = plt.subplots(figsize=(12, 7))
    bottom = np.zeros(len(cities))

    for cat, (label, color) in COST_CATS.items():
        vals = df_2024[f"{cat}_pt"].to_list()
        ax.bar(x, vals, bottom=bottom, label=label, color=color, edgecolor="white", linewidth=0.5)
        bottom += np.array(vals)

    # Label totals on top
    for i, total in enumerate(df_2024["cost_per_trip"].to_list()):
        ax.text(i, total + 0.15, f"${total:.2f}", ha="center", va="bottom", fontsize=9, fontweight="bold")

    ax.set_xticks(x)
    labels = ax.set_xticklabels(cities, rotation=35, ha="right")
    for label in labels:
        if label.get_text() == "Pittsburgh":
            label.set_fontweight("bold")

    ax.set_ylabel("Cost per Trip ($)")
    ax.set_title(f"Operating Cost per Trip by Category \u2014 Peer Cities ({COMPARE_YEAR})")
    ax.legend(loc="upper right")
    save_chart(fig, OUT / "cost_breakdown.png")


def chart_cost_change(plt, cost_df: pl.DataFrame) -> None:
    """Grouped bar: per-trip cost by category, 2019 vs 2024, all peers."""
    df_base = cost_df.filter(pl.col("year") == BASELINE_YEAR).sort("city")
    df_comp = cost_df.filter(pl.col("year") == COMPARE_YEAR).sort("city")
    cities = df_comp["city"].to_list()
    x = np.arange(len(cities))

    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    for idx, (cat, (label, color)) in enumerate(COST_CATS.items()):
        ax = axes[idx // 2, idx % 2]
        vals_base = df_base[f"{cat}_pt"].to_list()
        vals_comp = df_comp[f"{cat}_pt"].to_list()
        width = 0.35

        ax.bar(x - width / 2, vals_base, width, label=str(BASELINE_YEAR), color="#4878CF", alpha=0.8)
        ax.bar(x + width / 2, vals_comp, width, label=str(COMPARE_YEAR), color="#E24A33", alpha=0.8)
        ax.set_xticks(x)
        xlabels = ax.set_xticklabels(cities, rotation=35, ha="right", fontsize=8)
        for xl in xlabels:
            if xl.get_text() == "Pittsburgh":
                xl.set_fontweight("bold")
        ax.set_ylabel("$/Trip")
        ax.set_title(label)
        ax.legend(fontsize=8)

    fig.suptitle(f"Cost per Trip by Category \u2014 {BASELINE_YEAR} vs {COMPARE_YEAR}",
                 fontsize=14, fontweight="bold", y=0.98)
    save_chart(fig, OUT / "cost_change.png")


def chart_fleet_age_vs_maintenance(plt, cost_df: pl.DataFrame, fleet_df: pl.DataFrame) -> None:
    """Scatter: bus fleet age vs vehicle maintenance cost/trip."""
    bus_age_df = fleet_df.filter(pl.col("vehicle_type") == "Bus").select("ntd_id", "city", "avg_age")
    maint_df = cost_df.filter(pl.col("year") == COMPARE_YEAR).select("ntd_id", "city", "opexp_vm_pt")
    merged_df = maint_df.join(bus_age_df, on=["ntd_id", "city"], how="inner")

    fig, ax = plt.subplots(figsize=(10, 7))
    for row in merged_df.iter_rows(named=True):
        city = row["city"]
        color = COLORS[city]
        size = 120 if city == "Pittsburgh" else 80
        zorder = 10 if city == "Pittsburgh" else 1
        ax.scatter(row["avg_age"], row["opexp_vm_pt"], color=color, s=size,
                   zorder=zorder, edgecolors="white", linewidth=1)
        offset_y = 0.08 if city != "Baltimore" else -0.12
        ax.annotate(city, (row["avg_age"], row["opexp_vm_pt"]),
                    textcoords="offset points", xytext=(8, offset_y * 50),
                    fontsize=9, fontweight="bold" if city == "Pittsburgh" else "normal")

    # Trendline
    ages = merged_df["avg_age"].to_list()
    costs = merged_df["opexp_vm_pt"].to_list()
    z = np.polyfit(ages, costs, 1)
    p = np.poly1d(z)
    age_range = np.linspace(min(ages) - 0.5, max(ages) + 0.5, 50)
    ax.plot(age_range, p(age_range), "--", color="#999999", linewidth=1, alpha=0.7)

    # Correlation
    corr = np.corrcoef(ages, costs)[0, 1]
    ax.text(0.05, 0.95, f"r = {corr:.2f}", transform=ax.transAxes,
            fontsize=11, verticalalignment="top", color="#666666")

    ax.set_xlabel("Average Bus Fleet Age (years)")
    ax.set_ylabel("Vehicle Maintenance Cost per Trip ($)")
    ax.set_title(f"Bus Fleet Age vs Vehicle Maintenance Cost \u2014 Peer Cities ({COMPARE_YEAR})")
    save_chart(fig, OUT / "fleet_age_vs_maintenance.png")


def chart_fleet_age_comparison(plt, fleet_df: pl.DataFrame) -> None:
    """Bar chart: average bus and light rail age per peer."""
    bus_df = (fleet_df.filter(pl.col("vehicle_type") == "Bus")
              .select("city", "avg_age").rename({"avg_age": "bus_age"}).sort("city"))
    lrv_df = (fleet_df.filter(pl.col("vehicle_type") == "Light Rail Vehicle")
              .select("city", "avg_age").rename({"avg_age": "lrv_age"}).sort("city"))

    # Only cities with bus data (all should have it)
    cities = bus_df["city"].to_list()
    bus_ages = bus_df["bus_age"].to_list()

    fig, ax = plt.subplots(figsize=(12, 6))
    x = np.arange(len(cities))
    width = 0.35

    ax.bar(x - width / 2, bus_ages, width, label="Bus", color="#4878CF")

    # Add LRV bars where available
    lrv_ages = []
    for city in cities:
        lrv_row = lrv_df.filter(pl.col("city") == city)
        lrv_ages.append(lrv_row["lrv_age"][0] if len(lrv_row) > 0 else 0)
    ax.bar(x + width / 2, lrv_ages, width, label="Light Rail", color="#E24A33")

    for i, (ba, la) in enumerate(zip(bus_ages, lrv_ages)):
        ax.text(i - width / 2, ba + 0.3, f"{ba:.1f}", ha="center", va="bottom", fontsize=8)
        if la > 0:
            ax.text(i + width / 2, la + 0.3, f"{la:.1f}", ha="center", va="bottom", fontsize=8)

    ax.set_xticks(x)
    xlabels = ax.set_xticklabels(cities, rotation=35, ha="right")
    for xl in xlabels:
        if xl.get_text() == "Pittsburgh":
            xl.set_fontweight("bold")

    ax.set_ylabel("Average Fleet Age (years)")
    ax.set_title(f"Revenue Vehicle Fleet Age \u2014 Peer Cities ({COMPARE_YEAR})")
    ax.legend()
    save_chart(fig, OUT / "fleet_age_comparison.png")


def chart_cost_trends(plt, cost_df: pl.DataFrame) -> None:
    """2×2 line chart: per-trip cost trajectories by category."""
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))

    for idx, (cat, (label, _)) in enumerate(COST_CATS.items()):
        ax = axes[idx // 2, idx % 2]
        for city in sorted(PEERS.values()):
            city_df = cost_df.filter(pl.col("city") == city).sort("year")
            years = city_df["year"].to_list()
            vals = city_df[f"{cat}_pt"].to_list()
            lw = 2.5 if city == "Pittsburgh" else 1.2
            alpha = 1.0 if city == "Pittsburgh" else 0.6
            zorder = 10 if city == "Pittsburgh" else 1
            ax.plot(years, vals, label=city, color=COLORS[city],
                    linewidth=lw, alpha=alpha, zorder=zorder, marker="o", markersize=3)
        ax.set_ylabel("$/Trip")
        ax.set_title(label)
        ax.set_xticks(list(range(BASELINE_YEAR, COMPARE_YEAR + 1)))

    axes[0, 0].legend(loc="upper left", fontsize=7)
    fig.suptitle(f"Cost per Trip Trends \u2014 Pittsburgh vs 7 Peers ({BASELINE_YEAR}\u2013{COMPARE_YEAR})",
                 fontsize=14, fontweight="bold", y=0.98)
    save_chart(fig, OUT / "cost_trends.png")


@run_analysis(41, "Operating Cost Drivers")
def main():
    plt = setup_plotting()
    conn = get_db()

    with phase("Loading cost data"):
        cost_df = load_cost_data(conn)
        print(f"   {len(cost_df)} rows ({BASELINE_YEAR}-{COMPARE_YEAR})")

    with phase("Loading fleet age data"):
        fleet_df = load_fleet_age(conn)
        conn.close()
        print(f"   {len(fleet_df)} vehicle type rows")

    # Print summary table
    with phase("Cost per trip breakdown (2024)"):
        df_2024 = cost_df.filter(pl.col("year") == COMPARE_YEAR).sort("cost_per_trip", descending=True)
        print(f"\n   {'City':<15s} {'Veh Ops':>8s} {'Veh Maint':>10s} {'Non-Veh':>8s} {'Gen Adm':>8s} {'Total':>8s}")
        for row in df_2024.iter_rows(named=True):
            marker = " <<<" if row["ntd_id"] == PRT_NTD_ID else ""
            print("   %-15s $%6.2f    $%6.2f  $%6.2f  $%6.2f  $%6.2f%s" % (
                row["city"], row["opexp_vo_pt"], row["opexp_vm_pt"],
                row["opexp_nvm_pt"], row["opexp_ga_pt"], row["cost_per_trip"], marker))

    with phase("Fleet age summary (2024)"):
        bus_df = fleet_df.filter(pl.col("vehicle_type") == "Bus").sort("avg_age", descending=True)
        print(f"\n   {'City':<15s} {'Buses':>6s} {'Avg Age':>8s}")
        for row in bus_df.iter_rows(named=True):
            marker = " <<<" if row["ntd_id"] == PRT_NTD_ID else ""
            age = f"{row['avg_age']:.1f}" if row["avg_age"] else "N/A"
            print(f"   {row['city']:<15s} {row['total_vehicles']:>6d} {age:>8s}{marker}")

    # Save CSV
    with phase("Saving data"):
        csv_df = df_2024.select(
            "city", "cost_per_trip", "opexp_vo_pt", "opexp_vm_pt", "opexp_nvm_pt", "opexp_ga_pt",
        )
        # Join fleet age
        bus_age_for_csv = (fleet_df.filter(pl.col("vehicle_type") == "Bus")
                           .select("city", avg_bus_age=pl.col("avg_age")))
        lrv_age_for_csv = (fleet_df.filter(pl.col("vehicle_type") == "Light Rail Vehicle")
                           .select("city", avg_lrv_age=pl.col("avg_age")))
        csv_df = csv_df.join(bus_age_for_csv, on="city", how="left")
        csv_df = csv_df.join(lrv_age_for_csv, on="city", how="left")
        save_csv(csv_df, OUT / "cost_breakdown.csv")

    # Charts
    with phase("Generating cost breakdown chart"):
        chart_cost_breakdown(plt, cost_df)

    with phase("Generating cost change chart"):
        chart_cost_change(plt, cost_df)

    with phase("Generating fleet age vs maintenance chart"):
        chart_fleet_age_vs_maintenance(plt, cost_df, fleet_df)

    with phase("Generating fleet age comparison chart"):
        chart_fleet_age_comparison(plt, fleet_df)

    with phase("Generating cost trend lines"):
        chart_cost_trends(plt, cost_df)


if __name__ == "__main__":
    main()

Sources

NameTypeWhy It MattersOwnerFreshnessCaveat
ntd_annual_service table Primary analytical table used in this page's computations. Produced by NTD Annual Service ETL. Updated when the producing pipeline step is rerun. Coverage depends on upstream source availability and ETL assumptions.
Upstream sources (1)
  • file data/ntd-annual-service/2023_TS2.2_Service_Data.xlsx — NTD TS2.2 workbook with annual service data by system.
ntd_fleet_age table Primary analytical table used in this page's computations. Project pipeline owner not linked. Refresh cadence unknown. Coverage depends on upstream source availability and ETL assumptions.
polars dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.
numpy dependency Runtime dependency required for this page's pipeline or analysis code. Open-source Python ecosystem maintainers. Version pinned by project environment until dependency updates are applied. Library updates may change behavior or defaults.