Analysis

08: Hot-Spot Map

Core OTP Patterns

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Data Provenance

flowchart LR
  08_hotspot_map(["08: Hot-Spot Map"])
  t_otp_monthly[("otp_monthly")] --> 08_hotspot_map
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  u1_01_data_ingestion[/"data/routes_by_month.csv"/] --> 01_data_ingestion
  u2_01_data_ingestion[/"data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv"/] --> 01_data_ingestion
  u3_01_data_ingestion[/"data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv"/] --> 01_data_ingestion
  u4_01_data_ingestion[/"data/PRT_Stop_Reference_Lookup_Table.csv"/] --> 01_data_ingestion
  u5_01_data_ingestion[/"data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv"/] --> 01_data_ingestion
  t_route_stops[("route_stops")] --> 08_hotspot_map
  01_data_ingestion[["Data Ingestion"]] --> t_route_stops
  t_routes[("routes")] --> 08_hotspot_map
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  t_stops[("stops")] --> 08_hotspot_map
  01_data_ingestion[["Data Ingestion"]] --> t_stops
  d1_08_hotspot_map(("branca (lib)")) --> 08_hotspot_map
  d2_08_hotspot_map(("folium (lib)")) --> 08_hotspot_map
  d3_08_hotspot_map(("matplotlib (lib)")) --> 08_hotspot_map
  d4_08_hotspot_map(("polars (lib)")) --> 08_hotspot_map
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 08_hotspot_map page;
  class t_otp_monthly,t_route_stops,t_routes,t_stops table;
  class d1_08_hotspot_map,d2_08_hotspot_map,d3_08_hotspot_map,d4_08_hotspot_map dep;
  class u1_01_data_ingestion,u2_01_data_ingestion,u3_01_data_ingestion,u4_01_data_ingestion,u5_01_data_ingestion file;
  class 01_data_ingestion pipeline;

Findings

Findings: Hot-Spot Map

Important: Derived Metric

Stop-level OTP is a derived metric: each stop inherits the average OTP of the routes serving it, weighted by trip frequency (trips_7d). It reflects route composition at each stop, not independently measured stop-level performance. A stop served by a single high-OTP route will appear "high-performing" even if that stop is a chronic delay point on that route. Conversely, a stop served by many routes will reflect the blended average of those routes.
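
As a rough illustration of the weighting described above, the sketch below computes the metric for a single stop in plain Python. The function name `route_weighted_otp` and the sample numbers are hypothetical; the project's own `weighted_mean` helper is not shown on this page and may be implemented differently.

```python
from typing import Optional


def route_weighted_otp(routes: list[tuple[float, int]]) -> Optional[float]:
    """Trip-weighted mean OTP for one stop.

    `routes` holds (avg_otp, trips_7d) pairs, one per route serving the stop.
    Returns None when the stop has no trips, so the caller can drop it.
    """
    total_trips = sum(trips for _, trips in routes)
    if total_trips == 0:
        return None
    return sum(otp * trips for otp, trips in routes) / total_trips


# Hypothetical stop: a frequent low-OTP route plus an infrequent high-OTP route.
stop = [(0.60, 300), (0.85, 100)]
blended = route_weighted_otp(stop)
print(f"{blended:.4f}")
```

The frequent route dominates the result, which is why a stop's colour on the map mostly tracks whichever route runs the most trips through it.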

Summary

6,212 stops were mapped with route-weighted OTP (after excluding 2 stops with null/NaN OTP due to zero total trips, and excluding routes with fewer than 12 months of data). Poor performance clusters in eastern Pittsburgh (Penn Hills, Squirrel Hill, Highland Park), while the best performance follows the light rail and busway corridors.

Geographic Patterns

  • Best corridors: The light rail T line (Beechview/Overbrook south to Library) and the East Busway (Wilkinsburg to downtown) form clear green bands on the map, with 80%+ OTP at most stops.
  • Worst corridors: Eastern neighborhoods served by Route 77 (Penn Hills) show the lowest stop-level OTP at 55.8%. The 61-series routes through McKeesport and Homestead also form a low-OTP cluster.
  • Downtown: Mixed performance. Stops in the Golden Triangle are served by many routes, so their weighted OTP reflects the system average (~65--70%).

Mode Context

The best-performing stops (88.4%) are served exclusively by a single BUS route, Route 18 (Manchester), not by rail. The high-OTP corridor along the T line reflects rail's structural advantage (dedicated right-of-way), not independently measured stop performance. When interpreting the map:

  • Rail stops (light rail T line) appear green primarily because RAIL routes average ~84% OTP system-wide. These stops' high performance reflects mode advantage, not stop-specific factors.
  • Busway stops (East Busway, West Busway) also appear green for similar reasons -- dedicated right-of-way.
  • Bus stops that score high without a dedicated right-of-way include those on Route 18 (Manchester, 88.4%) and Route 39 (Brookline). Both routes achieve high OTP on mixed-traffic streets; the score still describes the route, not the individual stop.

Observations

  • The worst-performing stops (55.8%) are exclusively served by Route 77, the system's lowest-ranked route.
  • The best-performing stops (88.4%) are exclusively served by Route 18 (Manchester) -- a bus route, not rail.
  • Stops served by multiple routes tend toward the system mean, since the weighting blends good and bad routes.
  • The system average displayed in the chart title (unweighted stop-level average) treats every stop equally regardless of trip volume. A trip-weighted system average would differ slightly.
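
To make the last observation concrete, the sketch below contrasts the two averages on a handful of made-up stops. The numbers are hypothetical and deliberately exaggerated; on the real data the gap is described above as slight.

```python
# Hypothetical per-stop rows: (weighted_otp, total_trips_7d).
stops = [
    (0.60, 900),  # busy stop on a low-OTP corridor
    (0.70, 100),
    (0.80, 100),
    (0.90, 100),  # quiet stop on a high-OTP route
]

# What the chart title reports: every stop counts once.
unweighted_avg = sum(otp for otp, _ in stops) / len(stops)

# Alternative: each stop counts in proportion to its weekly trips.
total_trips = sum(trips for _, trips in stops)
trip_weighted_avg = sum(otp * trips for otp, trips in stops) / total_trips

print(f"unweighted:    {unweighted_avg:.3f}")
print(f"trip-weighted: {trip_weighted_avg:.3f}")
```

Here the unweighted figure is 0.750 and the trip-weighted figure is 0.650, because the single busy stop pulls the weighted average toward its own OTP.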

Caveats

  • The static PNG plots raw lat/lon with equal axis scaling, not a true geographic projection. At Pittsburgh's latitude (about 40.4°N) this stretches east-west distances by roughly 30% relative to north-south, but the overall shape remains recognizable.
  • Stops with null or NaN OTP values (2 stops) were excluded from the map.
  • OTP is averaged across all months (2019--2025), so the map doesn't show temporal changes.
  • Routes with fewer than 12 months of OTP data were excluded to avoid projecting noisy estimates onto the map.
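
The size of the lat/lon distortion in the first caveat can be estimated with a spherical-Earth approximation. The sketch below is illustrative only: the latitude constant is an assumed value for downtown Pittsburgh, and `lon_stretch_factor` is a hypothetical helper, not part of the analysis code.

```python
import math

PITTSBURGH_LAT_DEG = 40.44  # assumed approximate latitude of downtown Pittsburgh


def lon_stretch_factor(lat_deg: float) -> float:
    """How much an equal-degree plot stretches east-west distances.

    On a sphere, one degree of longitude spans cos(latitude) times the
    ground distance of one degree of latitude, so drawing both at the same
    scale exaggerates east-west distances by 1 / cos(latitude).
    """
    return 1.0 / math.cos(math.radians(lat_deg))


stretch = lon_stretch_factor(PITTSBURGH_LAT_DEG)
print(f"east-west stretch at {PITTSBURGH_LAT_DEG} deg: {stretch:.2f}x")

# With matplotlib, passing this factor as the aspect ratio (y-unit / x-unit),
# e.g. ax.set_aspect(stretch) instead of ax.set_aspect("equal"),
# approximately compensates over a city-sized area.
```

This is a local correction rather than a projection; it is adequate for a single metro area but would drift over a region spanning many degrees of latitude.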

Review History

  • 2026-02-11: RED-TEAM-REPORTS/2026-02-11-analyses-01-05-07-11.md -- 7 issues (1 significant). Documented derived-metric nature of stop OTP, added mode column and bus/rail context, added minimum 12-month filter for route OTP, clarified unweighted system average, corrected NaN claim, fixed hood="0" sentinel, added dropped-stop logging.

Methods

Methods: Hot-Spot Map

Question

Where do poor-performing routes cluster geographically? Are there corridor-level bottlenecks visible on a map?

Approach

  • For each stop, compute the trip-weighted average OTP of all routes serving it. This is a derived metric ("route-weighted OTP"): each stop inherits the average OTP of the routes serving it, weighted by trip frequency (trips_7d). It reflects route composition at each stop, not independently measured stop-level performance.
  • Only include routes with at least 12 months of OTP data to avoid projecting noisy estimates onto the map.
  • Plot stops on a lat/lon scatter plot, colored by route-weighted OTP performance.
  • Use a diverging red-yellow-green colormap so low OTP areas are immediately visible.
  • Display the unweighted stop-level average as a reference (note: this is unweighted across stops, not weighted by trip volume).
  • Track the mode (BUS/RAIL) of routes serving each stop for context.
  • Stops with null or NaN OTP (due to zero total trips or missing data) are excluded from the map.
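
The 12-month route filter in the approach above can be tried in isolation against an in-memory SQLite table. This is a minimal sketch: the `month` column name and the sample rows are assumptions, since only `route_id` and `otp` are visible in the queries on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real otp_monthly table may have more columns.
conn.execute("CREATE TABLE otp_monthly (route_id TEXT, month TEXT, otp REAL)")

# Hypothetical data: route "A" has 12 monthly rows, route "B" only 3.
rows = [("A", f"2024-{m:02d}", 0.70) for m in range(1, 13)]
rows += [("B", f"2024-{m:02d}", 0.95) for m in range(1, 4)]
conn.executemany("INSERT INTO otp_monthly VALUES (?, ?, ?)", rows)

# Same shape as the subquery in load_data(): average per route,
# keeping only routes with at least 12 monthly observations.
route_avg = conn.execute(
    """
    SELECT route_id, AVG(otp) AS avg_otp
    FROM otp_monthly
    GROUP BY route_id
    HAVING COUNT(*) >= 12
    """
).fetchall()

print(route_avg)
```

Only route "A" survives the filter. Note that `COUNT(*)` counts rows, so it equals the number of months only if the table holds one row per route per month.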

Data

Name | Description | Source
otp_monthly | Monthly OTP per route (averaged across all months, routes with < 12 months excluded) | prt.db table
route_stops | Which stops are served by which routes, with trip counts | prt.db table
routes | Route metadata including mode | prt.db table
stops | Lat/lon coordinates | prt.db table
shapes.txt | Route polyline geometries | GTFS file (data/GTFS/)
trips.txt | Route-to-shape mapping | GTFS file (data/GTFS/)

Output

  • output/hotspot_map.csv -- per-stop route-weighted OTP with coordinates
  • output/hotspot_map.png -- geographic scatter plot
  • output/hotspot_map.html -- interactive folium map over CartoDB Positron tiles with per-stop popups and an optional route-lines layer

Source Code

"""Geographic scatter plot of route-weighted OTP at each stop."""

from pathlib import Path

import folium
import polars as pl
from branca.colormap import LinearColormap

from prt_otp_analysis.common import (
    analysis_dir,
    phase,
    query_to_polars,
    run_analysis,
    save_chart,
    save_csv,
    setup_plotting,
    weighted_mean,
)

OUT = analysis_dir(__file__)

# OTP color-scale bounds for geographic maps.
OTP_MAP_VMIN = 0.5
OTP_MAP_VMAX = 0.9

# Directory holding the GTFS feed (shapes.txt, trips.txt).
GTFS = Path(__file__).resolve().parent.parent.parent / "data" / "GTFS"


def load_data() -> pl.DataFrame:
    """Load per-stop route-weighted OTP with coordinates and mode."""
    return query_to_polars("""
        SELECT rs.stop_id, rs.route_id, rs.trips_7d,
               s.lat, s.lon, s.hood, s.muni,
               r.mode,
               route_avg.avg_otp
        FROM route_stops rs
        JOIN stops s ON rs.stop_id = s.stop_id
        JOIN routes r ON rs.route_id = r.route_id
        JOIN (
            SELECT route_id, AVG(otp) AS avg_otp
            FROM otp_monthly
            GROUP BY route_id
            HAVING COUNT(*) >= 12
        ) route_avg ON rs.route_id = route_avg.route_id
    """)


def analyze(df: pl.DataFrame) -> pl.DataFrame:
    """Compute per-stop route-weighted OTP.

    Each stop inherits the average OTP of the routes serving it, weighted by
    trip frequency (trips_7d). This is a derived metric reflecting route
    composition at each stop, not independently measured stop-level performance.
    """
    # Collect modes serving each stop for later reporting
    stop_modes = (
        df.group_by("stop_id")
        .agg(modes=pl.col("mode").unique())
    )

    stop_otp_raw = (
        df.group_by(["stop_id", "lat", "lon", "hood", "muni"])
        .agg(
            weighted_otp=weighted_mean("avg_otp", "trips_7d"),
            route_count=pl.col("route_id").n_unique(),
            total_trips_7d=pl.col("trips_7d").sum(),
        )
    )

    # Log dropped stops
    n_before = stop_otp_raw.height
    stop_otp = (
        stop_otp_raw
        .filter(pl.col("weighted_otp").is_not_null() & pl.col("weighted_otp").is_not_nan())
        .sort("weighted_otp")
    )
    n_after = stop_otp.height
    n_dropped = n_before - n_after
    if n_dropped > 0:
        print(f"  {n_dropped} stops dropped due to null/NaN OTP (zero total trips or missing data)")

    # Join mode info
    stop_otp = stop_otp.join(stop_modes, on="stop_id", how="left")

    return stop_otp


def make_chart(df: pl.DataFrame) -> None:
    """Generate geographic scatter plot colored by OTP."""
    plt = setup_plotting()
    from matplotlib.colors import Normalize

    fig, ax = plt.subplots(figsize=(12, 10))

    lon = df["lon"].to_list()
    lat = df["lat"].to_list()
    otp = df["weighted_otp"].to_list()

    norm = Normalize(vmin=OTP_MAP_VMIN, vmax=OTP_MAP_VMAX)
    sc = ax.scatter(
        lon, lat, c=otp, cmap="RdYlGn", norm=norm,
        s=4, alpha=0.6, edgecolors="none",
    )

    fig.colorbar(sc, ax=ax, label="Route-Weighted OTP", shrink=0.8)

    system_avg = sum(otp) / len(otp)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title(f"PRT Route-Weighted OTP at Stop (unweighted stop avg: {system_avg:.1%})")
    ax.set_aspect("equal")

    save_chart(fig, OUT / "hotspot_map.png")


def load_route_shapes() -> dict[str, list[tuple[float, float]]]:
    """Load GTFS shapes and return one polyline per route (the variant with the most shape points)."""
    trips = pl.read_csv(
        GTFS / "trips.txt",
        columns=["route_id", "shape_id"],
        # Keep route IDs as strings so they match the route_id keys from the database.
        schema_overrides={"route_id": pl.Utf8},
    )
    shapes = pl.read_csv(GTFS / "shapes.txt")

    # Count points per shape to pick the most complete variant per route
    shape_lengths = shapes.group_by("shape_id").len()
    route_shapes = (
        trips.select("route_id", "shape_id")
        .unique()
        .join(shape_lengths, on="shape_id")
        .sort(["route_id", "len"], descending=[False, True])
        .group_by("route_id")
        .first()
    )
    best_shape_ids = set(route_shapes["shape_id"].to_list())

    # Build polylines from the selected shapes
    selected = shapes.filter(pl.col("shape_id").is_in(best_shape_ids))
    polylines: dict[str, list[tuple[float, float]]] = {}
    shape_to_route = dict(zip(
        route_shapes["shape_id"].to_list(),
        route_shapes["route_id"].to_list(),
    ))
    for shape_id, group in selected.sort("shape_pt_sequence").group_by("shape_id"):
        route_id = shape_to_route[shape_id[0]]
        polylines[str(route_id)] = list(zip(
            group["shape_pt_lat"].to_list(),
            group["shape_pt_lon"].to_list(),
        ))
    return polylines


def load_route_otp() -> dict[str, float]:
    """Load average OTP per route from the database (min 12 months)."""
    df = query_to_polars("""
        SELECT route_id, AVG(otp) AS avg_otp
        FROM otp_monthly
        GROUP BY route_id
        HAVING COUNT(*) >= 12
    """)
    return dict(zip(df["route_id"].to_list(), df["avg_otp"].to_list()))


def make_interactive_map(
    df: pl.DataFrame,
    route_shapes: dict[str, list[tuple[float, float]]],
    route_otp: dict[str, float],
) -> None:
    """Generate an interactive folium map with OTP-colored stop markers and route lines."""
    center_lat = df["lat"].mean()
    center_lon = df["lon"].mean()

    m = folium.Map(location=[center_lat, center_lon], zoom_start=11,
                   tiles="CartoDB positron")

    colormap = LinearColormap(
        colors=["#d73027", "#fee08b", "#1a9850"],  # red -> yellow -> green
        vmin=OTP_MAP_VMIN, vmax=OTP_MAP_VMAX,
        caption="Route-Weighted OTP",
    )

    # Route lines layer (added first so stops render on top)
    routes_layer = folium.FeatureGroup(name="Route Lines", show=False)
    for route_id, coords in sorted(route_shapes.items()):
        otp = route_otp.get(route_id)
        if otp is None:
            continue
        folium.PolyLine(
            locations=coords,
            color=colormap(otp),
            weight=3,
            opacity=0.7,
            popup=folium.Popup(
                f"<b>Route {route_id}</b><br>OTP: {otp:.1%}",
                max_width=200,
            ),
        ).add_to(routes_layer)
    routes_layer.add_to(m)

    # Stops layer
    stops_layer = folium.FeatureGroup(name="Stops")
    valid = df.filter(pl.col("weighted_otp").is_not_nan() & pl.col("weighted_otp").is_not_null())
    for row in valid.iter_rows(named=True):
        otp = row["weighted_otp"]
        hood = row["hood"] if row["hood"] and row["hood"] != "0" else "N/A"
        popup_text = (
            f"<b>Stop {row['stop_id']}</b><br>"
            f"Neighborhood: {hood}<br>"
            f"Municipality: {row['muni'] or 'N/A'}<br>"
            f"OTP: {otp:.1%}<br>"
            f"Routes: {row['route_count']}<br>"
            f"Weekly trips: {row['total_trips_7d']:,}"
        )
        folium.CircleMarker(
            location=[row["lat"], row["lon"]],
            radius=2,
            color=colormap(otp),
            weight=0,
            fill=True,
            fill_color=colormap(otp),
            fill_opacity=0.7,
            popup=folium.Popup(popup_text, max_width=250),
        ).add_to(stops_layer)
    stops_layer.add_to(m)

    colormap.add_to(m)
    folium.LayerControl().add_to(m)

    # Scale marker radius with zoom: 2px at zoom 11, 3px (~50% bigger) at zoom 15.
    zoom_js = folium.Element("""
    <script>
    document.addEventListener("DOMContentLoaded", function() {
        var map = Object.values(window).find(v => v instanceof L.Map);
        function scaleMarkers() {
            var zoom = map.getZoom();
            var radius = 2 * Math.pow(1.1, zoom - 11);
            map.eachLayer(function(layer) {
                if (layer instanceof L.CircleMarker && !(layer instanceof L.Circle)) {
                    layer.setRadius(radius);
                }
            });
        }
        map.on("zoomend", scaleMarkers);
    });
    </script>
    """)
    m.get_root().html.add_child(zoom_js)

    m.save(str(OUT / "hotspot_map.html"))
    print(f"  Interactive map saved to {OUT / 'hotspot_map.html'}")


@run_analysis(8, "Hot-Spot Map")
def main() -> None:
    """Entry point: load data, compute stop OTP, map, and save."""
    with phase("Loading data"):
        raw = load_data()
        print(f"  {len(raw):,} route-stop records loaded")

    with phase("Analyzing"):
        stop_otp = analyze(raw)
        print(f"  {len(stop_otp):,} stops with OTP computed")

        best = stop_otp.sort("weighted_otp", descending=True).head(3)
        worst = stop_otp.sort("weighted_otp").head(3)
        print("\n  Best-performing stops (route-weighted OTP):")
        for row in best.iter_rows(named=True):
            hood = row["hood"] if row["hood"] and row["hood"] != "0" else "N/A"
            modes = ", ".join(sorted(row["modes"])) if row.get("modes") else "N/A"
            print(f"    {row['stop_id']} ({hood}, {modes}): {row['weighted_otp']:.1%}")
        print("  Worst-performing stops (route-weighted OTP):")
        for row in worst.iter_rows(named=True):
            hood = row["hood"] if row["hood"] and row["hood"] != "0" else "N/A"
            modes = ", ".join(sorted(row["modes"])) if row.get("modes") else "N/A"
            print(f"    {row['stop_id']} ({hood}, {modes}): {row['weighted_otp']:.1%}")

    with phase("Saving CSV"):
        stop_otp_out = stop_otp.drop("modes")
        save_csv(stop_otp_out, OUT / "hotspot_map.csv")

    with phase("Generating chart"):
        make_chart(stop_otp)

    with phase("Loading GTFS route shapes"):
        route_shapes = load_route_shapes()
        route_otp = load_route_otp()
        print(f"  {len(route_shapes)} route shapes loaded")

    with phase("Generating interactive map"):
        make_interactive_map(stop_otp, route_shapes, route_otp)


if __name__ == "__main__":
    main()
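
The shape-selection step in load_route_shapes() can be illustrated without polars. The sketch below uses made-up route and shape IDs and plain dictionaries; it mirrors the idea (keep the shape with the most points per route) rather than the exact implementation above.

```python
from collections import Counter

# Hypothetical GTFS fragments. trips.txt maps routes to shapes;
# shapes.txt has one row per shape point (shape_id, shape_pt_sequence).
trips = [("61A", "shp_1"), ("61A", "shp_2"), ("77", "shp_3")]
shape_points = [
    ("shp_1", 1), ("shp_1", 2),
    ("shp_2", 1), ("shp_2", 2), ("shp_2", 3),
    ("shp_3", 1), ("shp_3", 2),
]

# Number of points per shape, used as a proxy for how complete the variant is.
points_per_shape = Counter(shape_id for shape_id, _ in shape_points)

# For each route, keep the shape with the most points.
best_shape: dict[str, str] = {}
for route_id, shape_id in sorted(set(trips)):
    current = best_shape.get(route_id)
    if current is None or points_per_shape[shape_id] > points_per_shape[current]:
        best_shape[route_id] = shape_id

print(best_shape)
```

Point count is only a proxy for length: a short but densely sampled variant could outrank a longer, sparsely sampled one. Summing segment distances would be a stricter criterion if that ever matters for the map.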

Sources

Name | Type | Why It Matters | Owner | Freshness | Caveat
otp_monthly | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
route_stops | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
routes | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
stops | table | Primary analytical table used in this page's computations. | Produced by Data Ingestion. | Updated when the producing pipeline step is rerun. | Coverage depends on upstream source availability and ETL assumptions.
branca | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.
folium | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.
matplotlib | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.
polars | dependency | Runtime dependency required for this page's pipeline or analysis code. | Open-source Python ecosystem maintainers. | Version pinned by project environment until dependency updates are applied. | Library updates may change behavior or defaults.

Upstream sources (5), shared by otp_monthly, route_stops, routes, and stops:
  • file data/routes_by_month.csv -- Monthly route OTP source table in wide format.
  • file data/PRT_Current_Routes_Full_System_de0e48fcbed24ebc8b0d933e47b56682.csv -- Current route metadata and mode classifications.
  • file data/Transit_stops_(current)_by_route_e040ee029227468ebf9d217402a82fa9.csv -- Current stop-to-route coverage and trip counts.
  • file data/PRT_Stop_Reference_Lookup_Table.csv -- Historical stop reference file with geography attributes.
  • file data/average-ridership/12bb84ed-397e-435c-8d1b-8ce543108698.csv -- Average ridership by route and month.