35 - Boarding/Alighting Flow Analysis

Equity and Strategic Planning

Coverage: Coverage window unavailable for this page.

Built 2026-04-03 20:09 UTC · Commit 7c56b9a

Data Provenance

flowchart LR
  35_boarding_alighting_flows(["35 - Boarding/Alighting Flow Analysis"])
  f1_35_boarding_alighting_flows[/"data/bus-stop-usage/wprdc_stop_data.csv"/] --> 35_boarding_alighting_flows
  d1_35_boarding_alighting_flows(("numpy (lib)")) --> 35_boarding_alighting_flows
  d2_35_boarding_alighting_flows(("polars (lib)")) --> 35_boarding_alighting_flows
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 35_boarding_alighting_flows page;
  class d1_35_boarding_alighting_flows,d2_35_boarding_alighting_flows dep;
  class f1_35_boarding_alighting_flows file;

Findings

Findings: Boarding/Alighting Flow Analysis

Summary

PRT's system-wide boarding/alighting balance is nearly perfect (ratio 1.003), but individual stops show strong directional asymmetry. The top generators (net boardings) are outbound departure points on Smithfield St, 5th Ave, and Liberty Ave in downtown. The top attractors (net alightings) are inbound arrival points on Wood St, Liberty Ave at Gateway, and North Side Station. The directional data confirms a classic radial commuter pattern: inbound stops net +11,207 boardings/day while outbound stops net -12,234 (i.e., inbound-coded stops record more boardings than alightings, and outbound-coded stops more alightings than boardings).

Key Numbers

130,121 avg weekday boardings vs 129,684 avg alightings (ratio 1.003)
3,103 (46%) stops are net generators; 3,442 (51%) are net attractors; 174 (3%) balanced
Inbound on/off ratio: 1.36 (net +11,207 boardings)
Outbound on/off ratio: 0.72 (net -12,234 boardings)
Top generator: Smithfield St at Sixth Ave (+1,993 net/day)
Top attractor: Wood St btw Forbes & Fifth (-1,350 net/day)
Median stop on/off ratio: 0.76 (slight attractor skew)
Among stops with a defined ratio (avg_offs > 0), 38.8% are strong generators (ratio > 1.5) and 48.0% are strong attractors (ratio < 0.67)
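The headline figures are tied together by net = ons - offs and ratio = ons / offs, so a ratio and a net flow together determine approximate directional totals. The sketch below is a quick arithmetic check using only the Key Numbers above; the helper `implied_totals` is hypothetical, and because the published ratios are rounded to two decimals, the implied totals are approximate.

```python
def implied_totals(ratio: float, net: float) -> tuple[float, float]:
    """Recover (ons, offs) from ratio = ons / offs and net = ons - offs.

    Since ons = ratio * offs, net = (ratio - 1) * offs, so offs = net / (ratio - 1).
    Undefined when ratio == 1 (a perfectly balanced group).
    """
    if ratio == 1:
        raise ValueError("a ratio of exactly 1 leaves offs undetermined")
    offs = net / (ratio - 1)
    return offs + net, offs


# System-wide check from the average weekday totals.
system_ratio = 130_121 / 129_684
print(f"system on/off ratio: {system_ratio:.3f}")

# Directional groups: approximate, because the ratios are rounded.
in_ons, in_offs = implied_totals(1.36, 11_207)
out_ons, out_offs = implied_totals(0.72, -12_234)
print(f"inbound  ~ ons {in_ons:,.0f}, offs {in_offs:,.0f}")
print(f"outbound ~ ons {out_ons:,.0f}, offs {out_offs:,.0f}")

# Gap between the outbound and inbound net magnitudes (see the Discussion).
gap = abs(-12_234) - 11_207
print(f"net magnitude gap: {gap:,}/day")
```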

Observations

The network is classically radial. Inbound stops generate a strong net surplus of boardings (~+11K/day), meaning suburban riders board inbound and alight downtown. Outbound stops show the mirror: net ~-12K, as downtown riders board outbound and alight in suburbs.
Downtown has both strong generators and attractors, on different streets. Smithfield St and 5th Ave are net generators (people boarding outbound departures), while Wood St and Liberty Ave/Gateway are net attractors (people arriving inbound). This reflects the one-way street grid routing buses through different downtown streets for inbound vs outbound.
Busway stations appear on both lists. Wilkinsburg Station shows split behavior: Platform C is a generator (+837/day, outbound departures from the East Busway) while Platform A is an attractor (-935/day, inbound arrivals). This confirms the station's role as a major suburban interchange.
North Side Station is the #5 attractor (-994 net/day), consistent with its role as a major inbound terminus where light rail and busway passengers alight.
Oakland/university stops are generators: 5th Ave opp Thackeray (+871) and Forbes at Morewood (+868) are consistent with students and workers boarding outbound to return home.
The median stop is a slight attractor (ratio 0.76), reflecting that many suburban stops have more people getting off (returning home) than getting on.

Discussion

The boarding/alighting flow pattern is a direct fingerprint of Pittsburgh's commuter geography. The strong inbound-boarding / outbound-alighting asymmetry (+11K/-12K) confirms that PRT operates as a radial commuter network focused on downtown. The small residual imbalance (the outbound net is about 1,000/day larger in magnitude than the inbound net: 12,234 vs 11,207) likely reflects "Both"-direction stops that aggregate mixed flows.

The street-level generator/attractor split in downtown is operationally informative: Smithfield St outbound stops need boarding capacity (shelters, queuing space, real-time info), while Wood St inbound stops need alighting capacity (wide sidewalks, clear exits). This differs from a simple "downtown = destination" model and has implications for stop amenity design.

The presence of Oakland as a net generator is notable: it suggests the 5th Ave/Forbes corridor serves as a secondary hub where students and workers board to travel outbound, not just as a destination from downtown. This bidirectional flow may help explain why 5th Ave routes face on-time performance (OTP) challenges, since stops with heavy boarding and alighting in both directions accumulate long dwell times.

For service planning, the net flow data identifies where passenger demand is structurally asymmetric. Routes serving strong generators could benefit from express or limited-stop variants in the outbound direction, while routes terminating at strong attractors could justify higher inbound frequency.

Caveats

The "direction" field (Inbound/Outbound/Both) groups stops, not individual trips. Stops coded "Both" handle mixed flows that don't cleanly separate.
Net flow counts boarding and alighting events at each stop, not complete origin-to-destination trips. Transfer passengers are counted at each stop they use (an alighting and a boarding at the transfer point).
Pre-pandemic weekday data (FY2019); commuter patterns may have shifted with remote work.
Physical stops serving multiple routes aggregate flows across routes, which may mask route-specific patterns.
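The transfer caveat can be made concrete with a toy example. Every leg of a journey records one boarding and one alighting, so system-wide ons and offs balance (hence a system ratio near 1), while a transfer stop records an alighting and a boarding that cancel in its net flow. The stop names and journeys below are invented for illustration and are not drawn from the PRT data.

```python
from collections import Counter

# Each journey is a list of legs; each leg is (boarding_stop, alighting_stop).
# Hypothetical stops and riders, for illustration only.
journeys = [
    [("Suburb A", "Downtown 1")],                  # direct inbound commute
    [("Suburb B", "Hub"), ("Hub", "Downtown 1")],  # inbound commute with a transfer at Hub
    [("Downtown 2", "Suburb A")],                  # direct outbound return trip
]

ons = Counter()
offs = Counter()
for legs in journeys:
    for board, alight in legs:
        ons[board] += 1
        offs[alight] += 1

# Net flow per stop, exactly as defined in Methods: ons - offs.
stops = sorted(set(ons) | set(offs))
net = {s: ons[s] - offs[s] for s in stops}
for s in stops:
    print(f"{s:12s} ons={ons[s]} offs={offs[s]} net={net[s]:+d}")

# Every leg adds one boarding and one alighting, so the totals balance exactly.
print("system ons/offs:", sum(ons.values()), sum(offs.values()))
```

Hub ends with a net of 0 despite handling two events, which is why net flow alone cannot separate a quiet stop from a busy transfer point; the total_usage column (ons + offs) in the output CSV carries that information.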

Review History

2026-02-27: RED-TEAM-REPORTS/2026-02-27-analyses-31-35.md — 1 significant issue. Fixed double-counting in load_directional_flows(): now averages across datekeys per stop-route-direction before summing. Directional net flows corrected (~halved); ratios unchanged.
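The double-counting fix can be illustrated on a toy table. If each stop-route-direction appears once per datekey, summing raw rows multiplies every stop's contribution by the number of datekeys; averaging per stop-route-direction first and then summing removes that inflation. This stdlib sketch assumes one row per (stop_id, route_name, direction, datekey); the stop IDs, route name, datekeys, and values are all invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (stop_id, route_name, direction, datekey, avg_ons)
rows = [
    ("S1", "R1", "Inbound", "d1", 40.0),
    ("S1", "R1", "Inbound", "d2", 44.0),
    ("S2", "R1", "Inbound", "d1", 10.0),
    ("S2", "R1", "Inbound", "d2", 12.0),
]

# Pre-fix behavior: sum every row straight to the direction level.
naive = defaultdict(float)
for _stop, _route, direction, _datekey, ons in rows:
    naive[direction] += ons

# Post-fix behavior: average across datekeys per stop-route-direction, then sum.
per_key = defaultdict(list)
for stop, route, direction, _datekey, ons in rows:
    per_key[(stop, route, direction)].append(ons)

fixed = defaultdict(float)
for (_stop, _route, direction), values in per_key.items():
    fixed[direction] += mean(values)

print(dict(naive), dict(fixed))
```

With two datekeys per key the naive total is exactly double, matching the "~halved" correction. When every key has the same number of datekeys, ons and offs are inflated by the same factor, so the on/off ratio is unchanged.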

Output

net_flow_map.png (image): Geographic scatter plot colored by net flow (blue = generator, red = attractor).

top_generators_attractors.png (image): Horizontal bar chart of the top 15 generators and attractors.

No interactive outputs declared.

stop_net_flow.csv (data): Per-stop boardings, alightings, net flow, and classification.

Methods

Methods: Boarding/Alighting Flow Analysis

Question

Which stops are major trip generators (net boardings) vs trip attractors (net alightings), and how does the boarding/alighting balance vary by direction and geography?

Approach

Aggregate pre-pandemic weekday ridership to the physical-stop level, keeping boardings and alightings separate.
Compute net flow per stop: net = avg_ons - avg_offs. Positive = net generator (more people board than alight); negative = net attractor.
Classify stops by net flow magnitude and sign; identify the top generators and attractors.
Examine the inbound/outbound direction dimension: if the network is radial, inbound and outbound stops should show opposite, roughly mirror-image net flows.
Map net flow geographically to reveal land-use patterns (residential origins vs employment/commercial destinations).
Compute the boarding-to-alighting ratio per stop and examine its distribution.
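The per-stop steps above can be sketched in plain Python. This mirrors the logic of load_stop_flows() and the ratio summary in main() (sign-based classification, a ratio defined only when avg_offs > 0, strong thresholds of 1.5 and 0.67), but it is a stdlib illustration rather than the polars implementation; the Stop class and the stop totals are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Stop:
    name: str
    avg_ons: float
    avg_offs: float

    @property
    def net_flow(self) -> float:
        return self.avg_ons - self.avg_offs

    @property
    def flow_type(self) -> str:
        # The sign of net flow decides the class; exactly zero is "Balanced".
        if self.net_flow > 0:
            return "Generator"
        if self.net_flow < 0:
            return "Attractor"
        return "Balanced"

    @property
    def on_off_ratio(self) -> Optional[float]:
        # Undefined when nobody alights, matching the null ratio in the analysis.
        return self.avg_ons / self.avg_offs if self.avg_offs > 0 else None


# Hypothetical per-stop weekday averages.
stops = [
    Stop("A", 120.0, 40.0),
    Stop("B", 15.0, 60.0),
    Stop("C", 30.0, 30.0),
    Stop("D", 5.0, 0.0),
    Stop("E", 20.0, 25.0),
]

# Ratio statistics use only stops with a defined ratio.
ratios = [s.on_off_ratio for s in stops if s.on_off_ratio is not None]
strong_gen = sum(r > 1.5 for r in ratios) / len(ratios) * 100
strong_attr = sum(r < 0.67 for r in ratios) / len(ratios) * 100

for s in stops:
    print(s.name, s.flow_type, s.net_flow, s.on_off_ratio)
print(f"median ratio {median(ratios):.2f}; strong gen {strong_gen:.1f}%; strong attr {strong_attr:.1f}%")
```

Stop D shows why the ratio-based percentages use a smaller denominator than the generator/attractor counts: a stop with boardings but no alightings is a generator with no defined ratio.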

Data

Name	Description	Source
`wprdc_stop_data.csv`	Stop-level boardings/alightings by route, direction, and period	Local CSV (`data/bus-stop-usage/`)

Output

output/stop_net_flow.csv -- per-stop boardings, alightings, net flow, and classification
output/net_flow_map.png -- geographic scatter plot colored by net flow (blue = generator, red = attractor)
output/top_generators_attractors.png -- horizontal bar chart of top 15 generators and attractors

Source Code

"""Analyze net boarding-alighting flows by stop to identify trip generators and attractors."""

import numpy as np
import polars as pl

from prt_otp_analysis.common import DATA_DIR, analysis_dir, phase, run_analysis, save_chart, save_csv, setup_plotting

OUT = analysis_dir(__file__)


def load_stop_flows() -> pl.DataFrame:
    """Load pre-pandemic weekday boardings/alightings aggregated to physical-stop level."""
    csv_path = DATA_DIR / "bus-stop-usage" / "wprdc_stop_data.csv"
    df = pl.read_csv(csv_path, null_values=["NA", ""])

    df = df.filter(
        (pl.col("time_period") == "Pre-pandemic")
        & (pl.col("serviceday") == "Weekday")
    )

    # Average across datekeys per stop-route, then sum across routes per physical stop
    per_stop_route = (
        df.group_by(["stop_id", "route_name"])
        .agg(
            pl.col("avg_ons").mean().alias("avg_ons"),
            pl.col("avg_offs").mean().alias("avg_offs"),
            pl.col("stop_name").first(),
            pl.col("latitude").first().alias("lat"),
            pl.col("longitude").first().alias("lon"),
            pl.col("direction").first(),
            pl.col("mode").first(),
        )
    )

    per_stop = (
        per_stop_route.group_by("stop_id")
        .agg(
            pl.col("avg_ons").sum(),
            pl.col("avg_offs").sum(),
            pl.col("stop_name").first(),
            pl.col("lat").first(),
            pl.col("lon").first(),
            pl.col("direction").first(),
            pl.col("mode").first(),
        )
        .with_columns(
            (pl.col("avg_ons") - pl.col("avg_offs")).alias("net_flow"),
            (pl.col("avg_ons") + pl.col("avg_offs")).alias("total_usage"),
        )
        .with_columns(
            pl.when(pl.col("net_flow") > 0)
            .then(pl.lit("Generator"))
            .when(pl.col("net_flow") < 0)
            .then(pl.lit("Attractor"))
            .otherwise(pl.lit("Balanced"))
            .alias("flow_type"),
            pl.when(pl.col("avg_offs") > 0)
            .then(pl.col("avg_ons") / pl.col("avg_offs"))
            .otherwise(None)
            .alias("on_off_ratio"),
        )
    )
    return per_stop


def load_directional_flows() -> pl.DataFrame:
    """Load direction-level boarding/alighting summary."""
    csv_path = DATA_DIR / "bus-stop-usage" / "wprdc_stop_data.csv"
    df = pl.read_csv(csv_path, null_values=["NA", ""])

    df = df.filter(
        (pl.col("time_period") == "Pre-pandemic")
        & (pl.col("serviceday") == "Weekday")
        & pl.col("direction").is_not_null()
    )

    # Average across datekeys per stop-route-direction first, then sum to direction level
    per_stop_route = (
        df.group_by(["stop_id", "route_name", "direction"])
        .agg(
            pl.col("avg_ons").mean().alias("avg_ons"),
            pl.col("avg_offs").mean().alias("avg_offs"),
        )
    )

    return (
        per_stop_route.group_by("direction")
        .agg(
            pl.col("avg_ons").sum().alias("total_ons"),
            pl.col("avg_offs").sum().alias("total_offs"),
            pl.len().alias("n_records"),
        )
        .with_columns(
            (pl.col("total_ons") - pl.col("total_offs")).alias("net_flow"),
            pl.when(pl.col("total_offs") > 0)
            .then(pl.col("total_ons") / pl.col("total_offs"))
            .otherwise(None)
            .alias("on_off_ratio"),
        )
    )


def make_charts(df: pl.DataFrame) -> None:
    """Generate net flow map and top generators/attractors bar chart."""
    plt = setup_plotting()

    # Filter to stops with meaningful usage
    active = df.filter(pl.col("total_usage") > 1)

    # --- Net flow geographic map ---
    fig, ax = plt.subplots(figsize=(10, 10))

    net = np.array(active["net_flow"].to_list())
    # Clamp for color scale
    net_clipped = np.clip(net, -200, 200)
    sizes = np.clip(np.abs(net) * 0.05, 3, 30)

    sc = ax.scatter(
        active["lon"].to_list(), active["lat"].to_list(),
        c=net_clipped, cmap="RdBu", s=sizes, alpha=0.6,
        vmin=-200, vmax=200, edgecolors="none",
    )
    plt.colorbar(sc, ax=ax, label="Net Flow (ons - offs)", shrink=0.6)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Stop Net Flow: Generators (blue) vs Attractors (red)")
    save_chart(fig, OUT / "net_flow_map.png")

    # --- Top generators and attractors ---
    fig, axes = plt.subplots(1, 2, figsize=(16, 8))

    # Top generators (highest net flow)
    top_gen = df.sort("net_flow", descending=True).head(15)
    names = [n[:35] for n in top_gen["stop_name"].to_list()]
    vals = top_gen["net_flow"].to_list()
    axes[0].barh(range(len(names)), vals, color="#3b82f6", edgecolor="white")
    axes[0].set_yticks(range(len(names)))
    axes[0].set_yticklabels(names, fontsize=8)
    axes[0].invert_yaxis()
    axes[0].set_xlabel("Net Flow (avg daily ons - offs)")
    axes[0].set_title("Top 15 Trip Generators\n(more boardings than alightings)")
    for i, v in enumerate(vals):
        axes[0].text(v + 5, i, f"+{v:.0f}", va="center", fontsize=8)

    # Top attractors (lowest net flow)
    top_attr = df.sort("net_flow").head(15)
    names = [n[:35] for n in top_attr["stop_name"].to_list()]
    vals = top_attr["net_flow"].to_list()
    axes[1].barh(range(len(names)), [abs(v) for v in vals], color="#ef4444", edgecolor="white")
    axes[1].set_yticks(range(len(names)))
    axes[1].set_yticklabels(names, fontsize=8)
    axes[1].invert_yaxis()
    axes[1].set_xlabel("Net Flow magnitude (avg daily offs - ons)")
    axes[1].set_title("Top 15 Trip Attractors\n(more alightings than boardings)")
    for i, v in enumerate(vals):
        axes[1].text(abs(v) + 5, i, f"{v:.0f}", va="center", fontsize=8)

    save_chart(fig, OUT / "top_generators_attractors.png")


@run_analysis(35, "Boarding/Alighting Flow Analysis")
def main() -> None:
    """Entry point: load data, analyze boarding/alighting flows, chart."""

    with phase("Loading stop-level flows (pre-pandemic weekday)"):
        df = load_stop_flows()
        print(f"  {len(df):,} unique physical stops")

    n_gen = len(df.filter(pl.col("flow_type") == "Generator"))
    n_attr = len(df.filter(pl.col("flow_type") == "Attractor"))
    n_bal = len(df.filter(pl.col("flow_type") == "Balanced"))
    print("\nFlow classification:")
    print(f"  Generators: {n_gen:,} ({n_gen / len(df) * 100:.0f}%)")
    print(f"  Attractors: {n_attr:,} ({n_attr / len(df) * 100:.0f}%)")
    print(f"  Balanced:   {n_bal:,} ({n_bal / len(df) * 100:.0f}%)")

    total_ons = df["avg_ons"].sum()
    total_offs = df["avg_offs"].sum()
    print("\nSystem totals:")
    print(f"  Total boardings: {total_ons:,.0f}/day")
    print(f"  Total alightings: {total_offs:,.0f}/day")
    print(f"  System on/off ratio: {total_ons / total_offs:.3f}")

    print("\nTop 10 generators (net boardings):")
    for row in df.sort("net_flow", descending=True).head(10).iter_rows(named=True):
        print(f"  {row['stop_name'][:40]:40s}  net +{row['net_flow']:,.0f}/day  (ons={row['avg_ons']:,.0f}, offs={row['avg_offs']:,.0f})")

    print("\nTop 10 attractors (net alightings):")
    for row in df.sort("net_flow").head(10).iter_rows(named=True):
        print(f"  {row['stop_name'][:40]:40s}  net {row['net_flow']:,.0f}/day  (ons={row['avg_ons']:,.0f}, offs={row['avg_offs']:,.0f})")

    print("\nDirectional analysis:")
    dir_flows = load_directional_flows()
    for row in dir_flows.sort("direction").iter_rows(named=True):
        print(f"  {row['direction']:10s}: ons={row['total_ons']:,.0f}, offs={row['total_offs']:,.0f}, "
              f"ratio={row['on_off_ratio']:.2f}, net={row['net_flow']:+,.0f}")

    print("\nOn/off ratio distribution:")
    valid_ratio = df.filter(pl.col("on_off_ratio").is_not_null() & pl.col("on_off_ratio").is_finite())
    print(f"  Median on/off ratio: {valid_ratio['on_off_ratio'].median():.2f}")
    print(f"  Mean on/off ratio: {valid_ratio['on_off_ratio'].mean():.2f}")
    pct_gen = len(valid_ratio.filter(pl.col("on_off_ratio") > 1.5)) / len(valid_ratio) * 100
    pct_attr = len(valid_ratio.filter(pl.col("on_off_ratio") < 0.67)) / len(valid_ratio) * 100
    print(f"  Strong generators (ratio > 1.5): {pct_gen:.1f}% of stops")
    print(f"  Strong attractors (ratio < 0.67): {pct_attr:.1f}% of stops")

    with phase("Saving CSV"):
        save_csv(df, OUT / "stop_net_flow.csv")

    with phase("Generating charts"):
        make_charts(df)


if __name__ == "__main__":
    main()

Sources

Name	Type	Why It Matters	Owner	Freshness	Caveat
data/bus-stop-usage/wprdc_stop_data.csv	file	Referenced via DATA_DIR path composition in analysis script.	Local project data owner not specified.	Snapshot file; refresh by rerunning its pipeline step.	May lag upstream source updates.
numpy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
polars	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.