34 - Ridership Concentration (Pareto)

Equity and Strategic Planning

Coverage: 2019-01 to 2025-11 (from otp_monthly).

Built 2026-03-03 02:23 UTC · Commit defd5c8

Data Provenance

flowchart LR
  34_ridership_concentration(["34 - Ridership Concentration (Pareto)"])
  f1_34_ridership_concentration[/"data/bus-stop-usage/wprdc_stop_data.csv"/] --> 34_ridership_concentration
  t_otp_monthly[("otp_monthly")] --> 34_ridership_concentration
  01_data_ingestion[["Data Ingestion"]] --> t_otp_monthly
  t_routes[("routes")] --> 34_ridership_concentration
  01_data_ingestion[["Data Ingestion"]] --> t_routes
  d1_34_ridership_concentration(("numpy (lib)")) --> 34_ridership_concentration
  d2_34_ridership_concentration(("polars (lib)")) --> 34_ridership_concentration
  d3_34_ridership_concentration(("scipy (lib)")) --> 34_ridership_concentration
  classDef page fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a,stroke-width:2px;
  classDef table fill:#ecfeff,stroke:#0e7490,color:#164e63;
  classDef dep fill:#fff7ed,stroke:#c2410c,color:#7c2d12,stroke-dasharray: 4 2;
  classDef file fill:#eef2ff,stroke:#6366f1,color:#3730a3;
  classDef api fill:#f0fdf4,stroke:#16a34a,color:#14532d;
  classDef pipeline fill:#f5f3ff,stroke:#7c3aed,color:#4c1d95;
  class 34_ridership_concentration page;
  class t_otp_monthly,t_routes table;
  class d1_34_ridership_concentration,d2_34_ridership_concentration,d3_34_ridership_concentration dep;
  class f1_34_ridership_concentration file;
  class 01_data_ingestion pipeline;

Findings

Findings: Ridership Concentration (Pareto)

Summary

PRT ridership is extremely concentrated: just 2% of stops serve 50% of all weekday riders, and 14% of stops serve 80%. The system-wide Gini coefficient is 0.82, indicating very high inequality in stop-level usage. However, per-route ridership concentration (Gini) has essentially zero correlation with the route's on-time performance (OTP; Pearson r = -0.016, p = 0.88). In this data, whether a route's riders are clustered at a few stops or spread evenly does not predict its schedule reliability.
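The threshold fractions above come from the sorted cumulative-share (Pareto) curve. The sketch below shows one way to compute them, assuming `usage` is a 1-D array with one average weekday usage value per stop. `share_of_stops_for` is a hypothetical helper, not a function taken from the analysis script.

```python
import numpy as np


def share_of_stops_for(usage, targets=(0.5, 0.8, 0.9)):
    """Smallest fraction of stops whose combined usage reaches each target share.

    Hypothetical helper: `usage` holds one non-negative value per stop.
    """
    x = np.sort(np.asarray(usage, dtype=float))[::-1]  # busiest stops first
    cum_share = np.cumsum(x) / x.sum()                 # share served by the top-k stops
    n = x.size
    fractions = {}
    for t in targets:
        # First position where the cumulative share reaches the target; +1 turns it into a stop count.
        k = int(np.searchsorted(cum_share, t, side="left")) + 1
        fractions[t] = min(k, n) / n
    return fractions
```

For example, `share_of_stops_for([5, 3, 1, 1])` returns `{0.5: 0.25, 0.8: 0.5, 0.9: 0.75}`: the single busiest of four stops already carries half of the usage.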

Key Numbers

2.2% of stops serve 50% of ridership
13.9% of stops serve 80% of ridership
27.9% of stops serve 90% of ridership
System-wide Gini = 0.824
Per-route Gini range: 0.338 - 0.890 (median 0.649)
95 routes with >= 3 stops analyzed
90 routes matched to OTP data
Gini vs OTP (bus-only): Pearson r = -0.016 (p = 0.879), Spearman rho = 0.103 (p = 0.339)
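The Gini-vs-OTP test can be sketched with `scipy.stats`. This assumes two aligned arrays with one value per bus route, the route's Gini coefficient and its average OTP; `gini_otp_association` is a hypothetical helper name. The least-squares slope is included because it corresponds to the fitted line drawn on a scatter plot.

```python
import numpy as np
from scipy import stats


def gini_otp_association(gini, otp):
    """Pearson and Spearman association between route Gini and average OTP, plus an OLS slope.

    Hypothetical helper: `gini` and `otp` are aligned, one entry per route.
    """
    gini = np.asarray(gini, dtype=float)
    otp = np.asarray(otp, dtype=float)
    r, p_r = stats.pearsonr(gini, otp)        # linear association
    rho, p_rho = stats.spearmanr(gini, otp)   # rank (monotonic) association
    slope, _intercept = np.polyfit(gini, otp, deg=1)  # least-squares line for the scatter plot
    return {
        "pearson_r": float(r),
        "pearson_p": float(p_r),
        "spearman_rho": float(rho),
        "spearman_p": float(p_rho),
        "slope": float(slope),
    }
```

A near-zero r together with a near-zero slope is what "no trend" on the scatter plot looks like numerically; a non-significant Spearman rho additionally rules out a strong monotonic but non-linear pattern.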

Observations

The Pareto curve is steep: the top ~150 stops (out of 6,700+) account for half of all weekday boardings and alightings. This is more extreme than the classic 80/20 rule (14% of stops, not 20%, serve 80% of usage) and is closer to a 2/50 pattern.
Most stops see very little usage: the median stop handles only ~7 riders/day, while the busiest stops see 2,000-5,800/day. The bottom ~72% of stops collectively serve only 10% of ridership.
Route-level concentration varies widely: some routes have Gini as low as 0.34 (relatively even usage across stops) while others reach 0.89 (nearly all ridership at a few stops). Flyer/express routes tend to have higher Gini since ridership clusters at downtown endpoints.
Concentration does not predict OTP. The scatter plot shows no trend at all -- the regression line is essentially flat. Routes with highly concentrated ridership perform no better or worse than those with evenly distributed usage.

Discussion

The extreme system-wide concentration (Gini = 0.82) reinforces the stop consolidation finding from Analysis 31: most stops contribute very little ridership, and removing the lowest-usage ones would affect few riders while potentially improving OTP by reducing stop count.

The null result for Gini vs OTP is notable. One might hypothesize that routes with concentrated ridership would have better OTP (less dwell time at most stops), but the data do not support this. One possible explanation is that passenger dwell time at individual stops is not a dominant factor in OTP variance, and that the fixed time cost of stopping at all (deceleration, door opening, acceleration) matters more than the time spent boarding passengers. A correlation test alone cannot establish this mechanism, but it is consistent with the Analysis 07 finding that raw stop count, not passenger volume, drives OTP.

The 2/50 concentration ratio has resource allocation implications: if PRT focused infrastructure investment (shelters, real-time signs, ADA upgrades) on roughly 150 stops, it would reach the stops handling half of all weekday boardings and alightings. The current shelter coverage of 7% (Analysis 32) suggests significant room to target these highest-impact locations.
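One way to turn this targeting idea into a candidate list is to take the smallest busiest-first set of stops that reaches a usage share and keep those without a shelter. This is an illustrative sketch only: `unsheltered_priority_stops` and the per-stop `has_shelter` flag are hypothetical and are not part of this analysis's outputs.

```python
import numpy as np


def unsheltered_priority_stops(stop_ids, usage, has_shelter, target_share=0.5):
    """Unsheltered stops inside the smallest busiest-first set reaching `target_share` of usage.

    Hypothetical helper: inputs are aligned per-stop arrays; returns IDs, busiest first.
    """
    ids = np.asarray(stop_ids)
    x = np.asarray(usage, dtype=float)
    sheltered = np.asarray(has_shelter, dtype=bool)

    order = np.argsort(-x, kind="stable")        # stop indices, busiest first
    cum_share = np.cumsum(x[order]) / x.sum()    # cumulative usage share of the top-k stops
    k = min(int(np.searchsorted(cum_share, target_share, side="left")) + 1, x.size)
    top = order[:k]                              # the top-k stops by usage
    return ids[top][~sheltered[top]].tolist()    # drop stops that already have a shelter
```

Ranking by usage alone ignores cost and site constraints, so a list like this is a starting point for prioritization rather than a plan.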

Caveats

Stop-level usage data is a pre-pandemic snapshot (datekeys 201909 and 202001); current patterns may differ, especially post-pandemic.
Gini is computed from stop-route combinations, not physical stops. Routes sharing physical stops may inflate the apparent concentration.
The OTP data covers a longer time range (2019-01 to 2025-11) than the usage snapshot (datekeys 201909 and 202001), so the correlation compares a static, pre-pandemic usage structure against time-averaged OTP.
Very short routes (< 3 stops) are excluded from the Gini analysis, which drops a few incline and shuttle routes.

Output

gini_vs_otp.png -- scatter plot of route Gini vs average OTP
pareto_curve.png -- system-wide Pareto curve
pareto_system.csv -- cumulative ridership share by stop rank
route_gini.csv -- per-route Gini coefficient and OTP

Methods

Methods: Ridership Concentration (Pareto)

Question

How concentrated is ridership across stops? What fraction of stops serves 80% of riders, and does ridership concentration on a route correlate with that route's OTP?

Approach

Aggregate pre-pandemic weekday stop-level ridership (datekeys 201909, 202001) to physical-stop level and per-route level.
System-wide Pareto: sort all stops by usage, compute cumulative share, and find the fraction of stops that serve 50%, 80%, and 90% of total ridership.
Per-route Gini coefficient: for each route, compute the Gini coefficient of stop-level usage as a concentration metric (0 = perfectly even; approaching 1 as ridership concentrates at a single stop).
Join per-route Gini with route-level average OTP from the database and test for correlation (Pearson, Spearman).
Generate a system-wide Pareto curve and a scatter plot of Gini vs OTP by route.
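The Gini step can be computed from sorted usage values. This is a minimal sketch of the standard sorted-values formula, not necessarily the exact code in the analysis script; `gini` is a hypothetical function name. With this formula the maximum for n stops is (n - 1)/n, which approaches 1 as the route gets longer.

```python
import numpy as np


def gini(values) -> float:
    """Gini coefficient of non-negative usage values (0 = perfectly even).

    Hypothetical helper using G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i = 1..n.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0  # convention chosen for this sketch: no usage means no concentration
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)
```

For example, `gini([1, 2, 3, 4])` is 0.25 and `gini([0, 0, 0, 1])` is 0.75, the maximum for four stops. In this analysis the metric is applied per route, to routes with at least 3 stops.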

Data

Name	Description	Source
`wprdc_stop_data.csv`	Stop-level boardings/alightings	Local CSV (`data/bus-stop-usage/`)
`otp_monthly`	Monthly OTP per route	`prt.db` table
`routes`	Route name and mode	`prt.db` table

Output

output/pareto_system.csv -- cumulative ridership share by stop rank
output/route_gini.csv -- per-route Gini coefficient and OTP
output/pareto_curve.png -- system-wide Pareto curve
output/gini_vs_otp.png -- scatter plot of route Gini vs average OTP

Sources

Name	Type	Why It Matters	Owner	Freshness	Caveat
data/bus-stop-usage/wprdc_stop_data.csv	file	Referenced via DATA_DIR path composition in analysis script.	Local project data owner not specified.	Snapshot file; refresh by rerunning its pipeline step.	May lag upstream source updates.
otp_monthly	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
routes	table	Primary analytical table used in this page's computations.	Produced by Data Ingestion.	Updated when the producing pipeline step is rerun.	Coverage depends on upstream source availability and ETL assumptions.
numpy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
polars	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.
scipy	dependency	Runtime dependency required for this page's pipeline or analysis code.	Open-source Python ecosystem maintainers.	Version pinned by project environment until dependency updates are applied.	Library updates may change behavior or defaults.