Reference

Glossary

Common statistical and transit terms used throughout the analyses.

Built 2026-03-03 02:23 UTC · Commit defd5c8

Transit Operations

On-Time Performance (OTP)

Share of trips that meet the agency's on-time threshold, usually expressed as a proportion from 0 to 1.

Trip-Weighted OTP

OTP average where each route is weighted by its scheduled trip volume.

Ridership-Weighted OTP

OTP average where each route is weighted by passenger volume rather than trip count.
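
The two weighted averages above differ only in the weight applied to each route. The sketch below illustrates this with a single helper; the function name `weighted_otp`, the route labels, and all numbers are hypothetical and are not taken from any analysis.

```python
def weighted_otp(otp_by_route, weights_by_route):
    """Weighted average of route-level OTP values (each on the 0-1 scale)."""
    total_weight = sum(weights_by_route[r] for r in otp_by_route)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_sum = sum(otp_by_route[r] * weights_by_route[r] for r in otp_by_route)
    return weighted_sum / total_weight


# Hypothetical illustrative inputs.
otp = {"A": 0.80, "B": 0.60}
trips = {"A": 100, "B": 300}      # scheduled trips per route
riders = {"A": 9000, "B": 1000}   # passengers per route

trip_weighted = weighted_otp(otp, trips)         # dominated by route B
ridership_weighted = weighted_otp(otp, riders)   # dominated by route A
```

With these inputs the trip-weighted figure is pulled toward the route with more trips and the ridership-weighted figure toward the route with more passengers, which is why the two metrics can disagree.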

Day Type

Service category such as weekday, Saturday, or Sunday that affects schedule and demand patterns.

Route Span

Maximum geographic distance covered by a route across its served stops.

Busway Route

Route operating on dedicated right-of-way for part of its alignment.

Descriptive Statistics

Mean

Arithmetic average of values, computed as sum divided by count.

Median

Middle value of an ordered distribution, robust to extreme outliers.

Standard Deviation (SD)

Dispersion measure showing how far values tend to deviate from the mean.

Variance

Average squared deviation from the mean; the square of the standard deviation, used in many statistical formulas.

Percentile

Value below which a given percentage of observations falls.

Interquartile Range (IQR)

Difference between the 75th and 25th percentiles, capturing the middle half of data.
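
The descriptive statistics in this section can be computed with Python's standard `statistics` module. This is a minimal sketch: the helper name `describe` is hypothetical, population (not sample) variance is assumed, and the "inclusive" quartile method is one of several percentile conventions, so other tools may report slightly different quartiles.

```python
import statistics


def describe(values):
    """Return the descriptive statistics defined in this section."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.pstdev(values),          # population SD (assumption)
        "variance": statistics.pvariance(values), # population variance
        "p25": q1,
        "p75": q3,
        "iqr": q3 - q1,
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

For this input the mean is 5.0, the median 4.5, the population SD 2.0, and the variance 4, which shows the variance as the square of the SD.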

Correlation and Regression

Pearson Correlation

Linear association metric ranging from -1 to 1.

Spearman Correlation

Rank-based monotonic association metric less sensitive to nonlinearity and outliers.
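
The difference between the two coefficients is easiest to see on a monotonic but nonlinear relationship. The sketch below implements both from their definitions using only the standard library; the function names and the data are illustrative assumptions, and ties are handled with average ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for a constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic, but not linear

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Here the Spearman coefficient is 1 (up to floating-point error) because the ordering is perfectly preserved, while the Pearson coefficient is below 1 because the relationship is not a straight line.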

Partial Correlation

Correlation between two variables after controlling for one or more covariates.

Ordinary Least Squares (OLS)

Regression method that estimates coefficients by minimizing squared residuals.

R-squared (R2)

Fraction of outcome variance explained by a regression model.

Adjusted R-squared

R2 variant that penalizes each added predictor, so it rises only when a new predictor improves fit more than chance would.
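
As a sketch of how the two fit measures relate, the helpers below compute R2 as one minus the ratio of residual to total sum of squares (equivalent to "variance explained" for an OLS model with an intercept) and then apply the usual adjustment. The names `r_squared`, `adjusted_r_squared`, `n`, and `p` are illustrative assumptions, as are the numbers.

```python
def r_squared(y, y_hat):
    """1 - SS_res / SS_tot for observed values y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjust R2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed and fitted values.
observed = [1, 2, 3, 4, 5]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(observed, fitted)
r2_adj = adjusted_r_squared(r2, n=len(observed), p=1)
```

The adjusted value is never larger than the unadjusted one, and the gap widens as predictors are added relative to the number of observations.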

Standardized Coefficient (Beta)

Regression coefficient scaled in standard deviation units for cross-predictor comparison.

Variance Inflation Factor (VIF)

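Diagnostic for multicollinearity that measures how much a coefficient's variance is inflated by correlation with the other predictors, computed as 1 / (1 - R2) from regressing that predictor on the rest.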

Nested Model F-Test

Test comparing a restricted model to an expanded model to assess added explanatory value.

Degrees of Freedom (df)

Number of independent pieces of information remaining after estimating model parameters.

Hypothesis Testing

P-Value

Probability of observing results at least this extreme under the null hypothesis.

Confidence Interval (CI)

Interval estimate that captures plausible parameter values at a chosen confidence level.

Paired t-Test

Mean-difference test for matched observations measured on the same units.

Welch t-Test

Two-sample t-test variant that does not assume equal variances.
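
To make the unequal-variance handling concrete, the sketch below computes the Welch test statistic and the Welch-Satterthwaite degrees of freedom with the standard library. The function name `welch_t` and the sample data are hypothetical; the p-value is omitted because it requires a t-distribution CDF, which the standard library does not provide.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n_a, n_b = len(a), len(b)
    # Sample variances (n - 1 denominator), kept separate per group.
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se_sq = var_a / n_a + var_b / n_b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The degrees of freedom are generally non-integer and fall below the pooled value of n_a + n_b - 2 when the group variances differ.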

Mann-Whitney U Test

Non-parametric two-group comparison based on rank ordering.

Kruskal-Wallis Test

Non-parametric multi-group comparison using ranked observations.

Wilcoxon Signed-Rank Test

Non-parametric paired comparison based on signed ranks of differences.

Bonferroni Correction

Multiple-testing adjustment that divides the significance threshold by the number of tests (equivalently, multiplies each p-value by that number).
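
A minimal sketch of the Bonferroni adjustment, assuming a family-wise significance level of 0.05; the function name and the p-values are illustrative only.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) under Bonferroni correction."""
    m = len(p_values)
    # Adjusted p-values are capped at 1 because they are probabilities.
    adjusted = [min(1.0, p * m) for p in p_values]
    # Comparing p to alpha / m is equivalent to comparing p * m to alpha.
    reject = [p <= alpha / m for p in p_values]
    return adjusted, reject


adjusted, reject = bonferroni([0.001, 0.02, 0.04, 0.30])
```

With four tests the per-test threshold becomes 0.0125, so only the first p-value remains significant even though three of the four are below the unadjusted 0.05.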

Time Series and Forecasting Concepts

Rolling Mean

Moving-window average used to smooth short-term volatility.

Rolling Z-Score

Standardized deviation from a rolling mean used to detect anomalies.
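
The two rolling measures above can be sketched as follows. The window convention is an assumption: the rolling mean uses a trailing window that includes the current point, while the z-score compares each point with the window of points before it so that an anomaly does not inflate its own baseline. Function names and data are hypothetical.

```python
import statistics


def rolling_mean(series, window):
    """Trailing moving average; output starts once a full window is available."""
    return [
        statistics.fmean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def rolling_z_scores(series, window):
    """z-score of each point against the preceding `window` points.

    Returns None where there is not enough history or the window has zero spread.
    """
    scores = []
    for i, value in enumerate(series):
        if i < window:
            scores.append(None)
            continue
        history = series[i - window : i]
        mean = statistics.fmean(history)
        sd = statistics.stdev(history)  # sample SD of the window
        scores.append((value - mean) / sd if sd > 0 else None)
    return scores


series = [10, 12, 11, 13, 12, 30]
smoothed = rolling_mean(series, window=3)
z_scores = rolling_z_scores(series, window=3)
```

In this series the final value stands far outside the spread of the preceding window, so its z-score is much larger than the others and would be flagged under any common cutoff.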

Seasonal Decomposition

Separation of a time series into trend, seasonal, and residual components.

Detrending

Removal of long-run trend to isolate short-run or relative variation.

Lagged Cross-Correlation

Correlation of two series at offset time lags.
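
A minimal sketch of lagged cross-correlation, assuming two equally spaced series and the convention that a positive lag pairs `x[t]` with `y[t + lag]`. The function names and data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for a constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag]; negative lags swap the roles."""
    if lag < 0:
        return lagged_correlation(y, x, -lag)
    n = min(len(x), len(y))
    if n - lag < 2:
        raise ValueError("not enough overlapping observations for this lag")
    return pearson(x[: n - lag], y[lag:n])


# Hypothetical series: y repeats x one step later.
x = [1, 3, 2, 5, 4, 6, 5]
y = [0, 1, 3, 2, 5, 4, 6]

corr_lag0 = lagged_correlation(x, y, 0)
corr_lag1 = lagged_correlation(x, y, 1)
```

Because `y` is a one-step delayed copy of `x`, the correlation peaks at lag 1. Each additional lag shortens the overlap, so correlations at long lags rest on fewer observations.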

Granger Causality

Test of whether past values of one series improve prediction of another series.

Baseline Indexing

Rescaling series to a reference period equal to 100 for comparability.
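
Baseline indexing amounts to dividing every value by the baseline-period average and multiplying by 100. In the sketch below, the function name, the `baseline_positions` parameter, and the numbers are hypothetical.

```python
def index_to_baseline(series, baseline_positions):
    """Rescale a series so the mean over the baseline positions equals 100."""
    baseline = sum(series[i] for i in baseline_positions) / len(baseline_positions)
    if baseline == 0:
        raise ValueError("baseline average is zero")
    return [100 * value / baseline for value in series]


# Hypothetical monthly values, indexed to the first month.
indexed = index_to_baseline([200, 220, 180, 250], baseline_positions=[0])
```

An indexed value of 110 then reads directly as "10% above the baseline period", which lets series with very different absolute levels share one axis.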

Clustering and Concentration

Hierarchical Clustering

Iterative grouping method that forms a nested tree of clusters.

Dendrogram

Tree visualization showing hierarchical cluster merges and distances.

Silhouette Score

Cluster-quality metric measuring cohesion within clusters and separation between clusters.

Gini Coefficient

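Inequality metric on a 0 to 1 scale, where 0 means all units have equal shares and values near 1 mean outcomes are concentrated in very few units; used for concentration analysis.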

Lorenz Curve

Cumulative-share plot used to visualize distributional inequality.

Pareto Concentration

Pattern where a small share of units accounts for a large share of outcomes.
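
The three concentration measures above are closely related: the Gini coefficient is derived from the area under the Lorenz curve, and a Pareto-style summary reads off the share held by the largest units. The sketch below assumes non-negative values with a positive total; the function names and data are illustrative.

```python
def lorenz_points(values):
    """Cumulative (unit share, outcome share) points, starting at (0, 0)."""
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini coefficient as 1 minus twice the area under the Lorenz curve."""
    points = lorenz_points(values)
    # Trapezoidal area between consecutive Lorenz points.
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return 1 - 2 * area


def top_share(values, fraction):
    """Share of the total held by the largest `fraction` of units (at least one)."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)


equal = gini([5, 5, 5, 5])
concentrated = gini([0, 0, 0, 10])
top_fifth = top_share([1, 1, 1, 1, 16], 0.2)
```

With a finite number of units the coefficient cannot reach exactly 1: four units with everything held by one yields 0.75, which is the maximum of (n - 1) / n for n = 4.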

Data Quality and Causal Caveats

Simpson's Paradox

Situation where a trend in aggregated data reverses or disappears once the data are stratified into subgroups.

Regression to the Mean

Tendency of extreme observations to move closer to average on repeated measurement.

Ecological Fallacy

Error of inferring individual-level behavior from group-level aggregates.

Selection Bias

Distortion caused by non-random inclusion of observations or intervention targets.

Statistical Power

Probability of detecting a true effect when it exists.

Survivorship Bias

Bias introduced by observing only units that remain after attrition or filtering.