Reference
Data Dictionary
Database tables, columns, types, and known data quality notes for the PRT OTP dataset.
Built 2026-04-03 20:09 UTC · Commit 7c56b9a
Overview
Pittsburgh Regional Transit (PRT) on-time performance and system data, normalized into a SQLite database.
| Property | Value |
|---|---|
| Database | data/prt.db (SQLite) |
| Built by | src/prt_otp_analysis/build_db.py |
| Rebuild | uv run python src/prt_otp_analysis/build_db.py |
Database Schema
| Table | Rows | Description |
|---|---|---|
routes |
105 | Dimension: one row per route |
stops |
6,466 | Dimension: one row per physical stop |
route_stops |
11,078 | Bridge: which stops serve which routes |
stop_reference |
7,554 | Dimension: historical stop reference |
otp_monthly |
7,651 | Fact: monthly on-time performance |
ntd_agency |
2,340 | Dimension: NTD agency-mode-TOS combos |
ntd_ridership |
673,920 | Fact: monthly UPT by agency/mode/TOS |
ntd_annual_service |
93,188 | Fact: annual VRH/VRM/UPT/VOMS/fares/opexp by agency |
otp_null_classification |
1,064 | Ref: why each missing OTP value is null |
allegheny_go_weekly |
~96 | Fact: weekly Allegheny Go program ridership |
routes
| Column | Type | Description |
|---|---|---|
route_id |
TEXT PK | Route code ("1", "P1", "BLUE") |
route_name |
TEXT | Human-readable name |
mode |
TEXT | BUS, RAIL, INCLINE, or UNKNOWN (for historical routes) |
stops
| Column | Type | Description |
|---|---|---|
stop_id |
TEXT PK | PRT stop identifier |
stop_code |
INTEGER | Numeric stop code |
stop_name |
TEXT | Stop name |
lat |
REAL | Latitude |
lon |
REAL | Longitude |
county |
TEXT | County |
muni |
TEXT | Municipality |
hood |
TEXT | Neighborhood (nullable) |
route_stops
| Column | Type | Description |
|---|---|---|
route_id |
TEXT FK | References routes.route_id |
stop_id |
TEXT FK | References stops.stop_id |
direction |
TEXT | "IB", "OB", or "IB,OB" |
trips_wd |
INTEGER | Weekday trips |
trips_sa |
INTEGER | Saturday trips |
trips_su |
INTEGER | Sunday trips |
trips_7d |
INTEGER | Total weekly trips |
svc_days |
TEXT | Service days (e.g. "WD,SA,SU") |
Primary key: (route_id, stop_id, direction)
stop_reference
Historical stop lookup table -- all stops ever in the PRT system.
| Column | Type | Description |
|---|---|---|
stop_id |
TEXT PK | PRT stop identifier |
stop_code |
INTEGER | Numeric stop code |
stop_name |
TEXT | Stop name |
stop_source |
TEXT | GTFS feed that introduced it |
public_stop |
TEXT | "yes" / "no" |
lat |
REAL | Latitude |
lon |
REAL | Longitude |
mode |
TEXT | BUS, INCLINE, etc. |
first_served |
TEXT | First GTFS feed version code |
last_served |
TEXT | Last GTFS feed version code |
county |
TEXT | County |
muni |
TEXT | Municipality |
hood |
TEXT | Neighborhood (nullable) |
otp_monthly
| Column | Type | Description |
|---|---|---|
route_id |
TEXT FK | References routes.route_id |
month |
TEXT | "2019-01" through "2025-12" |
otp |
REAL | On-time percentage, 0.0 -- 1.0 |
Primary key: (route_id, month). Rows with NULL OTP in the source CSV are excluded.
ntd_agency
National Transit Database agency dimension — one row per agency-mode-TOS combination.
| Column | Type | Description |
|---|---|---|
ntd_id |
INTEGER PK | NTD agency ID (e.g. 30022 = PRT) |
agency_name |
TEXT | Agency name |
mode |
TEXT PK | Mode code (MB=bus, LR=light rail, etc.) |
tos |
TEXT PK | Type of service (DO=directly operated, PT=purchased transport) |
hq_city |
TEXT | Headquarters city |
hq_state |
TEXT | Headquarters state |
uza_name |
TEXT | Urbanized area name |
Primary key: (ntd_id, mode, tos). Source: NTD Monthly Module Excel, Master sheet.
ntd_ridership
Monthly unlinked passenger trips (UPT) from the NTD Monthly Module, Jan 2002–Dec 2025.
| Column | Type | Description |
|---|---|---|
ntd_id |
INTEGER FK | References ntd_agency.ntd_id |
mode |
TEXT FK | Mode code |
tos |
TEXT FK | Type of service |
month |
TEXT PK | "2002-01" through "2025-12" |
upt |
INTEGER | Unlinked passenger trips (nullable) |
Primary key: (ntd_id, mode, tos, month). Source: NTD Monthly Module Excel, UPT sheet. Built by src/prt_otp_analysis/ntd_ridership.py.
ntd_annual_service
Annual service and financial metrics from the NTD TS2.2 "Service Data and Operating Expenses Time-Series by System" workbook, 1991–2024. System-level (all modes aggregated).
| Column | Type | Description |
|---|---|---|
ntd_id |
INTEGER PK | NTD agency ID (e.g. 30022 = PRT) |
agency_name |
TEXT | Agency name |
city |
TEXT | Headquarters city |
state |
TEXT | Headquarters state |
uza_name |
TEXT | Primary urbanized area name |
year |
INTEGER PK | Report year (1991–2024) |
vrh |
REAL | Vehicle Revenue Hours (nullable) |
vrm |
REAL | Vehicle Revenue Miles (nullable) |
upt |
REAL | Unlinked Passenger Trips (nullable) |
voms |
REAL | Vehicles Operated in Maximum Service (nullable) |
fares |
REAL | Fare revenue in USD (nullable) |
opexp |
REAL | Total operating expenses in USD (nullable) |
opexp_vo |
REAL | Vehicle operations expenses in USD (nullable) |
opexp_vm |
REAL | Vehicle maintenance expenses in USD (nullable) |
opexp_nvm |
REAL | Non-vehicle maintenance expenses in USD (nullable) |
opexp_ga |
REAL | General administration expenses in USD (nullable) |
Primary key: (ntd_id, year). Source: NTD TS2.2 Excel workbook. Built by src/prt_otp_analysis/ntd_service.py.
Route ID Reconciliation
Route codes are extracted from routes_by_month.csv by splitting on " - " (first token).
| Mapping | Notes |
|---|---|
MONONGAHELA INCLINE -> MI |
Name used in OTP data maps to system code |
37, 42, P2, RLSH, SWL |
In OTP data only (historical/temporary routes) |
DQI, Y1, Y45--Y49 |
In current system only (no OTP data) |
otp_null_classification
Classification of every route-month in the OTP date range that lacks an OTP value, cross-referenced against scheduled trips and ridership data.
| Column | Type | Description |
|---|---|---|
route_id |
TEXT PK | References routes.route_id |
month |
TEXT PK | "2019-01" through "2025-11" |
reason |
TEXT | unreported / not_operating / no_coverage |
evidence |
TEXT | schedule / ridership / both / none |
Primary key: (route_id, month). Only rows missing from otp_monthly appear here. Built by src/prt_otp_analysis/otp_null_classification.py.
Reason definitions:
unreported: evidence the route was operating (schedule or ridership data exists) but OTP was not reportednot_operating: within temporal coverage of at least one cross-reference source, but no evidence the route operatedno_coverage: month falls outside both the schedule data range (through 2021-03) and ridership data range (through 2024-10)
allegheny_go_weekly
Weekly ridership for the Allegheny Go discounted fare program, extracted from the county's Tableau dashboard.
| Column | Type | Description |
|---|---|---|
week_start |
TEXT PK | ISO date for the start of the week ("2024-05-12") |
rides |
INTEGER | Total rides taken that week by program participants |
riders |
INTEGER | Unique riders that week |
Source: Allegheny Go Dashboard. Built by src/prt_otp_analysis/allegheny_go.py.
Source Files
| File | Rows | Feeds into |
|---|---|---|
routes_by_month.csv |
99 routes x 84 months | routes, otp_monthly |
Transit_stops_(current)_by_route_*.csv |
17,546 | stops, route_stops |
PRT_Current_Routes_Full_System_*.csv |
277 | routes |
PRT_Stop_Reference_Lookup_Table.csv |
7,554 | stop_reference |
Transit_stops_*.geojson |
17,546 | Not used (same data as stops CSV) |
ntd-monthly-ridership/*.xlsx |
2,340 agencies x 288 months | ntd_agency, ntd_ridership |
ntd-annual-service/*.xlsx |
3,062 agencies x 33 years | ntd_annual_service |
Source File: routes_by_month.csv
Fields
| Column | Type | Description |
|---|---|---|
Route |
Text | Route identifier and name (e.g. "1 - FREEPORT ROAD", "P1 - EAST BUSWAY-ALL STOPS") |
YYYY-Mon |
Float | On-Time Percentage for that route in that month, scaled 0.0 to 1.0 |
Route naming conventions
Routes follow several naming patterns that reflect service type:
| Pattern | Example | Service type |
|---|---|---|
| Number only | 1 - FREEPORT ROAD |
Local bus |
Number + L |
51L - CARRICK LIMITED |
Limited-stop bus |
Number + X |
28X - AIRPORT FLYER |
Express bus |
P prefix |
P1 - EAST BUSWAY-ALL STOPS |
Busway / flyer |
O prefix |
O1 - ROSS FLYER |
Flyer (express) |
G prefix |
G2 - WEST BUSWAY |
Busway |
| Color name | RED - Castle Shannon via Beechview |
Light rail (T) |
SLVR |
SLVR - Libary via Overbrook |
Light rail (T) |
BLUE |
BLUE - SouthHills Village via Overbrook |
Light rail (T) |
| Special | MONONGAHELA INCLINE, RLSH - Red Line Shuttle, SWL - Outbound to SHJ |
Incline / shuttle |
OTP value
Each numeric cell represents the On-Time Percentage for a route in a given month.
| Property | Value |
|---|---|
| Data type | Float (or empty) |
| Range | 0.0 -- 1.0 |
| Interpretation | 0.0 = 0%, 1.0 = 100% |
| Example | 0.6912 = 69.12% on-time |
Known ambiguities and data quality notes
1. "On-time" definition is unspecified
The dataset does not define what threshold constitutes "on-time." Transit industry standards vary:
- Common definition: a vehicle is on-time if it departs 0 to 5 minutes late (and not early)
- PRT may use a different window (e.g. 1 minute early to 5 minutes late)
- It is unclear whether OTP is measured at all stops, timepoints only, or at origin/destination
2. Empty cells are ambiguous
Many cells are blank. Possible reasons include:
- Route did not exist during that period (e.g.
37 - CASTLE SHANNONonly has Jan--Mar 2024) - Route was temporarily suspended (e.g. COVID-era service reductions)
- Data was not collected or reported for that month
- Route was discontinued (e.g.
P2 - EAST BUSWAY SHORTdata ends after Sep 2023)
There is no sentinel value or flag to distinguish these cases.
3. Sparse or short-lived routes
Several routes have very limited data, suggesting they were introduced, discontinued, or are seasonal:
| Route | Data months | Notes |
|---|---|---|
37 - CASTLE SHANNON |
3 | Only Jan--Mar 2024 |
42 - POTOMAC |
3 | Only Jan--Mar 2024 |
53 - HOMESTEAD PARK |
2 | Only Jan--Feb 2020 |
78 - MONONGAHELA INCLINE |
0 | Row present but all values empty |
88 - P2 - EAST BUSWAY SHORT |
56 | Data ends after Sep 2023 |
98 - RLSH - Red Line Shuttle |
3 | Only Jan--Mar 2024 |
100 - SWL - Outbound to SHJ |
~12 | Sparse data, mostly 2024 |
4. Potential outliers or anomalies
Some values appear as sharp drops that may indicate data quality issues rather than true performance:
15 - CHARLES: drops to ~0.35 in Jul--Sep 2022, then rebounds to ~0.80 (possible measurement change or route restructuring)65 - SQUIRREL HILL: drops to 0.28--0.37 range in mid-2023, far below its historical normSWL - Outbound to SHJ: value of 0.2097 in Dec 2024 (single-month anomaly amid ~0.88--0.90 values)39 - BROOKLINE: drops to 0.4000 in Mar 2022, otherwise consistently 0.72--0.897 - SPRING GARDEN: value of 0.2863 in Aug 2025, far below its historical range
5. Data source is undocumented
- The dataset does not specify where the OTP values originate (e.g. AVL/APC systems, manual reporting, GTFS-realtime)
- It is unclear whether these are PRT self-reported figures or independently calculated
- Aggregation method is unknown (simple average across all trips? weighted by ridership?)
6. Trailing comma
Each data row ends with a trailing comma, which may produce an extra empty column when parsed. Parsers should account for this.
7. Typo in source data
SLVR - Libary via Overbrook appears to be a misspelling of "Library."