If you asked subway riders to rate the reliability of the T, most probably wouldn’t put it at nearly 90 percent.
But go to a Massachusetts Bay Transportation Authority website, and the subway dependability score over the past 30 days is 88 percent.
What explains the yawning discrepancy between public perception and the MBTA’s assessment?
According to a consumer group, the transit authority is highlighting numbers that paint T service in a more favorable light than the experience of many riders.
And while the issue boils down to a dispute over how to track on-time performance, there is a lot at stake, including Boston’s appeal as a place to live or locate a business, and the credibility of Governor Charlie Baker, who has made improving MBTA service a top priority of his administration.
The MBTA last year created the Performance Dashboard, a consumer-friendly website where it posts data on ridership, its financial performance, customer satisfaction, and reliability. To measure subway dependability, the agency tracks the percentage of riders who catch a train within a specified period of time after entering the station (using the swipe of a CharlieCard or paper ticket through the turnstile).
In a report set to be released Thursday, the Massachusetts Public Interest Research Group argues that this approach makes the T look more reliable than it really is, in part because some passengers who board trains that are running behind schedule are counted as having been served on time.
The MBTA says its method is in line with how transit agencies are increasingly gauging performance, by emphasizing riders’ waiting times rather than strict schedule adherence.
The MBTA’s subway lines do not operate on a schedule the same way its commuter rail network does, with trains arriving and departing at a set time. Instead, subway trains are supposed to run at specific intervals, called headways. These headways are somewhat flexible; they’re shorter during rush hour and longer during off-peak periods.
When the MBTA says the subway system is operating at 90 percent reliability, it means that 90 percent of passengers who enter a given station wait less than the scheduled headway for a train to arrive.
Consider a Red Line station during a period with five-minute headways. A passenger who enters the station and boards a train within five minutes would count as an on-time trip, even if the train is running with a longer headway than scheduled.
Yet the same train could include passengers whose trips register as late, if they board the train six or seven minutes after entering the station.
“If a train with a five-minute headway is two minutes late, it’s going to be counted as reliable for five of seven people,” said MassPIRG staff attorney Matt Casale. “So a late train is contributing pretty significantly to a high reliability rating.”
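Casale’s five-of-seven arithmetic can be checked with a short sketch. It assumes riders enter the station at a steady rate between trains, which is an illustrative simplification rather than anything the MBTA has published, and the function name is hypothetical.

```python
def on_time_share(gap_minutes: float, scheduled_headway: float) -> float:
    """Share of riders who wait no longer than the scheduled headway.

    Assumes riders enter the station at a steady rate during the gap
    between two trains (an illustrative assumption, not MBTA data).
    """
    if gap_minutes <= 0:
        raise ValueError("gap_minutes must be positive")
    # Riders who arrive in the last `scheduled_headway` minutes before the
    # train wait no more than the headway, so they count as on time.
    return min(scheduled_headway, gap_minutes) / gap_minutes


# A train scheduled every five minutes that shows up after seven.
print(round(on_time_share(7, 5), 3))  # 0.714
```

Under that assumption, a train running two minutes behind a five-minute headway still registers as on time for roughly 71 percent of the riders who board it.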
Laurel Paget-Seekins, the agency’s director of strategic initiatives, said the figure — which is fully explained on the website — is not intended to put a pretty face on subway service.
“If there are 280,000 people on the Red Line on the average weekday, 28,000 people waited longer than the headway. That’s something we want to fix,” she said. “While 90 percent might look like an A-minus, it actually is a number we know needs to be improved.”
She said the T’s goal is to stress individual passenger experiences, gauging whether they’re boarding trains within a reasonable amount of time.
There are issues with the figure, she said. Most prominently, it does not measure passengers who cannot squeeze on to an overly crowded train and are left waiting on a platform for the next trip. New Red and Orange line trains will help solve this problem, she said, because they will be equipped with technology that counts passengers as they enter and exit the vehicle.
An alternative is to gauge whether trains are arriving at their stations on schedule. By that measure, the MBTA’s performance is less flattering: in September, the share of trains that arrived at stations within one minute of scheduled headways ranged from 69 percent to 81 percent.
But Paget-Seekins said this data doesn’t differentiate between off-peak trains and rush-hour trains, which carry far more passengers. The measure on the Performance Dashboard better reflects the experience of all passengers, she said.
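The weighting point can be illustrated with made-up numbers. The four trains below are hypothetical, and the sketch shows only how counting trains differs from counting riders; it does not reproduce either MBTA figure.

```python
# Hypothetical trains: (arrived within one minute of scheduled headway, riders aboard).
trains = [
    (True, 400),   # rush hour
    (True, 400),   # rush hour
    (False, 100),  # off-peak
    (False, 100),  # off-peak
]


def train_share(records):
    """Share of trains that were on time, with every train counted equally."""
    return sum(1 for ok, _ in records if ok) / len(records)


def rider_share(records):
    """Share of riders who were on an on-time train."""
    total = sum(riders for _, riders in records)
    return sum(riders for ok, riders in records if ok) / total


print(train_share(trains))  # 0.5
print(rider_share(trains))  # 0.8
```

With different numbers the gap could just as easily run the other way, for instance if the late trains were the crowded ones.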
MassPIRG’s Casale agreed that it’s better to capture all passengers, but said the T should instead compare average wait times to half the scheduled headway. This metric would assume trains are leaving stations at the proper time while still focusing on the customer experience, he said, because a half-headway period should roughly match the wait of the average passenger entering the station. But his group does not have data on how the T would perform under that metric.
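A rough sketch shows why half the headway is a natural benchmark. It again assumes riders enter at a steady rate, and the gaps between trains are invented for illustration; MassPIRG has not published a formula, so this is one plausible reading of the proposal.

```python
def average_wait(gaps_minutes):
    """Average wait for riders who enter at a steady rate (assumed).

    A gap of g minutes collects riders in proportion to g, and those
    riders wait g / 2 on average, which gives sum(g^2) / (2 * sum(g)).
    """
    total = sum(gaps_minutes)
    if total <= 0:
        raise ValueError("gaps must add up to a positive number of minutes")
    return sum(g * g for g in gaps_minutes) / (2 * total)


scheduled_headway = 5  # minutes, as in the Red Line example above
half_headway = scheduled_headway / 2

print(average_wait([5, 5]))  # 2.5, matches half the headway
print(average_wait([3, 7]))  # 2.9, same number of trains but uneven spacing
```

Evenly spaced trains hit the half-headway benchmark exactly, while uneven spacing pushes the average wait above it even though the same number of trains ran.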
Zak Accuardi, an analyst with New York-based advocacy group Transit Center, said the T’s preferred metric “to determine whether or not a rider is delayed is very reasonable.”
But he said the T may want to reconsider using it as its sole public-facing reliability metric, since it accounts only for how long passengers wait for trains and not for their experience on board. He pointed to a metric used in London that compares average passenger trip times with how long those trips should take under ideal circumstances.
The MassPIRG report also recommends that the MBTA add this measurement — called “excess trip time” — to its arsenal.
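In its simplest form, the idea can be sketched as the average gap between actual and ideal trip times. The trip records below are hypothetical, and the exact formula London uses is not described here, so the calculation is an assumption for illustration only.

```python
# Hypothetical trips: (actual minutes, minutes under ideal conditions).
trips = [
    (22, 20),
    (31, 25),
    (19, 18),
]


def excess_trip_time(records):
    """Average minutes by which trips ran longer than their ideal time."""
    if not records:
        raise ValueError("need at least one trip")
    return sum(actual - ideal for actual, ideal in records) / len(records)


print(excess_trip_time(trips))  # 3.0
```

Unlike a wait-time measure, a figure like this would also pick up delays that happen after riders have boarded.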
Paget-Seekins said adopting that measure would be difficult for the MBTA because, unlike in London, riders here do not “tap out” with transit cards when they exit stations, so data about where passengers exit is imprecise and based on models. But New York recently began using similar metrics, and it does not require tap-offs either.
Still, Paget-Seekins said the T is open to adding other measures to the dashboard to further illustrate system performance.
One measurement that could eventually make it onto the T’s site gauges the percentage of passenger trips that end within three minutes of their expected finish. The figure — usually only published in an annual performance report — also looked good for the T, according to recent data: all four lines were well over 90 percent.