Standardized testing works, depending on where you go to school


The 2001 No Child Left Behind Act, or NCLB, installed high-stakes standardized tests as national education policy. It also kicked off a heated debate about how best to regulate public schools. Advocates of the tests, who until lately included Barack Obama, say they are the only way to ensure all schools provide a good education. Critics say they deform the educational process and do more harm than good.

Missing from the conversation is data on whether standardized testing really promotes the outcomes education policy cares about most, like success in college and the job market. A new study out of the Harvard Graduate School of Education provides some of the first evidence on the question, and it does that by going back to the place where standardized testing got its start: Texas.


“Texas was the canary in the coal mine for what later happened with No Child Left Behind,” says David Deming, author of the study, noting that Texas began statewide standardized testing in 1993, eight years before NCLB.

The study, published in the current issue of Education Next, follows hundreds of thousands of students who entered Texas high schools between 1995 and 1999. It matches performance on standardized math tests to later outcomes, like whether students enrolled in or completed college, and the amount of money they were earning at age 25.

“It’s the first paper that’s looked at what I would call genuine human capital outcomes that go beyond the types of things measured relatively [close] to the [tests],” says David Figlio, an education policy expert at Northwestern University.

Deming found that standardized tests had different effects depending on the quality of the schools students were attending. In Texas, schools received one of four rankings: Low-Performing, Acceptable, Recognized, and Exemplary. Deming looked at schools that were either on the cusp between Low-Performing and Acceptable or between Acceptable and Recognized. He reasoned that these schools would have felt the effects of standardized testing especially acutely, given pressure to move up a notch in the rankings. At those schools he focused on students who were themselves low-performing, reasoning that how those kids fared on tests was especially crucial for whether their schools advanced in the rankings.


Overall, he found that low-performing kids at low-performing schools did better on long-run measures under a standardized testing regime. At the same time, low-performing students at higher-performing schools (those between Acceptable and Recognized) did worse in the long run. Deming thinks this is because of the different strategies schools adopted depending on where in the ranking system they fell: Low-performing schools had no choice but to find a way to better educate their many low-performing students, while better-performing schools classified their low-performing students as “special education,” removing them from the testing process.

“You can’t classify a whole school as eligible for special education,” Deming says. “If a school is on the bubble between Low-Scoring and Acceptable, there are a lot of kids who are low-scoring and you need to bring those kids up.”

Texas subsequently closed the loophole that allowed special education students not to count toward a school’s average. That, plus the apparent benefits to students in low-performing schools, leads Figlio to conclude that Deming’s study “puts more of a thumb on the side of the scale that says maybe accountability is worth the cost.”


Deming feels that in complicated testing regimes, there will always be unintended consequences. He favors standardized testing as a whole, saying, “I don’t think we’re better off going back to the veil of ignorance” we had before standardized testing. As he sees it, problems develop not with the tests themselves, but with the rewards and sanctions, which promote widely criticized practices like “teaching to the test.” Instead, he favors standardized tests without sanctions that would work much the way the Food and Drug Administration evaluates products.

“The FDA certifies this drug is safe for consumer use. If this school is getting taxpayer money, we certify it’s of at least minimum quality,” he says. “I don’t think that’s ideal, but I think it’s the best we can do given how hard it is to measure performance and how many ways there are to game the system.”

Kevin Hartnett is a writer in South Carolina. He can be reached at