It is a throwaway line in stories about science and health research results: Correlation is not causation. Just because two things appear to follow a similar pattern does not mean they are related.
A new website, Spurious Correlations, the work of a first-year student at Harvard Law School, draws into comical and obvious relief how easily we can get trapped into seeing relationships where there are none. Say cheese consumption increases year over year. So does another variable: the number of people who die from becoming tangled in bedsheets. Even though the two trends look mysteriously similar when plotted side by side, there is almost certainly no connection between them.
The same goes for the amount of sunlight per acre in Massachusetts in a year and the number of lawyers in Ohio. Or the number of women in California who slipped or tripped to their death and the number of women editors of the Harvard Law Review. They can look similar when thrown up on a graph together, but they are definitely not connected.
The website, the product of Tyler Vigen, went up a week ago and has since gone viral, garnering a couple million hits. Visitors come to browse the often comedic pairs of statistics that Vigen found to be correlated by scouring data from the US Census and the Centers for Disease Control and Prevention. Vigen said he was inspired after seeing someone find data that tracked up and down like the profile of a specific mountain, and he idly began to wonder whether other data sources could, when presented on a graph, suddenly look meaningful.
“It’s good training because we all get to look at those graphs and say, ‘Is there any connection between those things?’ ” Vigen said.
Vigen readily admits that his own website will not stand up to rigorous peer review. Already, he said, statisticians and research professors have contacted him to point out, for example, that most of these correlations would not be statistically significant. He likes that fact about his data, because it illustrates the misconception he was trying to critique. When looking at correlations, one has to be critical and ask questions, such as: Is this correlation even statistically significant?
In addition, he used only one relatively simple way of measuring the correlation between the data points. Vigen acknowledges that if he used other methods, many of the correlations would go away altogether.
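To see how a correlation can vanish under a different method, consider a minimal sketch in Python. It is not Vigen's code, and the article does not say which measure he used; the sketch assumes a plain Pearson correlation as the "simple" measure and swaps in one common alternative, comparing year-to-year changes instead of raw values. The two series are invented for illustration and stand in for any pair of unrelated numbers that both happen to rise over a decade.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def yearly_changes(series):
    """Difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical data: ten yearly values for two unrelated quantities.
# Both drift upward, but their year-to-year wiggles are unrelated.
series_a = [1, 0, 3, 2, 5, 4, 7, 6, 9, 8]
series_b = [1, 3, 3, 5, 9, 11, 11, 13, 17, 19]

# Correlating the raw values mostly measures the shared upward trend.
r_levels = pearson(series_a, series_b)

# Correlating the yearly changes removes that shared trend.
r_changes = pearson(yearly_changes(series_a), yearly_changes(series_b))

print(f"raw values:     r = {r_levels:.2f}")        # 0.93 for this data
print(f"yearly changes: r = {abs(r_changes):.2f}")  # 0.00 for this data
```

On the raw values the two made-up series look tightly linked, while their yearly changes show no relationship at all. The only thing they share is that both go up over time.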
Vigen never intended for his website to go viral or to get the scrutiny of card-carrying statisticians. He launched it, in the middle of finals period, hoping to show it to a few friends and get some laughs. His favorite example to share with others? The number of sociology doctorates awarded each year in the United States correlates with the number of noncommercial space launches worldwide on a year-by-year basis. The plausible explanation? Sociologists are being launched into space, of course.
Many of the correlations will not stand up to scrutiny. The graphing tactics that Vigen has deployed on the website make it deceptively easy, a graphics editor at the Globe said, to correlate most anything. But that is sort of the point.