Quantity has a quality all its own, as the saying goes. That’s especially true of information. The more you’ve got, the more useful it becomes. Knowing sensitive facts about one person, or a dozen, may be trivially useful. But analyze the same facts about 100 million people, and you can cure diseases, win elections, or earn billions of dollars, because unpredictable insights emerge when you turn computers loose on vast storehouses of information.
There’s a nickname for the concept: “big data.” It’s one of the buzzwords of corporate executives, tech-savvy politicians, and worried civil libertarians. If you want to know what they’re all talking about, then “Big Data” is the book for you, a comprehensive and entertaining introduction to a very large topic.
By analyzing huge amounts of information, it’s possible to discover patterns and relationships that up to now have been invisible to us. In this way, we can find new solutions to tough problems, and opportunities we’d never otherwise have suspected.
Viktor Mayer-Schönberger and Kenneth Cukier set out a host of examples. My favorite involves the giant retailer Walmart and the notorious breakfast snack Pop-Tarts. Walmart records every purchase by every customer for future analysis. Company analysts noticed that when the National Weather Service warned of a hurricane, Walmart stores in the affected area would see a surge in sales of Pop-Tarts. So store managers were told to put their Pop-Tarts near the entrance during hurricane season, and sales soared.
This is big data at its coolest. No human would have guessed the connection.
The big data movement has surged because it’s now so cheap and easy to store vast quantities of information we once discarded. Computer memory used to be so expensive that programmers economized by describing years with two digits instead of four: ’99 instead of 1999.
But these days, data storage is dirt cheap; a billion bytes, enough to store a full-length movie, costs about a dime. As a result, we now record darn near everything.
Simply throwing more data at a problem can produce amazing results, the authors explain. Microsoft Corp. found that it could sharply improve the performance of the spell checker in its word processing software just by having it process a database of 1 billion words. Google Inc. boosted its language translation service by scouring the Internet for billions of pages of translated documents and analyzing what it found. Amazon.com once used human reviewers to suggest new books to customers. Then the company found that using computers to analyze millions of transactions was not only cheaper, but produced better results.
Why? Who knows? Big data analysis often uncovers correlations that seem to have no logical reason to exist. A human would want to know why those preparing for a hurricane would stock up on Pop-Tarts and not, say, Snickers bars. Computers don’t care.
Big data analysis is impressive, but far from flawless. Mayer-Schönberger and Cukier celebrate Google’s Flu Trends service, which analyzes billions of Internet searches to estimate the prevalence of flu in the United States. But the authors couldn’t have known that this much-vaunted technique would fail utterly this flu season, when Google’s estimate of flu cases was twice the actual number.
Even when big data isn’t flat-out wrong, it can be rather creepy. To their credit, the authors are well aware of technology’s relentless erosion of privacy. Even if you strip names and addresses from a database, it’s possible to identify individuals by analyzing enough of the websites they visit or the Google searches they run. Such privacy-shredding is child’s play when processing billions of online activities.
In addition to spotting trends, big data analysis is getting pretty good at predicting behavior, the book points out. In some cities, police use the technology to proactively beef up patrols on certain streets at certain times of day. In some states, it’s used to help decide which prisoners will be granted parole and which are too dangerous to release.
It’s easy to celebrate any technology that makes our streets safer. But justice can only be meted out one person at a time, and a statistical prediction about people like you is not proof of what you yourself will do.
Mayer-Schönberger and Cukier offer up some sensible suggestions on how we can have the blessings of big data and our freedoms, too. Just as well; their lively book leaves no doubt that big data’s growth spurt is just beginning.
Hiawatha Bray can be reached at firstname.lastname@example.org.