There is this question of how one defines “big data.” The answer is that there really is no way to define it. I think that’s true of pretty much any emerging concept. When you think about some of the things that did become obsolete, like the Yellow Pages and the White Pages, to some extent they represent big data. I think big data has always existed, but the ability to interrogate massive data sets has not. And a huge amount of the Internet, and services like Google, are built on the ability to interrogate large data sets in real time.
The goal [of my research partner and coauthor Jean-Baptiste Michel and me] had always been to study culture. But there were no tools that allowed you to ask basic questions like “How often did people talk about democracy?” and “When did people get interested in feminism?” At a certain moment, when the Google Books data became sufficiently large, when we had enough books, we started to see statistically significant counts for different concepts, like separation of church and state. And all of a sudden it became possible to have serious quantitative, statistical conversations about where concepts came from. Once you could do it, it was impossible to stop.