3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schönberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives.
When a phenomenon or an effect is large, we usually don’t need huge amounts of data to recognize it (and science has traditionally focused on these large effects). As things become more subtle, bigger data helps. It can lead us to smaller pieces of knowledge: how to tailor a product or how to treat a disease a little bit better. If those bits can help lots of people, the effect may be large. But revolutionary for an individual? Probably not.
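For readers who want to see the arithmetic behind "large effects need little data," here is a rough sketch. It uses a standard textbook normal approximation for comparing two group means; the function name, the 5 percent significance level and the 80 percent power target are illustrative assumptions, not figures from any study mentioned here.

```python
import math
from statistics import NormalDist


def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a standardized
    difference in means (two-sided test, normal approximation).

    effect_size is the difference between group means divided by the
    standard deviation. alpha and power are assumed, conventional defaults.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# The required sample grows with the inverse square of the effect size.
for d in (0.8, 0.2, 0.05):
    print(f"effect size {d}: about {required_n_per_group(d)} per group")
```

Under these assumptions, a large effect can be picked up with a few dozen observations per group, while a very subtle one calls for thousands. That is the sense in which bigger data helps mainly with smaller pieces of knowledge.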
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day.
And let’s not forget about bias. There’s a common misconception that throwing more data at a problem makes it easier to solve. But if there’s an inherent bias in how the data are collected or examined, a bigger dataset doesn’t help. For example, if you’re trying to understand how people interact based on mobile phone data, a year of data rather than a month’s worth doesn’t address the limitation that certain populations don’t use mobile phones.
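The mobile phone example can be illustrated with a small simulation. Everything in it is made up for the sake of the sketch: the share of people who own phones, the average number of daily contacts for owners and non-owners, and the names of the constants and the function. The only point is that the estimate settles on the phone owners' average, not the population's, no matter how many records are collected.

```python
import random

# Hypothetical population, invented for illustration only.
OWNER_SHARE = 0.7      # assumed fraction of people who use mobile phones
OWNER_MEAN = 10.0      # assumed average daily contacts among phone users
NON_OWNER_MEAN = 4.0   # assumed average daily contacts among non-users
SPREAD = 3.0           # assumed standard deviation within each group

TRUE_MEAN = OWNER_SHARE * OWNER_MEAN + (1 - OWNER_SHARE) * NON_OWNER_MEAN


def mean_contacts_from_phone_data(n_records, rng):
    """Estimate average daily contacts from phone records alone.

    Non-users never appear in the data, so every record is drawn
    from the phone-user group.
    """
    total = 0.0
    for _ in range(n_records):
        total += rng.gauss(OWNER_MEAN, SPREAD)
    return total / n_records


rng = random.Random(1)
month = mean_contacts_from_phone_data(10_000, rng)    # "a month" of records
year = mean_contacts_from_phone_data(120_000, rng)    # "a year" of records

print(f"true population mean: {TRUE_MEAN:.2f}")
print(f"estimate from a month of phone data: {month:.2f}")
print(f"estimate from a year of phone data:  {year:.2f}")
```

Twelve times as much data makes the estimate more precise, but precise about the wrong group. The gap between the estimate and the true population average does not shrink.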
Many interesting questions can be explored with little datasets. Big data has refined our idea of six degrees of separation: Facebook has shown that it’s actually closer to four degrees. But the first six-degrees study was done by psychologist Stanley Milgram using a lot of cleverness and a small number of postcards.
Furthermore, although it’s exciting to have massive datasets with incredible breadth, too often they lack much in the way of a temporal dimension. To really understand a phenomenon, such as a social one, we need datasets with large historical sweep. We need long data, not just big data.
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
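Fishing for correlations is easy to demonstrate. The sketch below generates a target series of pure random noise, compares it with many other series of pure random noise, and reports the strongest correlation it finds. The function names and the choice of 20 observations are illustrative assumptions; nothing here reproduces an analysis from the Wired essay.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_variables, n_observations, rng):
    """Strongest absolute correlation between a random target and
    n_variables unrelated random series. Any match is coincidence."""
    target = [rng.gauss(0, 1) for _ in range(n_observations)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_observations)]
        best = max(best, abs(pearson(target, candidate)))
    return best


rng = random.Random(0)
for n_variables in (5, 50, 1000):
    best = best_spurious_correlation(n_variables, 20, rng)
    print(f"{n_variables} unrelated variables: strongest correlation {best:.2f}")
```

None of these series has anything to do with the others, yet searching through enough of them tends to turn up a correlation that looks impressive. Without a hypothesis about why two things should be related, there is no way to tell such a coincidence from a discovery.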
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.
Samuel Arbesman, an applied mathematician and network scientist, is a senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts.”