Tuesday, October 1, 2013

Big Data the Big Hassle

The hype surrounding "Big Data" has escalated to borderline nauseating. Is it just a sham?

Yes, I know, I have previously gushed about the wonders of Big Data. But that was then, and now is now, and I hear my inner contrarian alarm sounding.

One thing is clear: Big Data the phenomenon is not a sham. It's here, it's real, and it must be taken seriously. The ongoing explosion in the quantity of available data, largely the result of recent and unprecedented advancements in data recording and storage technology, is not going away. It's emerging as one of the defining characteristics of our time.

Big Data the business isn't really a sham either, even if it's impossible not to smirk when told, for example, that major firms are rushing to create new executive titles like "Vice President for Big Data." (I'm not making this up. See Steve Lohr's New York Times piece.) Big Data consultants and software peddlers smell Big Money, and they're salivating profusely. But there's nothing necessarily wrong with that, even if it isn't pretty.

But what about Big Data the scientific field? What is it? Where's the beef?

What's really new statistically, for example? Of course Big Data has stimulated much fine new work in dimensionality reduction, shrinkage, selection, sparsity, regularization, etc. But are those not traditional areas? In what sense is the scientific Big Data whole truly greater than the sum of its earlier-existing parts?

But primarily: Why all the endless optimistic Big Data buzz about endless Big Data opportunities? What about pitfalls? Isn't Big Data in many respects just a hassle? Aren't we still searching for needles in a haystack, except that the haystack is now growing much more quickly than the needle-discovering technology is improving? Why is that cause for celebration?