Monday, November 24, 2014

More on Big Data

An earlier post, "Big Data the Big Hassle," waxed negative. So let me now give credit where credit is due.

What's true in time-series econometrics is that it's very hard to list the third-most-important, or even second-most-important, contribution of Big Data. Which makes all the more remarkable the mind-boggling -- I mean completely off-the-charts -- success of the first-most-important contribution: volatility estimation from high-frequency trading data. Yacine Ait-Sahalia and Jean Jacod give a masterful overview in their new book, High-Frequency Financial Econometrics.

What do financial econometricians learn from high-frequency data? Although largely uninformative for some purposes (e.g., trend estimation), high-frequency data are highly informative for others (volatility estimation), an insight that traces at least to Merton's early work. Roughly put: as we sample returns arbitrarily finely, we can infer underlying volatility arbitrarily well. Accurate volatility estimation and forecasting, in turn, are crucial for financial risk management, asset pricing, and portfolio allocation. And it's all facilitated by the trade-by-trade data captured in modern electronic markets.
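
To make the "sample finer, learn more about volatility" logic concrete, here is a minimal simulation sketch. It assumes the simplest possible setting: a log price following a diffusion with constant drift and constant volatility, with no jumps and no microstructure noise, which real trade-by-trade data certainly do not satisfy. The function names and parameter values (mu = 0.05, sigma = 0.2, a unit horizon) are illustrative choices of mine, not anything from the book.

```python
import math
import random
import statistics


def simulate_returns(n, mu=0.05, sigma=0.2, horizon=1.0, seed=0):
    """Simulate n equally spaced log returns over a fixed horizon.

    Assumed model: dp = mu dt + sigma dW, with constant mu and sigma.
    """
    rng = random.Random(seed)
    dt = horizon / n
    return [mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for _ in range(n)]


def realized_variance(returns):
    """Sum of squared returns: estimates sigma^2 * horizon."""
    return sum(r * r for r in returns)


def total_return(returns):
    """Sum of returns: estimates mu * horizon (the trend)."""
    return sum(returns)


def spread_across_paths(estimator, n, paths=50):
    """Standard deviation of an estimator across independent simulated paths."""
    estimates = [estimator(simulate_returns(n, seed=s)) for s in range(paths)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Same horizon each time; only the sampling frequency n changes.
    for n in (10, 100, 1000, 10000):
        rv_sd = spread_across_paths(realized_variance, n)
        tr_sd = spread_across_paths(total_return, n)
        print(f"n={n:>6}  sd(realized variance)={rv_sd:.5f}  "
              f"sd(total return)={tr_sd:.5f}")
```

Under these assumptions, the spread of the realized-variance estimate shrinks as n grows, while the spread of the total-return (trend) estimate does not: the sum of returns depends only on the first and last prices, so finer sampling over the same span adds nothing there.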

In stressing "high frequency" financial data, I have thus far implicitly stressed only the massive time-series dimension, with its now nearly continuous record. But of course we're ultimately concerned with covariance matrices, not just scalar variances, for tens of thousands of assets, so the cross-section dimension is huge as well. (A new term: "Big Big Data"? No, please, no.) Indeed, multivariate analysis now defines significant parts of both the theoretical and applied research frontiers; see Andersen et al. (2013).