Monday, November 7, 2016

Big Data for Volatility vs. Trend

Although largely uninformative for some purposes, dense data (high-frequency sampling) are highly informative for others. The massive example of recent decades is volatility estimation. The basic insight traces at least to Robert Merton's early work. Roughly put, as we sample returns arbitrarily finely over a fixed time interval, we can infer the underlying volatility (quadratic variation) over that interval arbitrarily well.
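As a rough illustration of the dense-data point, the Python sketch below simulates returns over one fixed interval (think of it as one day) and computes realized variance, the sum of squared returns, at increasingly fine sampling. The constant-volatility, zero-drift Brownian model, the 1% volatility, and the function names are hypothetical choices made for illustration, not the author's.

```python
import math
import random


def realized_variance(returns):
    """Sum of squared returns: the realized-variance estimator of quadratic variation."""
    return sum(r * r for r in returns)


def simulate_returns(sigma, n, horizon=1.0, seed=0):
    """Simulate n equally spaced log returns over a fixed horizon.

    Assumes (hypothetically) Brownian motion with constant volatility sigma and
    zero drift; a drift term would add only O(1/n) to realized variance.
    """
    rng = random.Random(seed)
    dt = horizon / n
    return [rng.gauss(0.0, sigma * math.sqrt(dt)) for _ in range(n)]


if __name__ == "__main__":
    sigma = 0.01  # hypothetical volatility over the interval, so true QV = 1e-4
    for n in (10, 100, 1_000, 10_000, 100_000):
        rv = realized_variance(simulate_returns(sigma, n, seed=42))
        print(f"n={n:>7}: RV={rv:.3e}  (true QV={sigma ** 2:.3e})")
```

The interval never changes; only the number of returns inside it grows, and realized variance closes in on the true quadratic variation as it does (under these assumptions its relative standard error is roughly sqrt(2/n)).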

So, for what are dense data "largely uninformative"? The massive example of recent decades is long-term trend. Again roughly put and assuming linearity, long-term trend is effectively a line segment drawn between a sample's first and last observations, so its precision depends on the calendar time separating those two observations, not on how many observations lie between them. For efficient trend estimation we therefore need tall data (long calendar span), not dense data.
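A small Monte Carlo sketch can make the trend point concrete. It assumes log prices follow a random walk with drift, in which case the drift estimate is (last log price minus first log price) divided by the calendar span, with standard deviation sigma / sqrt(span) whatever the sampling frequency. The drift, volatility, span, and function names below are hypothetical values chosen for illustration.

```python
import math
import random
import statistics


def drift_estimate(mu, sigma, span, n, rng):
    """Simulate n equally spaced log returns over `span` years (assumed random
    walk with drift) and return the drift estimate (p_T - p_0) / span."""
    dt = span / n
    returns = [rng.gauss(mu * dt, sigma * math.sqrt(dt)) for _ in range(n)]
    # The returns telescope: their sum is the last log price minus the first.
    return sum(returns) / span


def drift_sd(mu, sigma, span, n, reps=500, seed=0):
    """Monte Carlo standard deviation of the drift estimate across replications."""
    rng = random.Random(seed)
    estimates = [drift_estimate(mu, sigma, span, n, rng) for _ in range(reps)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    mu, sigma = 0.06, 0.2  # hypothetical annual drift and volatility
    span = 25.0            # hypothetical calendar span in years
    for n in (1, 10, 100, 1000):
        print(f"span={span:.0f}y, n={n:>4}: sd={drift_sd(mu, sigma, span, n):.4f}"
              f"  theory={sigma / math.sqrt(span):.4f}")
```

Sampling 1,000 times instead of once over the same 25 years leaves the spread of the drift estimate essentially unchanged near 0.04; only a longer span shrinks it.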

Assembling everything, for estimating yesterday's stock-market volatility you'd love to have yesterday's 1-minute intra-day returns, but for estimating the expected return on the stock market (the slope of a linear log-price trend) you'd much rather have 100 years of annual returns, despite the fact that a naive count would say that 1 day of 1-minute returns is a much "bigger" sample.
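The comparison can be put in rough numbers with two textbook formulas, both resting on simplifying assumptions: under a Gaussian random walk with drift, the drift estimate has standard error sigma / sqrt(span), and under constant volatility with no drift, realized variance from n returns has relative standard error sqrt(2/n). The 16% annual volatility, 252 trading days per year, and 390 one-minute returns per day (a 6.5-hour session) are assumed values, not figures from the post.

```python
import math


def drift_se(sigma, span_years):
    """Std. error of the drift estimate (p_T - p_0) / T under a Gaussian random
    walk with drift: sigma / sqrt(T). Depends on span, not sampling frequency."""
    return sigma / math.sqrt(span_years)


def rv_relative_se(n):
    """Relative std. error of realized variance from n Gaussian returns with
    constant volatility and no drift: sqrt(2 / n). Depends on count, not span."""
    return math.sqrt(2.0 / n)


if __name__ == "__main__":
    sigma = 0.16           # hypothetical annual volatility of log stock prices
    trading_days = 252     # assumed trading days per year
    minutes_per_day = 390  # assumed 6.5-hour session sampled every minute

    day = 1 / trading_days
    print(f"Drift SE, 1 day of 1-minute returns: {drift_se(sigma, day):.3f}")
    print(f"Drift SE, 100 annual returns:        {drift_se(sigma, 100):.3f}")
    print(f"Daily variance rel. SE, 1 daily return:     {rv_relative_se(1):.2f}")
    print(f"Daily variance rel. SE, 390 minute returns: {rv_relative_se(minutes_per_day):.2f}")
```

Under these assumptions the day of minute data pins down that day's variance to within about 7% (versus about 141% from a single daily return), yet leaves the annual drift with a standard error near 2.54, while 100 annual returns bring it down to 0.016.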

So different aspects of Big Data -- in this case dense vs. tall -- are of different value for different things.  Dense data promote accurate volatility estimation, and tall data promote accurate trend estimation.