No Hesitations: Big Data: Tall, Wide, and Dense

Wednesday, April 13, 2016

Big Data: Tall, Wide, and Dense

It strikes me that "tall", "wide", and "dense" might be useful words for conceptualizing the aspects of Big Data that matter in time-series econometrics.

Think of a regression situation, with a (T x K) "X matrix" for T "days" (or whatever) of data for each of K variables. Now imagine sampling intra-day, m times per day. Then X is (mT x K). Big data correspond to huge-X situations arising because one or more of T, K, and m is huge. (Of course there will always be subjectivity associated with "how huge is huge".)
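To make the dimension bookkeeping concrete, here is a minimal Python sketch. The helper name x_shape is hypothetical, and the figure of 390 one-minute observations per day is an assumption (a 6.5-hour trading session), not something stated above.

```python
def x_shape(T: int, K: int, m: int = 1) -> tuple[int, int]:
    """Return (rows, columns) of X for T days, K variables, m samples per day."""
    if min(T, K, m) < 1:
        raise ValueError("T, K, and m must all be positive integers")
    # Each of the T days contributes m rows; each variable is one column.
    return (m * T, K)


# Daily sampling (m = 1) recovers the plain (T x K) case.
daily = x_shape(T=2500, K=5000)

# Assumed: 390 one-minute observations per day (not from the post).
intraday = x_shape(T=2500, K=5000, m=390)

print(daily)
print(intraday)
```

Under that assumption, intra-day sampling multiplies the row count by 390 while leaving the column count unchanged.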

T, K, and m are usefully considered separately.

-- As T gets large we have "tall data" (in reference to the tall X matrix, due to the large number of time periods, i.e., the long calendar span of data)

-- As K gets large we have "wide data" (in reference to the wide X matrix due to the large number of regressors)

-- As m gets large we have "dense data" (in reference to the high-frequency intra-day sampling, regardless of whether the data are tall)

A few examples:

-- Consider 2500 days of 1-minute returns for each of 5000 stocks. The data are tall, wide, and dense.

-- Consider 25 days of 1-minute returns for each of 50 stocks. The data are dense, but neither tall nor wide.

-- Consider 2500 days of daily returns for each of 5000 stocks. The data are tall and wide, but not dense.
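The three examples can be checked mechanically with a rough sketch like the one below. The cutoffs are purely hypothetical (recall that "how huge is huge" is subjective), the function name describe is made up for illustration, and 390 one-minute observations per day is again an assumed session length.

```python
# Hypothetical cutoffs: what counts as "huge" is subjective, so adjust freely.
CUTOFFS = {"tall": 1000, "wide": 1000, "dense": 100}

# Assumed number of 1-minute returns per trading day (not from the post).
MINUTES_PER_DAY = 390


def describe(T, K, m, cutoffs=CUTOFFS):
    """Return the set of labels ("tall", "wide", "dense") that apply."""
    sizes = {"tall": T, "wide": K, "dense": m}
    return {label for label, size in sizes.items() if size >= cutoffs[label]}


examples = [
    ("2500 days, 1-minute, 5000 stocks", 2500, 5000, MINUTES_PER_DAY),
    ("25 days, 1-minute, 50 stocks", 25, 50, MINUTES_PER_DAY),
    ("2500 days, daily, 5000 stocks", 2500, 5000, 1),
]

for name, T, K, m in examples:
    labels = sorted(describe(T, K, m)) or ["none"]
    print(f"{name}: {', '.join(labels)}")
```

Different cutoffs would move the boundaries, but the point stands that T, K, and m are judged separately.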
