Tuesday, July 26, 2016

An important Example of Simultaneously Wide and Dense Data

By the way, related to my last post on wide and dense data, an important example of analysis of data that are both wide and dense is the high-frequency high-dimensional factor modeling of Pelger, and of Ait-Sahalia and Xiu.  Effectively they treat wide sets of realized volatilities, each of which is constructed from underlying dense data.

Monday, July 25, 2016

The Action is in Wide and/or Tall Data

I recently blogged on varieties of Big Data: (1) tall, (2) wide, and (3) dense.

Presumably tall data are the least interesting insofar as the only way to get a long calendar span is to sit around and wait, in contrast to wide and dense data, which now appear routinely.

But it occurs to me that tall data are the least interesting not only for the above reason, but also because wide data make tall data impossible from a certain perspective. In particular, non-parametric estimation in high dimensions (that is, with wide data) is always subject to the fundamental and inescapable "curse of dimensionality":  the rate at which estimation error vanishes gets hopelessly slow, very quickly, as dimension grows.  [Wonkish readers will recall that the Stone-optimal rate in \(d\) dimensions is \( \sqrt{T^{1- \frac{d}{d+4}}}\).]
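[To see how quickly the curse bites, note that the displayed rate simplifies: \[ \sqrt{T^{1-\frac{d}{d+4}}} = T^{\frac{2}{d+4}}, \] so estimation error vanishes at rate \(T^{2/5}\) when \(d=1\) but only \(T^{1/7}\) when \(d=10\). A back-of-the-envelope calculation (my illustrative numbers, not a formal result): to match the \(d=1\) accuracy attained with \(T=10^4\) observations, the \(d=10\) case needs \(T^{1/7} = 10^{1.6}\), i.e. roughly \(T = 10^{11.2}\) observations.]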

The upshot:  As our datasets get wider, they also implicitly get less tall. That's all the more reason to downplay tall data.  The action is in wide and dense data (whether separately or jointly).

Monday, July 18, 2016

The HAC Emperor has no Clothes: Part 2

The time-series kernel-HAC literature seems to have forgotten about pre-whitening. But most of the action is in the pre-whitening, as stressed in my earlier post. In time-series contexts, parametric allowance for good-old ARMA-GARCH disturbances (with AIC order selection, say) is likely to be all that's needed, cleaning out whatever conditional-mean and conditional-variance dynamics are operative, after which there's little/no need for anything else. (And although I say "parametric" ARMA/GARCH, it's actually fully non-parametric from a sieve perspective.)
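As a concrete (and deliberately stripped-down) sketch of the pre-whitening idea, here is AR(1) pre-whitening of a persistent disturbance in the spirit of Andrews-Monahan. The AR(1)-only whitening, the known persistence value, and the flat lag-0 variance estimate are simplifying assumptions for illustration, not the full ARMA-GARCH-with-AIC recipe:

```python
import numpy as np

rng = np.random.default_rng(4)

# simulate a persistent disturbance: AR(1) with rho = 0.8
T, rho = 2000, 0.8
u = np.zeros(T)
e = rng.standard_normal(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]

# step 1: pre-whiten with a fitted AR(1) (in practice AIC would pick the order)
rho_hat = u[1:] @ u[:-1] / (u[:-1] @ u[:-1])
resid = u[1:] - rho_hat * u[:-1]

# step 2: long-run variance of the (nearly white) residuals, then "recolor";
# if the whitening worked, the lag-0 variance of resid is essentially all we need
lrv_resid = resid.var()
lrv = lrv_resid / (1 - rho_hat) ** 2

# the true long-run variance of this AR(1) is sigma^2 / (1 - rho)^2 = 25
```

After successful pre-whitening, the delicate kernel-and-truncation machinery is doing almost no work, which is the point of the post.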

Instead, people focus on kernel-HAC sans pre-whitening, and obsess over truncation lag selection. Truncation lag selection is indeed very important when pre-whitening is forgotten, as too short a lag can lead to seriously distorted inference, as emphasized in the brilliant early work of Kiefer and Vogelsang and in important recent work by Lazarus, Lewis, Stock and Watson. But all of that becomes much less important when pre-whitening is successfully implemented.

[Of course spectra need not be rational, so ARMA is just an approximation to a more general Wold representation (and remember, GARCH(1,1) is just an ARMA(1,1) in squares). But is that really a problem? In econometrics don't we feel comfortable with ARMA approximations 99.9 percent of the time? The only econometrically-interesting process I can think of that doesn't admit a finite-order ARMA representation is long memory (fractional integration). But that too can be handled parametrically by introducing just one more parameter, moving from ARMA(p,q) to ARFIMA(p,d,q).]

My earlier post linked to the key early work of Den Haan and Levin, which remains unpublished. I am confident that their basic message remains intact. Indeed recent work revisits and amplifies it in important ways; see Kapetanios and Psaradakis (2016) and new work in progress by Richard Baillie to be presented at the September 2016 NBER/NSF time-series meeting at Columbia ("Is Robust Inference with OLS Sensible in Time Series Regressions?").

Sunday, July 10, 2016

Contemporaneous, Independent, and Complementary

You've probably been in a situation where you and someone else discovered something "contemporaneously and independently". Despite the initial sinking feeling, I've come to realize that there's usually nothing to worry about. 

First, normal-time science has a certain internal momentum -- it simply must evolve in certain ways -- so people often identify and pluck the low-hanging fruit more-or-less simultaneously. 

Second, and crucially, such incidents are usually not just the same discovery made twice. Rather, although intimately-related, the two contributions usually differ in subtle but important ways, rendering them complements, not substitutes.

Here's a good recent example in financial econometrics, working out asymptotics for high-frequency high-dimensional factor models. On the one hand, consider Pelger, and on the other hand consider Ait-Sahalia and Xiu.  There's plenty of room in the world for both, and the whole is even greater than the sum of the (individually-impressive) parts.

Sunday, July 3, 2016

DAG Software

Some time ago I mentioned the DAG (directed acyclic graph) primer by Judea Pearl et al.  As noted in Pearl's recent blog post, a manual will be available with software solutions based on the DAGitty R package.  See http://dagitty.net/primer/

More generally -- that is, quite apart from the Pearl et al. primer -- check out DAGitty at http://dagitty.net.  Click on "launch" and play around for a few minutes. Very cool. 

Sunday, June 26, 2016

Regularization for Long Memory

Two earlier regularization posts focused on panel data and generic time series contexts. Now consider a specific time-series context: long memory. For exposition consider the simplest case of a pure long memory DGP,  \( (1-L)^d y_t = \varepsilon_t \) with  \( |d| < 1/2  \).  This \( ARFIMA(0,d,0) \) process is \( AR(\infty) \) with very slowly decaying coefficients due to the long memory. If you KNEW the world was \(ARFIMA(0,d,0)\) you'd just fit \(d\) using GPH or Whittle or whatever, but you're not sure, so you'd like to stay flexible and fit a very long \(AR\) (an \(AR(100) \), say). But such a profligate parameterization is infeasible, or at least very wasteful. A solution is to fit the \(AR(100) \) but regularize by estimating with ridge or a LASSO variant, say.
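A minimal simulation sketch of the regularized long-AR idea. I use ridge in closed form to keep it dependency-free; the truncated \(AR(\infty)\) recursion at the start of the sample and the particular penalty value are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, p, lam = 0.3, 2000, 100, 1.0

# AR(inf) coefficients of (1-L)^d: pi_0 = 1, pi_j = pi_{j-1} * (j-1-d) / j,
# so y_t = sum_j a_j y_{t-j} + eps_t with a_j = -pi_j (slow hyperbolic decay)
pi = np.ones(T)
for j in range(1, T):
    pi[j] = pi[j - 1] * (j - 1 - d) / j
a = -pi[1:]

# simulate the pure long-memory process (recursion truncated at the sample start)
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(T):
    y[t] = a[:t][::-1] @ y[:t] + eps[t]

# fit an AR(100) by ridge, in closed form: (X'X + lam I)^{-1} X'y
X = np.column_stack([y[p - k: T - k] for k in range(1, p + 1)])
z = y[p:]
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ z)
```

Swapping the ridge solve for a LASSO fit (e.g. coordinate descent) gives the sparse variant; either way, the regularization is what makes the profligate \(AR(100)\) usable.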

Related, recall the Corsi "HAR" approximation to long memory. It's just a long autoregression subject to coefficient restrictions. So you could do a LASSO estimation, as in Audrino and Knaus (2013). Related analysis and references are in a Humboldt University 2015 master's thesis.
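The HAR restriction is easy to make concrete: three coefficients (daily, weekly, monthly) pin down a step-constant AR(22). A sketch, with purely illustrative coefficient values:

```python
import numpy as np

# illustrative HAR coefficients on daily, weekly, and monthly average RV
b_d, b_w, b_m = 0.4, 0.3, 0.2

# the implied AR(22): each lag's coefficient is a sum of the averages it enters
phi = np.zeros(22)
phi[0] += b_d        # daily term loads only on lag 1
phi[:5] += b_w / 5   # weekly term spreads b_w equally over lags 1-5
phi[:22] += b_m / 22 # monthly term spreads b_m equally over lags 1-22
```

LASSO on the unrestricted AR(22) then amounts to asking the data whether this step-constant pattern (or something sparser) is warranted.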

Finally, note that in all of the above it might be desirable to change the LASSO centering point for shrinkage/selection to match the long-memory restriction. (In standard LASSO it's just 0.)

Wednesday, June 22, 2016

Observed Info vs. Estimated Expected Info

All told, after decades of research, it seems that Efron-Hinkley holds up -- observed information dominates estimated expected information for finite-sample MLE inference. It's both easier to calculate and more accurate. Let me know if you disagree.

[Efron, B. and Hinkley, D.V. (1978), "Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected Fisher Information", Biometrika, 65, 457–487.]
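For readers who want to see the two objects side by side, here is a minimal numeric sketch for the Cauchy location model, where observed and expected information genuinely differ sample by sample. The grid-search MLE and finite-difference second derivative are crude illustrative devices, not recommended practice:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(500)  # Cauchy location model, true theta = 0

def negloglik(theta):
    # negative log-likelihood up to an additive constant
    return np.sum(np.log(1 + (x - theta) ** 2))

# crude MLE by grid search (illustrative only)
grid = np.linspace(-1, 1, 2001)
theta_hat = grid[np.argmin([negloglik(t) for t in grid])]

# observed information: curvature of the realized log-likelihood at the MLE
h = 1e-4
obs_info = (negloglik(theta_hat + h) - 2 * negloglik(theta_hat)
            + negloglik(theta_hat - h)) / h ** 2

# expected Fisher information for Cauchy location is 1/2 per observation
exp_info = len(x) / 2.0

se_obs, se_exp = obs_info ** -0.5, exp_info ** -0.5
```

Efron-Hinkley's point is that `se_obs`, which conditions on the realized curvature, is the better basis for finite-sample inference.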

Tuesday, June 21, 2016

Mixed-Frequency High-Dimensional Time Series

Notice that high dimensions and mixed frequencies go together in time series. (If you're looking at a huge number of series, it's highly unlikely that all will be measured at the same frequency, unless you arbitrarily exclude all frequencies but one.) So high-dim MIDAS vector autoregression (VAR) will play a big role moving forward. The MIDAS literature is starting to go multivariate, with MIDAS VAR's appearing; see Ghysels (2015, in press) and Mikosch and Neuwirth (2016 w.p.).

But the multivariate MIDAS literature is still low-dim rather than high-dim. Next steps will be: 

(1) move to high-dim VAR estimation by using regularization methods (e.g. LASSO variants), 

(2) allow for many observational frequencies (five or six, say), 

(3) allow for the "rough edges" that will invariably arise at the beginning and end of the sample, and 

(4) visualize results using network graphics.
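For concreteness, the basic MIDAS building block -- a parsimonious lag polynomial that aggregates the high-frequency observations into one regressor -- can be sketched as follows. The exponential Almon form is standard; the parameter values here are purely illustrative:

```python
import numpy as np

def exp_almon(theta1, theta2, K):
    # exponential Almon lag polynomial, normalized to sum to one
    k = np.arange(1, K + 1)
    w = np.exp(theta1 * k + theta2 * k ** 2)
    return w / w.sum()

# aggregate 22 daily observations into a single monthly regressor
w = exp_almon(0.05, -0.01, 22)
daily = np.random.default_rng(2).standard_normal(22)
monthly_regressor = w @ daily
```

In a MIDAS VAR each high-frequency block enters through such a polynomial, so the parameter count stays manageable; regularization (step (1) above) then handles the cross-sectional dimension.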

Conditional Dependence and Partial Correlation

In the multivariate normal case, conditional independence is the same as zero partial correlation.  (See below.) That makes a lot of things a lot simpler.  In particular, determining ordering in a DAG is just a matter of assessing partial correlations. Of course in many applications normality may not hold, but still...

Aust. N.Z. J. Stat. 46(4), 2004, 657–664
Kunihiro Baba, Ritei Shibata and Masaaki Sibuya
Keio University and Takachiho University
This paper investigates the roles of partial correlation and conditional correlation as measures of the conditional independence of two random variables. It first establishes a sufficient condition for the coincidence of the partial correlation with the conditional correlation. The condition is satisfied not only for multivariate normal but also for elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial and Dirichlet distributions. Such families of distributions are characterized by a semigroup property as a parametric family of distributions. A necessary and sufficient condition for the coincidence of the partial covariance with the conditional covariance is also derived. However, a known family of multivariate distributions which satisfies this condition cannot be found, except for the multivariate normal. The paper also shows that conditional independence has no close ties with zero partial correlation except in the case of the multivariate normal distribution; it has rather close ties to the zero conditional correlation. It shows that the equivalence between zero conditional covariance and conditional independence for normal variables is retained by any monotone transformation of each variable. The results suggest that care must be taken when using such correlations as measures of conditional independence unless the joint distribution is known to be normal. Otherwise a new concept of conditional independence may need to be introduced in place of conditional independence through zero conditional correlation or other statistics.
Keywords: elliptical distribution; exchangeability; graphical modelling; monotone transformation.
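In the multivariate normal case the partial correlations can be read straight off the precision matrix, \( \rho_{ij \cdot \text{rest}} = -\Omega_{ij} / \sqrt{\Omega_{ii} \Omega_{jj}} \). A small simulation sketch (the data-generating numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# trivariate normal where X and Y are linked only through the common factor Z
n = 100000
Z = rng.standard_normal(n)
X = Z + rng.standard_normal(n)
Y = Z + rng.standard_normal(n)
data = np.vstack([X, Y, Z])

# partial correlations from the precision matrix (off-diagonal entries)
prec = np.linalg.inv(np.cov(data))
d = np.sqrt(np.diag(prec))
partial = -prec / np.outer(d, d)

# X and Y are conditionally independent given Z, so their partial correlation
# is near zero even though their raw correlation is near 0.5
```

Under normality, a near-zero off-diagonal entry is exactly the conditional-independence statement needed to delete an edge in a Gaussian graphical model.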

Saturday, June 18, 2016

A Little Bit More on Dave Backus

In the days since his passing, lots of wonderful things have been said about Dave Backus. (See, for example, the obituary by Tom Cooley, posted on David Levine's page.) They're all true. But none sufficiently stress what was for me his essence: complete selflessness. We've all had a few good colleagues, even great colleagues, but Dave took it to an entirely different level.

The "Teaching" section of his web page begins, "I have an open-source attitude toward teaching materials". Dave had an open-source attitude toward everything. He lived for team building, cross-fertilization, mentoring, and on and on. A lesser person would have traded the selflessness for a longer c.v., but not Dave. And we're all better off for it.