Two earlier regularization posts focused on panel data and generic time-series contexts. Now consider a specific time-series context: long memory. For exposition, consider the simplest case of a pure long-memory DGP, \( (1-L)^d y_t = \varepsilon_t \) with \( |d| < 1/2 \). This \( ARFIMA(0,d,0) \) process is \( AR(\infty) \) with very slowly decaying coefficients, due to the long memory. If you KNEW the world was \( ARFIMA(0,d,0) \), you'd just fit \( d \) using GPH or Whittle or whatever. But you're not sure, so you'd like to stay flexible and fit a very long \( AR \) (an \( AR(100) \), say). Such a profligate parameterization, however, is infeasible or at least very wasteful. A solution is to fit the \( AR(100) \) but regularize by estimating with ridge or a LASSO variant, say.
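A minimal sketch of the idea, with all numerical choices mine (d = 0.3, an AR(50) rather than an AR(100) to keep it quick, scikit-learn's Lasso with an arbitrary penalty): simulate the ARFIMA(0,d,0) process from its truncated AR(\(\infty\)) representation, then fit the long autoregression by LASSO.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
d, p, T, burn = 0.3, 50, 3000, 500

# AR(inf) coefficients of (1-L)^d y_t = eps_t: phi_j = -b_j, where
# (1-L)^d = sum_j b_j L^j with b_0 = 1 and b_j = b_{j-1} (j-1-d)/j.
b = np.ones(p + 1)
for j in range(1, p + 1):
    b[j] = b[j - 1] * (j - 1 - d) / j
phi = -b[1:]                          # positive, slowly decaying (~ j^{-1-d})

# simulate by truncating the AR(inf) at p lags
y = np.zeros(T + burn)
eps = rng.standard_normal(T + burn)
for t in range(p, T + burn):
    y[t] = phi @ y[t - p:t][::-1] + eps[t]
y = y[burn:]

# profligate long autoregression, disciplined by LASSO
X = np.column_stack([y[p - 1 - j:T - 1 - j] for j in range(p)])  # lags 1..p
fit = Lasso(alpha=0.01, fit_intercept=False).fit(X, y[p:])
```

The estimated first coefficient should land near \( \phi_1 = d \), while the penalty shrinks (often to exactly zero) the many small distant-lag coefficients.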

Relatedly, recall the Corsi "HAR" approximation to long memory. It's just a long autoregression subject to coefficient restrictions. So you could do a LASSO estimation, as in Audrino and Knaus (2013). Related analysis and references are in a Humboldt University 2015 master's thesis.
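To see the HAR restriction concretely: HAR is an AR(22) whose lag coefficients are constrained to a daily/weekly/monthly pattern, i.e. they live in a 3-dimensional subspace. A small sketch (the simulated stand-in series and variable names are mine, not Corsi's):

```python
import numpy as np

# HAR restricts an AR(22) to three regressors: yesterday's value and the
# 5-day and 22-day averages.  The implied 22 lag coefficients are
# beta_d * e_1 + beta_w * (1/5) 1_{1..5} + beta_m * (1/22) 1_{1..22}.
R = np.zeros((22, 3))
R[0, 0] = 1.0
R[:5, 1] = 1.0 / 5
R[:, 2] = 1.0 / 22

rng = np.random.default_rng(0)
rv = 5 + 0.01 * rng.standard_normal(300).cumsum()   # stand-in for realized vol
X22 = np.column_stack([rv[21 - j:len(rv) - 1 - j] for j in range(22)])  # lags 1..22
X_har = X22 @ R   # the three HAR regressors: lag 1, weekly mean, monthly mean
```

Audrino and Knaus (2013) instead run LASSO on all 22 lags directly and ask whether the selected model looks HAR-like.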

Finally, note that in all of the above it might be desirable to change the LASSO centering point for shrinkage/selection to match the long-memory restriction. (In standard LASSO it's just 0.)
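Recentering is just a change of variables: write \( \phi = \phi_0 + \delta \) and LASSO-penalize \( \delta \), so shrinkage/selection is toward the long-memory-implied \( \phi_0 \) rather than toward 0. A sketch (the function name and toy data are mine):

```python
import numpy as np
from sklearn.linear_model import Lasso

def centered_lasso(X, y, phi0, alpha):
    """LASSO shrinking toward phi0 instead of 0: penalize delta = phi - phi0."""
    fit = Lasso(alpha=alpha, fit_intercept=False).fit(X, y - X @ phi0)
    return phi0 + fit.coef_

# toy check: with a huge penalty the estimate collapses to the centering point
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))
phi0 = np.array([0.5, 0.2, 0.1, 0.05, 0.02])   # e.g. ARFIMA-implied coefficients
y = X @ phi0 + 0.1 * rng.standard_normal(100)
est = centered_lasso(X, y, phi0, alpha=1e6)     # collapses to phi0
```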

## Sunday, June 26, 2016

## Wednesday, June 22, 2016

### Observed Info vs. Estimated Expected Info

All told, after decades of research, it seems that Efron-Hinkley holds up -- observed information dominates estimated expected information for finite-sample MLE inference. It's both easier to calculate and more accurate. Let me know if you disagree.
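A small illustration of the two quantities, for the Cauchy location model where they differ (the grid-search MLE and all numerical choices are mine): observed information is minus the log-likelihood Hessian at the MLE, while estimated expected information plugs the MLE into the Fisher information, here simply n/2.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.standard_cauchy(n) + 1.0          # Cauchy location model, true theta = 1

# MLE by grid search over the log likelihood
grid = np.linspace(0.0, 2.0, 2001)
ll = np.array([-np.log1p((x - th) ** 2).sum() for th in grid])
theta_hat = grid[ll.argmax()]

u = x - theta_hat
obs_info = np.sum(2 * (1 - u ** 2) / (1 + u ** 2) ** 2)   # -loglik''(theta_hat)
exp_info = n / 2.0                    # expected info at theta_hat, exactly n/2 here
se_obs, se_exp = obs_info ** -0.5, exp_info ** -0.5
```

The two standard errors agree to first order, but Efron-Hinkley argue that `se_obs` better tracks the conditional accuracy of the MLE in a given sample.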

[Efron, B. and Hinkley, D.V. (1978), "Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected Fisher Information", *Biometrika*, 65, 457-487.]

## Tuesday, June 21, 2016

### Mixed-Frequency High-Dimensional Time Series

Notice that high dimensions and mixed frequencies go together in time series. (If you're looking at a huge number of series, it's highly unlikely that all will be measured at the same frequency, unless you arbitrarily exclude all frequencies but one.) So high-dim MIDAS vector autoregressions (VARs) will play a big role moving forward. The MIDAS literature is starting to go multivariate, with MIDAS VARs appearing; see Ghysels (2015, in press) and Mikosch and Neuwirth (2016 w.p.).

But the multivariate MIDAS literature is still low-dim rather than high-dim. Next steps will be:

(1) move to high-dim VAR estimation by using regularization methods (e.g. LASSO variants),

(2) allow for many observational frequencies (five or six, say),

(3) allow for the "rough edges" that will invariably arise at the beginning and end of the sample, and

(4) visualize results using network graphics.
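As a sketch of step (1) (everything here is illustrative: the diagonal VAR(1) DGP, the dimensions, and the penalty are my choices), equation-by-equation LASSO handles a VAR whose parameter count would strain OLS:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
N, p, T = 20, 2, 400
A1 = np.diag(np.full(N, 0.5))          # sparse true dynamics: diagonal VAR(1)
Y = np.zeros((T, N))
for t in range(1, T):
    Y[t] = Y[t - 1] @ A1.T + rng.standard_normal(N)

# stack p lags of all N series as regressors: N*p coefficients per equation
X = np.column_stack([Y[p - 1 - j:T - 1 - j] for j in range(p)])   # (T-p) x N*p
B = np.vstack([Lasso(alpha=0.05, fit_intercept=False).fit(X, Y[p:, i]).coef_
               for i in range(N)])     # row i: equation i's coefficients
```

With N in the hundreds this would be infeasible by OLS, but the equation-by-equation LASSO scales, and the zero pattern of `B` is exactly what a network graphic (step 4) would visualize.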


### Conditional Dependence and Partial Correlation

In the multivariate normal case, conditional independence is the same as zero partial correlation. (See below.) That makes a lot of things a lot simpler. In particular, determining ordering in a DAG is just a matter of assessing partial correlations. Of course in many applications normality may not hold, but still...
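For concreteness, under normality the partial correlation of variables i and j given the rest can be read off the precision (inverse covariance) matrix: it is \( -P_{ij}/\sqrt{P_{ii}P_{jj}} \). A small sketch with a three-variable chain X1 -> X2 -> X3, so that X1 and X3 are independent given X2:

```python
import numpy as np

def partial_corr(S):
    """Partial correlation matrix from a covariance matrix S."""
    P = np.linalg.inv(S)                       # precision matrix
    R = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
    np.fill_diagonal(R, 1.0)
    return R

# chain covariance: corr(1,3) = corr(1,2) * corr(2,3), so X1 _||_ X3 | X2
S = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 1.00]])
R = partial_corr(S)        # R[0, 2] is (numerically) zero
```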

Aust. N.Z. J. Stat. 46(4), 2004, 657–664

PARTIAL CORRELATION AND CONDITIONAL CORRELATION AS MEASURES OF CONDITIONAL INDEPENDENCE

Kunihiro Baba1∗, Ritei Shibata1 and Masaaki Sibuya2

Keio University and Takachiho University

Summary

This paper investigates the roles of partial correlation and conditional correlation as measures of the conditional independence of two random variables. It first establishes a sufficient condition for the coincidence of the partial correlation with the conditional correlation. The condition is satisfied not only for multivariate normal but also for elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial and Dirichlet distributions. Such families of distributions are characterized by a semigroup property as a parametric family of distributions. A necessary and sufficient condition for the coincidence of the partial covariance with the conditional covariance is also derived. However, a known family of multivariate distributions which satisfies this condition cannot be found, except for the multivariate normal. The paper also shows that conditional independence has no close ties with zero partial correlation except in the case of the multivariate normal distribution; it has rather close ties to the zero conditional correlation. It shows that the equivalence between zero conditional covariance and conditional independence for normal variables is retained by any monotone transformation of each variable. The results suggest that care must be taken when using such correlations as measures of conditional independence unless the joint distribution is known to be normal. Otherwise a new concept of conditional independence may need to be introduced in place of conditional independence through zero conditional correlation or other statistics.

Keywords: elliptical distribution; exchangeability; graphical modelling; monotone transformation.


## Saturday, June 18, 2016

### A Little Bit More on Dave Backus

In the days since his passing, lots of wonderful things have been said about Dave Backus. (See, for example, the obituary by Tom Cooley, posted on David Levine's page.) They're all true. But none sufficiently stress what was for me his essence: complete selflessness. We've all had a few good colleagues, even great colleagues, but Dave took it to an entirely different level.

The "Teaching" section of his web page begins, "I have an open-source attitude toward teaching materials". Dave had an open-source attitude toward *everything*. He lived for team building, cross-fertilization, mentoring, and on and on. A lesser person would have traded the selflessness for a longer c.v., but not Dave. And we're all better off for it.

### SoFiE 2016 Hong Kong (and 2017 New York)

Hats off to all those who helped make the Hong Kong SoFiE meeting such a success. Special thanks (in alphabetical order) to Charlotte Chen, Yin-Wong Cheung, Jianqing Fan, Eric Ghysels, Ravi Jagannathan, Yingying Li, Daniel Preve, and Giorgio Valente. The conference web site is here.

Mark your calendars now for what promises to be a very special tenth-anniversary meeting next year in New York, hosted by Rob Engle at NYU's Stern School. The dates are June 20-23, 2017.

## Tuesday, June 14, 2016

### Indicator Saturation Estimation

In an earlier post, "Fixed Effects Without Panel Data", I argued that you could allow for (and indeed estimate) fixed effects in pure cross sections (i.e., no need for panel data) by using regularization estimators like LASSO. The idea is to fit a profligately-parameterized model but then to recover d.f. by regularization.

Note that you can use the same idea in time-series contexts. Even in a pure time series, you can allow for period-by-period time effects, broken polynomial trend with an arbitrary number of breakpoints, etc., via regularization. It turns out that a fascinating small literature on so-called "indicator saturation estimation" pursues this idea. The "indicators" are things like period-by-period time dummies, break-date location dummies, etc., and "saturation" refers to the profligate parameterization. Prominent contributors include David Hendry and Søren Johansen; see this new paper and those that it cites. (Very cool application, by the way, to detecting historical volcanic eruptions.)
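A toy version of impulse-indicator saturation (the data, outlier, and penalty are illustrative assumptions): include one dummy per period, so the design is the identity matrix; LASSO then reduces to soft-thresholding and flags only the aberrant periods.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
T = 200
y = rng.standard_normal(T)
y[80] += 8.0                          # one large shift: an "eruption" period

X = np.eye(T)                         # saturated design: an indicator per period
# with an identity design, sklearn's LASSO soft-thresholds y at alpha * T = 4
fit = Lasso(alpha=0.02, fit_intercept=False).fit(X, y)
hits = np.flatnonzero(fit.coef_)      # periods flagged as aberrant
```

The T dummies would exhaust the degrees of freedom under OLS; the penalty is what makes the saturated parameterization estimable.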


## Monday, June 6, 2016

### Fixed Effects Without Panel Data

Consider a pure cross section (CS) of size N. Generally you'd like to allow for individual effects, but you can't, because OLS with a full set of N individual dummies is infeasible. (You'd exhaust degrees of freedom.) That's usually what motivates the desirability/beauty of panel data -- there you have NxT observations, so including N individual dummies becomes feasible.

But there's no need to stay with OLS. You can recover d.f. using regularization estimators like ridge (shrinkage) or LASSO (shrinkage and selection). So including a full set of individual dummies, even in a pure CS, is completely feasible! For implementation you just have to select the ridge or lasso penalty parameter, which is reliably done by cross validation (say).
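A sketch of the mechanics (the DGP, penalty, and the choice to penalize the slope along with the dummies are all illustrative; in practice you might leave the slope unpenalized and pick the penalty by cross validation):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
N = 300
x = rng.standard_normal(N)
alpha_true = np.zeros(N)
alpha_true[:10] = 5.0                 # only a few individuals deviate
y = alpha_true + 2.0 * x + 0.5 * rng.standard_normal(N)

# N + 1 parameters from N observations: infeasible for OLS, fine for LASSO
X = np.column_stack([x, np.eye(N)])   # slope regressor plus N individual dummies
fit = Lasso(alpha=0.005, fit_intercept=False, max_iter=50000).fit(X, y)
beta_hat, alpha_hat = fit.coef_[0], fit.coef_[1:]

# the estimated intercepts make individual-level forecasts possible
y_hat_7 = alpha_hat[7] + beta_hat * x[7]
```

Note the last line: the regularized fit delivers an intercept estimate for each individual, not just the slope.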

There are two key points. The first is that you can allow for individual fixed effects even in a pure CS; that is, there's no need for panel data. That's what I've emphasized so far.

The second is that the proposed method actually gives


*estimates* of the fixed effects. Sometimes they're just nuisance parameters that can be ignored; indeed, standard panel estimation methods "difference them out", so they're not even estimated. But estimates of the fixed effects are *crucial* for forecasting: to forecast y_i, you need not only Mr. i's covariates and estimates of the "slope parameters", but also an estimate of Mr. i's intercept! That's why forecasting is so conspicuously absent from most of the panel literature -- the fixed effects are not estimated, so forecasting is hopeless. Regularized estimation, in contrast, delivers estimates of the fixed effects, thereby facilitating forecasting, and you don't even need a panel.

## Friday, June 3, 2016

### Causal Estimation and Millions of Lives

This just in from a fine former Ph.D. student. He returned to India many years ago and made his fortune in finance. He's now devoting himself to the greater good, working with the Bill and Melinda Gates Foundation.

I reminded him that I'm not likely to be a big help, as I generally don't do causal estimation or experimental design. But he kindly allowed me to post his communication below (abridged and slightly edited). Please post comments for him if you have any suggestions. [As you know, I write this blog more like a newspaper column, neither encouraging nor receiving many comments -- so now's your chance to comment!]

He writes:


One of the key challenges we face in our work is that causality is not known, and while theory and large scale studies, such as those published in the Lancet, do provide us with some guidance, it is far from clear that they reflect the reality on the ground when we are intervening in field settings with markedly different starting points from those that were used in the studies. However, while we observe the ground situation imperfectly and with large error, the inertia in the underlying system that we are trying to impact is so high that it would perhaps be safe to say that, unlike in the corporate world, there isn't a lot of creative destruction going on here. In such a situation it would seem to me that the best way to learn about the "true but unobserved" reality and how to permanently change it and scale the change cost-effectively (such as nurse behavior in facilities) is to go on attempting different interventions which are structured in such a way as to allow for a rapid convergence to the most effective interventions (similar to the famous Runge-Kutta iterative methods for rapidly and efficiently arriving at solutions to differential equations to the desired level of accuracy).

However, while the need is for rapid learning, the most popular methods proceed by collecting months or years of data in both intervention and control settings, and at the end of it all, if done very-very carefully, all that they can tell you is that there were some links (or not) between the interventions and results without giving you any insight into why something happened or what can be done to improve it. In the meanwhile one is expected to hold the intervention steady and almost discard all the knowledge that is continuously being generated and be patient even while lives are being lost because the intervention was not quite designed well. While the problems with such an approach are apparent, the alternative cannot be instinct or gut feeling and a series of uncoordinated actions in the name of “being responsive”.

I am writing to request your help in pointing us to literature that can act as a guide to how we may do this better. ... I have indeed found some ideas in the literature that may be somewhat useful, ... [and] while very interesting and informative, I’m afraid it is not yet clear to me how we will apply these ideas in our actual field settings, and how we will design our Measurement, Learning, and Evaluation approaches differently so that we can actually implement these ideas in difficult on-ground settings in remote parts of our country involving, literally, millions of lives.

