Monday, August 29, 2016

On Credible Cointegration Analyses

I may not know whether some \(I(1)\) variables are cointegrated, but if they are, I often have a very strong view about the likely number and nature of cointegrating combinations. Single-factor structure is common in many areas of economics and finance, so if cointegration is present in an \(N\)-variable system, a natural benchmark is one common trend (\(N-1\) cointegrating combinations). Moreover, the natural cointegrating combinations are almost always spreads or ratios (which are of course spreads in logs). For example, log consumption and log income may or may not be cointegrated, but if they are, then the obvious benchmark cointegrating combination is \((\ln C - \ln Y)\). Similarly, the obvious benchmark for \(N\) government bond yields \(y\) is \(N-1\) cointegrating combinations, given by term spreads relative to some reference yield, e.g., \(y_2 - y_1\), \(y_3 - y_1\), ..., \(y_N - y_1\).

There's not much literature exploring this perspective. (One notable exception is Horvath and Watson, "Testing for Cointegration When Some of the Cointegrating Vectors are Prespecified", Econometric Theory, 11, 952-984.) We need more.
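To make the prespecified-spread idea concrete, here's a minimal sketch in Python (statsmodels), with hypothetical series names consumption and income. Because the benchmark cointegrating vector is fixed a priori at \((1, -1)\), one can simply check the spread for stationarity rather than estimating the cointegrating vector. (This is only the simplest such check, not the Horvath-Watson procedure.)

    # Sketch: test a prespecified cointegrating combination, (ln C - ln Y).
    # "consumption" and "income" are hypothetical pandas Series of levels, date-aligned.
    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    spread = np.log(consumption) - np.log(income)   # candidate cointegrating combination

    # If C and Y are individually I(1) but the spread is I(0), the prespecified
    # vector (1, -1) is a cointegrating vector.  Because nothing is estimated,
    # ordinary Dickey-Fuller critical values apply.
    stat, pvalue, *_ = adfuller(spread.dropna(), regression="c", autolag="AIC")
    print(f"ADF statistic: {stat:.2f}, p-value: {pvalue:.3f}")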

Sunday, August 21, 2016

More on Big Data and Mixed Frequencies

I recently blogged on Big Data and mixed-frequency data, arguing that Big Data (wide data, in particular) leads naturally to mixed-frequency data.  (See here for the tall data / wide data / dense data taxonomy.)  The obvious just occurred to me: it's also true in the other direction. That is, mixed-frequency situations also lead naturally to Big Data, and with a subtle twist: the resulting Big Data may be dense rather than wide. The theoretically pure way to set things up is as a state-space system laid out at the highest observed frequency, appropriately treating most of the lower-frequency data as missing, as in ADS.  Because the system is laid out at the highest frequency, it is by construction dense whenever any of its component series is dense.
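For concreteness, here's a minimal sketch in Python (statsmodels), with hypothetical monthly and quarterly series, of the layout: align everything at the monthly frequency, let the quarterly column be NaN in intra-quarter months, and let the Kalman filter treat those NaNs as missing. (A real ADS-style system would also impose temporal-aggregation constraints on flow variables, which this sketch ignores.)

    # Sketch: mixed-frequency system laid out at the highest (monthly) frequency.
    # "monthly" and "quarterly" are hypothetical pandas Series with month-end and
    # quarter-end DatetimeIndexes, respectively.
    import pandas as pd
    from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor

    # Aligning on the union of dates leaves the quarterly column NaN ("missing")
    # in non-quarter-end months.
    panel = pd.concat({"monthly": monthly, "quarterly": quarterly}, axis=1)

    # One-factor model at monthly frequency; the Kalman filter skips missing observations.
    model = DynamicFactor(panel, k_factors=1, factor_order=2)
    results = model.fit(disp=False)
    print(results.summary())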

Wednesday, August 17, 2016

On the Evils of Hodrick-Prescott Detrending

Jim Hamilton has a very cool new paper, "Why You Should Never Use the Hodrick-Prescott (HP) Filter".

Of course we've known of the pitfalls of HP ever since Cogley and Nason (1995) brought them into razor-sharp focus decades ago.  The title of the even-earlier Nelson and Kang (1981) classic, "Spurious Periodicity in Inappropriately Detrended Time Series", says it all.  Nelson-Kang made the spurious-periodicity case against polynomial detrending of I(1) series.  Hamilton makes the spurious-periodicity case against HP detrending of many types of series, including I(1).  (Or, more precisely, Hamilton adds even more weight to the Cogley-Nason spurious-periodicity case against HP.)

But the main contribution of Hamilton's paper is constructive, not destructive.  It provides a superior detrending method, based only on a simple linear projection. 

Here's a way to understand what "Hamilton detrending" does and why it works, based on a nice connection to Beveridge-Nelson (1981) detrending not noticed in Hamilton's paper.  

First consider Beveridge-Nelson (BN) trend for I(1) series.  BN trend is just a very long-run forecast based on an infinite past.  [You want a very long-run forecast in the BN environment because the stationary cycle washes out from a very long-run forecast, leaving just the forecast of the underlying random-walk stochastic trend, which is also the current value of the trend since it's a random walk.  So the BN trend at any time is just a very long-run forecast made at that time.]  Hence BN trend is implicitly based on the projection: \(y_t ~ \rightarrow ~ c, ~ y_{t-h}, ~...,~ y_{t-h-p} \), for \(h \rightarrow \infty \) and \(p \rightarrow \infty\).
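In symbols (my notation, not from Hamilton's paper): for an \(I(1)\) series \(y_t\) with drift \(\mu\),
\[ BN_t = \lim_{h \rightarrow \infty} \left[ E_t (y_{t+h}) - h \mu \right], \]
i.e., the long-horizon forecast net of deterministic drift.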

Now consider Hamilton trend.  It is explicitly based on the projection: \(y_t ~ \rightarrow ~ c, ~ y_{t-h}, ~...,~ y_{t-h-p} \), for \(p = 3 \).  (Hamilton also uses a benchmark of  \(h = 8 \).)

So BN and Hamilton are both "linear projection trends", differing only in choice of \(h\) and \(p\)!  BN takes an infinite forecast horizon and projects on an infinite past.  Hamilton takes a medium forecast horizon and projects on just the recent past.

Much of Hamilton's paper is devoted to defending the choice of \(p = 3 \), which turns out to perform well for a wide range of data-generating processes (not just I(1)).  The BN choice of \(h = p = \infty \), in contrast, although optimal for I(1) series, is less robust to other DGPs.  (And of course estimation of the BN projection as written above is infeasible, which people avoid in practice by assuming low-order ARIMA structure.)
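To make the "linear projection trend" view concrete, here's a minimal sketch in Python (statsmodels) of the Hamilton regression for a hypothetical quarterly series y: project \(y_t\) on a constant and \(y_{t-h}, ..., y_{t-h-p}\) with \(h = 8\) and \(p = 3\), take the fitted value as the trend and the residual as the cycle.

    # Sketch: Hamilton's projection trend and cycle for a quarterly series.
    # "y" is a hypothetical pandas Series (e.g., 100 * log real GDP).
    import pandas as pd
    import statsmodels.api as sm

    h, p = 8, 3  # benchmark: 8-quarter horizon, project on 4 lags (p = 3)

    # Regressors: y_{t-h}, y_{t-h-1}, ..., y_{t-h-p}
    X = pd.concat({f"lag{h + j}": y.shift(h + j) for j in range(p + 1)}, axis=1)
    X = sm.add_constant(X)

    ols = sm.OLS(y, X, missing="drop").fit()
    trend = ols.fittedvalues      # linear-projection trend
    cycle = y - trend             # Hamilton cycle = projection residual

Cranking \(h\) and \(p\) toward infinity (made feasible in practice by a low-order ARIMA assumption) would instead deliver the BN trend.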

Monday, August 15, 2016

More on Nonlinear Forecasting Over the Cycle

Related to my last post, here's a new paper that just arrived from Rachidi Kotchoni and Dalibor Stevanovic, "Forecasting U.S. Recessions and Economic Activity". It's not nonparametric, but it is nonlinear. As Dalibor put it, "The method is very simple: predict turning points and recession probabilities in the first step, and then augment a direct AR model with the forecasted probability." Kotchoni-Stevanovic and Guerron-Quintana-Zhong are usefully read together.
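Here's how I read the two steps, as a minimal in-sample sketch in Python (statsmodels), with hypothetical series growth and a 0/1 recession indicator nber; it is not the authors' code or exact specification.

    # Sketch: two-step "probability-augmented" direct AR regression, h steps ahead.
    import pandas as pd
    import statsmodels.api as sm

    h, p = 4, 2  # illustrative horizon and AR order

    # Step 1: recession probability from a probit on information dated t-h and earlier.
    Xp = sm.add_constant(pd.concat({"g1": growth.shift(h), "g2": growth.shift(h + 1)}, axis=1))
    probit = sm.Probit(nber, Xp, missing="drop").fit(disp=False)
    prob = pd.Series(probit.predict(Xp), index=Xp.index)

    # Step 2: direct AR regression for y_t on information dated t-h and earlier,
    # augmented with the step-1 probability.
    Xa = pd.concat({f"lag{h + j}": growth.shift(h + j) for j in range(p)}, axis=1)
    Xa["prob"] = prob
    Xa = sm.add_constant(Xa)
    augmented = sm.OLS(growth, Xa, missing="drop").fit()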

Sunday, August 14, 2016

Nearest-Neighbor Forecasting in Times of Crisis

Nonparametric K-nearest-neighbor forecasting remains natural, obvious, and potentially very useful, as it has been since its inception long ago.

[Most crudely: Find the K-history closest to the present K-history, see what followed it, and use that as a forecast. Slightly less crudely: Find the N K-histories closest to the present K-history, see what followed each of them, and take an average. There are many obvious additional refinements.]
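Here's a minimal sketch in Python of the crude version, for a hypothetical series y: embed the data as overlapping K-histories, find the N histories closest to the most recent one, and average what followed each.

    # Sketch: crude nearest-neighbor forecast of y_{T+1}.
    import numpy as np

    def knn_forecast(y, K=12, N=5):
        """Average the successors of the N K-histories closest to the latest K-history."""
        y = np.asarray(y, dtype=float)
        T = len(y)
        # All K-histories with an observed successor (ending at t = K-1, ..., T-2).
        histories = np.array([y[t - K + 1 : t + 1] for t in range(K - 1, T - 1)])
        successors = y[K:T]                      # value that followed each history
        current = y[T - K : T]                   # the present K-history
        dist = np.linalg.norm(histories - current, axis=1)
        nearest = np.argsort(dist)[:N]           # indices of the N closest histories
        return successors[nearest].mean()

Among the obvious refinements: standardize the histories, exclude histories that overlap the present one, and distance-weight the average.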

Overall, nearest-neighbor forecasting remains curiously under-utilized in dynamic econometrics. Maybe that will change. In an interesting recent development, for example, new Federal Reserve System research by Pablo Guerron-Quintana and Molin Zhong puts nearest-neighbor methods to good use for forecasting in times of crisis.

Monday, August 8, 2016

NSF Grants vs. Improved Data

Lots of people are talking about the Cowen-Tabarrok Journal of Economic Perspectives piece, "A Skeptical View of the National Science Foundation’s Role in Economic Research". See, for example, John Cochrane's insightful "A Look in the Mirror".

A look in the mirror indeed. I was a 25-year ward of the NSF, but for the past several years I've been on the run. I bolted in part because the economics NSF reward-to-effort ratio has fallen dramatically for senior researchers, and in part because, conditional on the ongoing existence of NSF grants, I feel strongly that NSF money and "signaling" are better allocated to young assistant and associate professors, for whom the signaling value from NSF support is much higher.

Cowen-Tabarrok make some very good points. But I can see both sides of many of their issues and sub-issues, so I'm not taking sides. Instead let me make just one observation (and I'm hardly the first).

If NSF funds were to be re-allocated, improved data collection and dissemination looks attractive. I'm not talking about funding cute RCTs-of-the-month. Rather, I'm talking about funding increased and ongoing commitment to improving our fundamental price and quantity data (i.e., the national accounts and related statistics). They desperately need to be brought into the new millennium. Just look, for example, at the wealth of issues raised in recent decades by the Conference on Research in Income and Wealth.

Ironically, it's hard to make a formal case (at least for data dissemination as opposed to creation), as Chris Sims has emphasized with typical brilliance. His "The Futility of Cost-Benefit Analysis for Data Dissemination" explains "why the apparently reasonable idea of applying cost-benefit analysis to government programs founders when applied to data dissemination programs." So who knows how I came to feel that NSF funds might usefully be re-allocated to data collection and dissemination. But so be it.

Monday, August 1, 2016

On the Superiority of Observed Information

Earlier I claimed that "Efron-Hinkley holds up -- observed information dominates estimated expected information for finite-sample MLE inference." Several of you have asked for elaboration.

The earlier post grew from a 6 AM Hong Kong breakfast conversation with Per Mykland (with both of us suffering from 12-hour jet lag), so I wanted to get some detail from him before elaborating, to avoid erroneous recollections. But it's basically as I recalled -- mostly coming from the good large-deviation properties of the likelihood ratio. The following is adapted from that conversation and a subsequent email exchange. (Any errors or omissions are entirely mine.)

There was quite a bit of work in the 1980s and 1990s. It was kicked off by Efron and Hinkley (1978). The main message is in their plot on p. 460, suggesting that observed information is the more accurate estimator of the MLE's variance. Research gradually focused on the behavior of the likelihood ratio (\(LR\)) statistic and its signed square root \(R = \mathrm{sgn}(\hat{\theta} - \theta)\sqrt{LR}\), which was seen to have good conditionality properties, local sufficiency, and, most crucially, good large-deviation properties.  (For details see Mykland (1999), Mykland (2001), and the references there.)

The large-deviation situation is as follows.  Most statistics have cumulant behavior as in Mykland (1999) eq. (2.1).  In contrast, \(R\) has cumulant behavior as in Mykland (1999) eq. (2.2), which yields the large deviation properties of Mykland (1999) Theorem 1. (Also see Theorems 1 and 2 of Mykland (2001).)
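To fix ideas on the two information measures themselves: the observed information is the negative Hessian of the log likelihood at the MLE, while the estimated expected information plugs the MLE into the Fisher information. Here's a toy sketch in Python for a Cauchy location model, where the two genuinely differ; it illustrates only the definitions, not the Efron-Hinkley analysis or the Mykland large-deviation results.

    # Toy sketch: observed vs. estimated expected information, Cauchy location model.
    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import cauchy

    x = cauchy.rvs(loc=1.0, size=50, random_state=7)

    def negloglik(theta):
        return -cauchy.logpdf(x, loc=theta).sum()

    theta_hat = minimize_scalar(negloglik, bounds=(-10, 10), method="bounded").x

    # Observed information: second derivative of the negative log likelihood at the
    # MLE (numerical second difference).
    eps = 1e-4
    obs_info = (negloglik(theta_hat + eps) - 2 * negloglik(theta_hat)
                + negloglik(theta_hat - eps)) / eps**2

    # Estimated expected (Fisher) information: n/2 for the unit-scale Cauchy location model.
    exp_info = len(x) / 2

    print(f"observed info {obs_info:.2f} vs expected info {exp_info:.2f}")
    print(f"std errors: {obs_info ** -0.5:.3f} (observed) vs {exp_info ** -0.5:.3f} (expected)")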