Wednesday, August 17, 2016

On the Evils of Hodrick-Prescott Detrending

Jim Hamilton has a very cool new paper, "Why You Should Never Use the Hodrick-Prescott (HP) Filter".

Of course we've known of the pitfalls of HP ever since Cogley and Nason (1995) brought them into razor-sharp focus two decades ago. The title of the even-earlier Nelson and Kang (1981) classic, "Spurious Periodicity in Inappropriately Detrended Time Series", says it all. Nelson-Kang made the spurious-periodicity case against polynomial detrending of I(1) series. Hamilton makes the spurious-periodicity case against HP detrending of many types of series, including I(1). (Or, more precisely, Hamilton adds even more weight to the Cogley-Nason spurious-periodicity case against HP.)

But the main contribution of Hamilton's paper is constructive, not destructive.  It provides a superior detrending method, based only on a simple linear projection. 

Here's a way to understand what "Hamilton detrending" does and why it works, based on a nice connection to Beveridge-Nelson (1981) detrending not noticed in Hamilton's paper.  

First consider the Beveridge-Nelson (BN) trend for an I(1) series. The BN trend is just a very long-run forecast based on an infinite past. [You want a very long-run forecast in the BN environment because the stationary cycle washes out of a very long-run forecast, leaving just the forecast of the underlying random-walk stochastic trend, which is also the current value of the trend, since it's a random walk. So the BN trend at any time is just a very long-run forecast made at that time.] Hence the BN trend is implicitly based on the projection: \(y_t ~ \rightarrow ~ c, ~ y_{t-h}, ~...,~ y_{t-h-p} \), for \(h \rightarrow \infty \) and \(p \rightarrow \infty\).
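
To see the "cycle washes out" logic in one line, here is a worked case under an assumption of mine that is not in Hamilton's paper: suppose the series is ARIMA(1,1,0) with no drift, \(\Delta y_t = \phi \, \Delta y_{t-1} + \varepsilon_t\) with \(|\phi| < 1\). Since the forecast of \(\Delta y_{t+j}\) made at time \(t\) is \(\phi^j \Delta y_t\), the \(h\)-step-ahead forecast of the level is

\[ E_t \, y_{t+h} ~=~ y_t + \sum_{j=1}^{h} \phi^j \, \Delta y_t ~=~ y_t + \frac{\phi \, (1 - \phi^h)}{1 - \phi} \, \Delta y_t ~ \rightarrow ~ y_t + \frac{\phi}{1 - \phi} \, \Delta y_t ~~~ \text{as} ~ h \rightarrow \infty . \]

So in this special case the BN trend is \(y_t + \frac{\phi}{1-\phi} \Delta y_t\), and the BN cycle is what's left over, \(-\frac{\phi}{1-\phi} \Delta y_t\), which is stationary.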

Now consider the Hamilton trend. It is explicitly based on the projection: \(y_t ~ \rightarrow ~ c, ~ y_{t-h}, ~...,~ y_{t-h-p} \), for \(p = 3 \), that is, on a constant and four lagged values. (Hamilton also uses a benchmark of \(h = 8 \), which is two years ahead in quarterly data.)
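
The Hamilton projection is simple enough to sketch in a few lines. What follows is a minimal illustration, not Hamilton's own code: the function name `hamilton_detrend`, the use of NumPy least squares, and the simulated random walk are all my assumptions for the example. It regresses \(y_t\) on a constant and \(y_{t-h}, ..., y_{t-h-p}\), takes the fitted values as the trend, and takes the residuals as the cycle.

```python
import numpy as np


def hamilton_detrend(y, h=8, p=3):
    """Illustrative linear-projection detrending (hypothetical helper).

    Regresses y[t] on a constant and y[t-h], ..., y[t-h-p] by OLS.
    Returns (trend, cycle); the first h + p entries are NaN because
    the required lags do not exist there.
    """
    y = np.asarray(y, dtype=float)
    start = h + p  # first t for which y[t-h-p] is available
    if y.size - start <= p + 2:
        raise ValueError("series too short for the chosen h and p")

    target = y[start:]
    # Column j holds y[t-h-j] for t = start, ..., len(y) - 1.
    lags = [y[start - h - j : y.size - h - j] for j in range(p + 1)]
    X = np.column_stack([np.ones(target.size)] + lags)

    beta = np.linalg.lstsq(X, target, rcond=None)[0]

    trend = np.full(y.size, np.nan)
    trend[start:] = X @ beta      # fitted values = trend
    cycle = y - trend             # residuals = cycle (NaN at the start)
    return trend, cycle


# Example on a simulated random walk (assumed data, for illustration only).
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(200))
trend, cycle = hamilton_detrend(y, h=8, p=3)
```

Note that the sketch loses the first \(h + p\) observations, which is the price of projecting on lagged values only.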

So BN and Hamilton are both "linear projection trends", differing only in choice of \(h\) and \(p\)!  BN takes an infinite forecast horizon and projects on an infinite past.  Hamilton takes a medium forecast horizon and projects on just the recent past.

Much of Hamilton's paper is devoted to defending the choice of \(p = 3 \), which turns out to perform well for a wide range of data-generating processes (not just I(1)). The BN choice of \(h = p = \infty \), in contrast, although optimal for I(1) series, is less robust to other DGPs. (And of course estimation of the BN projection as written above is infeasible, which people avoid in practice by assuming low-order ARIMA structure.)
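
To illustrate the "assume low-order ARIMA structure" route, here is a rough sketch of a feasible BN trend under one particular assumption of mine: that the series is ARIMA(1,1,0) with drift, \(\Delta y_t - \mu = \phi (\Delta y_{t-1} - \mu) + \varepsilon_t\). In that case the infinite-horizon forecast, net of deterministic drift, collapses to \(y_t + \frac{\phi}{1-\phi} (\Delta y_t - \mu)\). The function name `bn_trend_ar1` and the OLS estimation are hypothetical choices for the example, not anything taken from Hamilton's paper.

```python
import numpy as np


def bn_trend_ar1(y):
    """Illustrative BN trend assuming ARIMA(1,1,0) with drift (hypothetical helper).

    Fits dy[t] = a + phi * dy[t-1] + e[t] by OLS, then sets
    trend[t] = y[t] + phi / (1 - phi) * (dy[t] - mu), with mu = a / (1 - phi).
    Returns (trend, cycle, phi); entry 0 is NaN because dy[0] is undefined.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[k] = y[k+1] - y[k]

    X = np.column_stack([np.ones(dy.size - 1), dy[:-1]])
    a, phi = np.linalg.lstsq(X, dy[1:], rcond=None)[0]
    if abs(phi) >= 1:
        raise ValueError("estimated AR(1) coefficient is not stationary")

    mu = a / (1 - phi)                    # implied drift of the level
    trend = np.full(y.size, np.nan)
    trend[1:] = y[1:] + phi / (1 - phi) * (dy - mu)
    cycle = y - trend
    return trend, cycle, phi
```

The contrast with the Hamilton projection is the point: this version is only as good as the assumed ARIMA(1,1,0) structure, whereas the Hamilton version imposes no such structure.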