Consider a time-series regression with possibly heteroskedastic and/or autocorrelated disturbances,

\( y_t = x_t' \beta + \varepsilon_t \).

A popular approach is to punt on the potentially non-iid disturbance, instead simply running OLS with kernel-based heteroskedasticity and autocorrelation consistent (HAC) standard errors. Punting via kernel-HAC estimation is a bad idea in time series, for several reasons:

(1) [Kernel-HAC is not likely to produce good \(\beta\) estimates.] It stays with OLS and hence gives up on efficient estimation of \(\beta\). In huge samples the efficiency loss from using OLS rather than GLS/ML is likely negligible, but time-series samples are often smallish. For example, samples like 1960Q1-2014Q4 are typical in macroeconomics -- just a couple hundred observations of highly-serially-correlated data.

(2) [Kernel-HAC is not likely to produce good \(\beta\) inference.] Its standard errors are not tailored to a specific parametric approximation to \(\varepsilon\) dynamics. Proponents will quickly counter that that's a *benefit*, not a cost, and in some settings the proponents may be correct. But not in time series settings. In time series, \(\varepsilon\) dynamics are almost always accurately and parsimoniously approximated parametrically (ARMA for conditional mean dynamics in \(\varepsilon\), and GARCH for conditional variance dynamics in \(\varepsilon\)). Hence kernel-HAC standard errors may be unnecessarily unreliable in small samples, even if they're accurate asymptotically. And again, time-series sample sizes are often smallish.

*prediction*, and explicit parametric modeling of dynamic heteroskedasticity and autocorrelation in \(\varepsilon\) can be used for improved prediction of \(y\). Autocorrelation can be exploited for improved point prediction, and dynamic conditional heteroskedasticity can be exploited for improved interval and density prediction. Punt on them and you're potentially leaving a huge amount of money on the table.
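To make the prediction point concrete, here is a small worked sketch. Everything specific in it is an assumption chosen for illustration: AR(1)-GARCH(1,1) disturbances, known parameters, a known \(x_{T+1}\), and conditional normality for the interval. The symbols \(\phi\), \(v_t\), \(\sigma_t^2\), \(\omega\), \(\alpha\) and \(\gamma\) are introduced here and appear nowhere else in the post.

\[
\varepsilon_t = \phi\,\varepsilon_{t-1} + v_t, \qquad v_t \mid \Omega_{t-1} \sim (0, \sigma_t^2), \qquad \sigma_t^2 = \omega + \alpha v_{t-1}^2 + \gamma \sigma_{t-1}^2 .
\]

The one-step-ahead point and 95% interval forecasts are then

\[
E(y_{T+1} \mid \Omega_T) = x_{T+1}'\beta + \phi\,\varepsilon_T, \qquad
x_{T+1}'\beta + \phi\,\varepsilon_T \;\pm\; 1.96\,\sigma_{T+1}, \qquad
\sigma_{T+1}^2 = \omega + \alpha v_T^2 + \gamma \sigma_T^2 .
\]

A forecaster who ignores the \(\varepsilon\) dynamics uses \(x_{T+1}'\beta\) alone and faces average forecast-error variance \(\sigma_v^2/(1-\phi^2)\), where \(\sigma_v^2 = \omega/(1-\alpha-\gamma)\), rather than \(\sigma_v^2\). With \(\phi = 0.8\), say, exploiting the autocorrelation cuts that variance by a factor of \(1-\phi^2 = 0.36\), and the GARCH recursion lets the interval widen and narrow with current volatility instead of staying fixed.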

The clearly preferable approach is traditional parametric disturbance heteroskedasticity / autocorrelation modeling, with GLS/ML estimation. Simply allow for ARMA(p,q)-GARCH(P,Q) disturbances (say), with p, q, P, and Q selected by AIC (say). (In many applications something like AR(3)-GARCH(1,1) or ARMA(1,1)-GARCH(1,1) would be more than adequate.) Note that the traditional approach is actually fully non-parametric when appropriately viewed as a sieve, and moreover it features automatic bandwidth selection.
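The flavor of this strategy can be sketched in a few lines of NumPy. The sketch is deliberately stripped down and is not the full recommendation: it allows only AR(p) disturbances (no MA or GARCH terms), picks p by AIC up to a hypothetical `p_max`, and uses iterated Cochrane-Orcutt-style feasible GLS rather than exact ML. The function names, the simulated data, and the true parameter values are all assumptions made for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def fit_ar(e, p, p_max):
    """Least-squares AR(p) fit on the common sample t >= p_max.

    Returns (phi, AIC), with the Gaussian AIC computed up to a constant.
    """
    target = e[p_max:]
    if p == 0:
        phi, u = np.zeros(0), target
    else:
        # Columns are e_{t-1}, ..., e_{t-p}, aligned with target e_t.
        Z = np.column_stack([e[p_max - k: len(e) - k] for k in range(1, p + 1)])
        phi, u = ols(Z, target)
    n = len(u)
    return phi, n * np.log(u @ u / n) + 2 * (p + 1)

def quasi_difference(A, phi):
    """Apply the filter 1 - phi_1 L - ... - phi_p L^p, dropping the first p rows."""
    A = np.asarray(A, dtype=float)
    p = len(phi)
    out = A[p:].copy()
    for k in range(1, p + 1):
        out -= phi[k - 1] * A[p - k: len(A) - k]
    return out

def fgls_ar(X, y, p_max=4, n_iter=5):
    """Iterated feasible GLS with AR(p) disturbances, p chosen by AIC."""
    beta, e = ols(X, y)
    for _ in range(n_iter):
        fits = [fit_ar(e, p, p_max) for p in range(p_max + 1)]
        phi = min(fits, key=lambda f: f[1])[0]          # AIC-selected AR fit
        Xs, ys = quasi_difference(X, phi), quasi_difference(y, phi)
        beta, u = ols(Xs, ys)                            # GLS step on filtered data
        e = y - X @ beta                                 # residuals in original units
    n, k = Xs.shape
    sigma2 = u @ u / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xs.T @ Xs)))
    return beta, se, phi

# Simulated example: 220 observations (about 1960Q1-2014Q4), AR(1) disturbance.
rng = np.random.default_rng(0)
T = 220
x = rng.standard_normal(T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.8 * eps[t - 1] + rng.standard_normal()
X = np.column_stack([np.ones(T), x])
y = X @ np.array([1.0, 2.0]) + eps

beta_hat, se_hat, phi_hat = fgls_ar(X, y)
print("beta:", beta_hat, "se:", se_hat, "AR coefficients:", phi_hat)
```

In real work one would more likely hand the whole ARMA-GARCH likelihood to an established estimation package; the point of the sketch is only that order selection by AIC plays the role that bandwidth selection plays in the kernel approach.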

Kernel-HAC people call the traditional strategy "pre-whitening," to be done prior to kernel-HAC estimation. But the real point is that *it's all -- or at least mostly all -- in the pre-whitening.*
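For reference, the two-step recipe that kernel-HAC proponents have in mind can be sketched roughly as follows. This is an illustration only: the VAR(1) pre-whitening filter, the Bartlett kernel, the fixed truncation lag of 4, the simulated data, and all function names are assumptions made for the sketch, not anything prescribed above.

```python
import numpy as np

def bartlett_lrv(g, lags):
    """Bartlett-kernel (Newey-West style) long-run variance of the rows of g."""
    T = g.shape[0]
    S = g.T @ g / T                          # lag-0 term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)           # Bartlett weight
        G = g[j:].T @ g[:-j] / T             # j-th sample autocovariance
        S += w * (G + G.T)
    return S

def prewhitened_lrv(g, lags):
    """VAR(1) pre-whitening, kernel step on the VAR residuals, then recoloring."""
    # Row form of the VAR(1): g_t' is approximately g_{t-1}' B, so A = B.T.
    B, *_ = np.linalg.lstsq(g[:-1], g[1:], rcond=None)
    u = g[1:] - g[:-1] @ B
    D = np.linalg.inv(np.eye(g.shape[1]) - B.T)   # (I - A)^{-1}
    return D @ bartlett_lrv(u, lags) @ D.T

def ols_hac_se(X, y, lags, prewhiten=False):
    """OLS coefficients with kernel-HAC standard errors."""
    T = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    g = X * (y - X @ beta)[:, None]          # moment contributions x_t * e_t
    S = prewhitened_lrv(g, lags) if prewhiten else bartlett_lrv(g, lags)
    XtX_inv = np.linalg.inv(X.T @ X)
    V = T * XtX_inv @ S @ XtX_inv            # sandwich covariance of beta-hat
    return beta, np.sqrt(np.diag(V))

# Simulated example with serially correlated regressor and disturbance.
rng = np.random.default_rng(1)
T = 220
x, eps = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
    eps[t] = 0.8 * eps[t - 1] + rng.standard_normal()
X = np.column_stack([np.ones(T), x])
y = X @ np.array([1.0, 2.0]) + eps

beta_ols, se_kernel = ols_hac_se(X, y, lags=4)
_, se_pw = ols_hac_se(X, y, lags=4, prewhiten=True)
print("kernel only:", se_kernel, "pre-whitened:", se_pw)
```

Comparing the two sets of standard errors on data like these gives a feel for how much of the adjustment is done by the parametric filter rather than by the kernel.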

In closing, I might add that the view expressed here is strongly supported by top-flight research. On my point (2) and my general recommendation, for example, see the insightful work of den Haan and Levin (2000). It fell on curiously deaf ears and remains unpublished many years later. (It's on Wouter den Haan's web site in a section called "Sleeping and Hard to Get"!) In the interim much of the world jumped on the kernel-HAC bandwagon. It's time to jump off.