My earlier post, "Musings on Prediction Under Asymmetric Loss," got me thinking and re-thinking about the predictive conditions under which the conditional mean is optimal, in the sense of minimizing expected loss.

To strip things to the simplest case possible, consider a conditionally-Gaussian process.

(1) Under quadratic loss, the conditional mean is of course optimal. But the conditional mean is also optimal under other loss functions, like absolute-error loss (in general the conditional median is optimal under absolute-error loss, but by symmetry of the conditionally-Gaussian process, the conditional median *is* the conditional mean).
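A quick Monte Carlo sketch of point (1), with illustrative numbers of my own choosing (\(\mu = 2\), \(\sigma = 1.5\)): for a Gaussian conditional distribution, minimizing expected quadratic loss and minimizing expected absolute-error loss over candidate forecasts lead to the same place, the conditional mean.

```python
import numpy as np

# Draw from a Gaussian conditional distribution (mu, sigma are illustrative)
rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5
y = rng.normal(mu, sigma, 1_000_000)

# Evaluate expected quadratic and absolute-error loss on a grid of forecasts
candidates = np.linspace(mu - 1, mu + 1, 201)
quad_loss = [np.mean((y - c) ** 2) for c in candidates]
abs_loss = [np.mean(np.abs(y - c)) for c in candidates]

best_quad = candidates[np.argmin(quad_loss)]  # minimizer of squared loss
best_abs = candidates[np.argmin(abs_loss)]    # minimizer of absolute loss
print(best_quad, best_abs)                    # both land at (about) mu
```

Both minimizers sit at the mean because the Gaussian's symmetry makes mean and median coincide.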

(2) Under *asymmetric* loss like linex or linlin, the conditional mean is generally *not* the optimal prediction. One would naturally expect the optimal forecast to be *biased*, to lower the probability of making errors of the more-hated sign. That intuition is generally correct. More precisely, the following result from Christoffersen and Diebold (1997) obtains:

If \(y_{t}\) is a conditionally Gaussian process and \( L(e_{t+h|t}) \) is any loss function defined on the \(h\)-step-ahead prediction error \( e_{t+h|t} = y_{t+h} - y_{t+h|t} \), then the \(L\)-optimal predictor is of the form \begin{equation} y_{t+h | t} = \mu _{t+h,t} + \alpha _{t}, \end{equation}where \( \mu _{t+h,t} = E(y_{t+h} | \Omega_t) \), \( \Omega_t = \{ y_t, y_{t-1}, \ldots \} \), and \(\alpha _{t}\) depends only on the loss function \(L\) and the conditional prediction-error variance \( var(e _{t+h|t} | \Omega _{t} ) \).

That is, the optimal forecast is a "shifted" version of the conditional mean, where the generally time-varying bias depends only on the loss function (no explanation needed) and on the conditional variance (explanation: when the conditional variance is high, you're more likely to make a large error, including an error of the sign you hate, so under asymmetric loss it's optimal to inject more bias at such times).
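Here is an illustrative sketch of the result for linlin loss, with weights I've picked for the example (positive errors penalized twice as heavily as negative ones). Under linlin loss the optimal predictor is a conditional quantile, so for a Gaussian the bias is \( \alpha_t = \sigma_t \, \Phi^{-1}\!\left(\tfrac{a}{a+b}\right) \): it depends only on the loss weights and the conditional standard deviation, and grows with \(\sigma_t\), exactly as the theorem says.

```python
import numpy as np
from statistics import NormalDist  # stdlib inverse normal CDF

def linlin_loss(e, a, b):
    """Linlin (check) loss: slope a on positive errors, b on negative."""
    return np.where(e > 0, a * e, -b * e)

rng = np.random.default_rng(1)
a, b = 2.0, 1.0  # illustrative weights: positive errors hurt twice as much
mu = 0.0         # conditional mean

results = {}
for sigma in (1.0, 2.0):  # two conditional-variance regimes
    y = rng.normal(mu, sigma, 500_000)
    grid = np.linspace(mu - 2 * sigma, mu + 2 * sigma, 401)
    exp_loss = [linlin_loss(y - f, a, b).mean() for f in grid]
    best = grid[np.argmin(exp_loss)]                    # Monte Carlo optimum
    alpha = sigma * NormalDist().inv_cdf(a / (a + b))   # closed-form bias
    results[sigma] = (best, mu + alpha)
    print(sigma, round(best, 2), round(mu + alpha, 2))
```

The numerical minimizer matches the closed form \( \mu + \sigma\,\Phi^{-1}(2/3) \) in both regimes, and the injected bias doubles when \(\sigma\) doubles.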

So the conditional mean is central for prediction under *any* loss function. Either it *is* the optimal prediction, or it's a key ingredient.

But casual readings of (1) and (2) can produce false interpretations. Consider, for example, the following folk theorem: "Under asymmetric loss, the optimal prediction is conditionally biased." The folk theorem is false. But how can that be? Isn't the folk theorem basically just (2)?

Things get really interesting.

To be continued...