Monday, October 6, 2014

Intuition for Prediction Under Bregman Loss

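Elements of the Bregman family of loss functions, denoted $B(y,\hat{y})$, take the form:
$$B(y,\hat{y}) = \phi(y) - \phi(\hat{y}) - \phi'(\hat{y})(y-\hat{y}),$$
where $\phi: \mathcal{Y} \rightarrow \mathbb{R}$ is any strictly convex function, and $\mathcal{Y}$ is the support of $Y$.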
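To make the definition concrete, here is a minimal Python sketch that evaluates $B(y,\hat{y})$ for a user-supplied $\phi$ and its derivative $\phi'$. The function name `bregman_loss` and the helpers `sq`/`dsq` are hypothetical, introduced only for illustration. Choosing $\phi(y)=y^2$ recovers ordinary squared-error loss, since $y^2 - \hat{y}^2 - 2\hat{y}(y-\hat{y}) = (y-\hat{y})^2$; $\phi(y)=e^y$ is an arbitrary illustrative choice of a non-quadratic member.

```python
import math
from typing import Callable


def bregman_loss(y: float, y_hat: float,
                 phi: Callable[[float], float],
                 dphi: Callable[[float], float]) -> float:
    """B(y, y_hat) = phi(y) - phi(y_hat) - phi'(y_hat) * (y - y_hat)."""
    return phi(y) - phi(y_hat) - dphi(y_hat) * (y - y_hat)


# Hypothetical helpers: phi(y) = y^2 and its derivative phi'(y) = 2y.
def sq(y: float) -> float:
    return y * y


def dsq(y: float) -> float:
    return 2.0 * y


# Squared error as a special case: (3 - 1)^2 = 4.
print(bregman_loss(3.0, 1.0, sq, dsq))            # prints 4.0

# An illustrative non-quadratic member, phi(y) = exp(y) (so phi' = exp too).
# Strict convexity makes the loss positive whenever y != y_hat.
print(bregman_loss(2.0, 1.0, math.exp, math.exp))
```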

Several readers have asked for intuition for the equivalence between the predictive optimality of $E[y|F]$ and Bregman loss functions $B(y,\hat{y})$. The simplest answers come from the proof itself, which is straightforward.

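First consider $B(y,\hat{y}) \Rightarrow E[y|F]$. The derivative of expected Bregman loss with respect to $\hat{y}$ is
$$\frac{\partial}{\partial \hat{y}} E[B(y,\hat{y})] = \int \frac{\partial}{\partial \hat{y}} B(y,\hat{y}) \, f(y|F) \, dy$$

$$= \int \frac{\partial}{\partial \hat{y}} \left( \phi(y) - \phi(\hat{y}) - \phi'(\hat{y})(y-\hat{y}) \right) f(y|F) \, dy$$

$$= \int \left( -\phi'(\hat{y}) - \phi''(\hat{y})(y-\hat{y}) + \phi'(\hat{y}) \right) f(y|F) \, dy$$

$$= -\phi''(\hat{y}) \left( E[y|F] - \hat{y} \right).$$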
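The last line of the derivation can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: a hypothetical three-point discrete distribution stands in for $f(y|F)$, $\phi(y)=e^y$ is an arbitrary strictly convex choice, and the helper names are made up. It compares a central finite difference of the expected loss with the closed form $-\phi''(\hat{y})(E[y|F]-\hat{y})$.

```python
import math

# Hypothetical discrete distribution standing in for f(y|F).
ys = [0.0, 1.0, 4.0]
ps = [0.5, 0.3, 0.2]

# Illustrative strictly convex phi(y) = exp(y); its derivatives are also exp.
phi = math.exp
dphi = math.exp
d2phi = math.exp


def expected_bregman(y_hat: float) -> float:
    """E[B(y, y_hat)] under the discrete distribution above."""
    return sum(p * (phi(y) - phi(y_hat) - dphi(y_hat) * (y - y_hat))
               for y, p in zip(ys, ps))


def analytic_derivative(y_hat: float) -> float:
    """Closed form from the derivation: -phi''(y_hat) * (E[y] - y_hat)."""
    mean = sum(p * y for y, p in zip(ys, ps))
    return -d2phi(y_hat) * (mean - y_hat)


def numeric_derivative(y_hat: float, h: float = 1e-6) -> float:
    """Central finite difference of the expected loss."""
    return (expected_bregman(y_hat + h) - expected_bregman(y_hat - h)) / (2 * h)


for y_hat in [-1.0, 0.5, 2.0]:
    print(y_hat, analytic_derivative(y_hat), numeric_derivative(y_hat))
```

The two columns should agree up to finite-difference error, and the derivative changes sign exactly where $\hat{y}$ crosses the mean of the distribution.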

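Hence the first-order condition is
$$-\phi''(\hat{y}) \left( E[y|F] - \hat{y} \right) = 0,$$

and because strict convexity of $\phi$ makes $\phi''(\hat{y}) > 0$ (taking $\phi$ twice differentiable with a positive second derivative, as the derivation implicitly does), the optimal forecast is the conditional mean, $E[y|F]$.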
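As a sanity check on this conclusion, the following sketch minimizes expected Bregman loss by brute-force grid search for two choices of $\phi$ and compares the minimizer with the mean. Everything here is an illustrative assumption: the discrete distribution is hypothetical, the helper names are made up, and grid search is used only because it is transparent, not because it is how one would optimize in practice.

```python
import math

ys = [0.0, 1.0, 4.0]   # hypothetical support
ps = [0.5, 0.3, 0.2]   # hypothetical probabilities


def expected_bregman(y_hat, phi, dphi):
    """E[B(y, y_hat)] for a given phi and its derivative phi'."""
    return sum(p * (phi(y) - phi(y_hat) - dphi(y_hat) * (y - y_hat))
               for y, p in zip(ys, ps))


def grid_argmin(f, lo, hi, n=70001):
    """Return the grid point in [lo, hi] with the smallest value of f."""
    step = (hi - lo) / (n - 1)
    return min((lo + i * step for i in range(n)), key=f)


mean = sum(p * y for y, p in zip(ys, ps))   # 1.1 for these probabilities, up to rounding

# Two illustrative members of the Bregman family: (phi, phi').
candidates = {
    "squared error, phi(y)=y^2": (lambda y: y * y, lambda y: 2 * y),
    "exponential, phi(y)=exp(y)": (math.exp, math.exp),
}

for name, (phi, dphi) in candidates.items():
    best = grid_argmin(lambda yh: expected_bregman(yh, phi, dphi), -2.0, 5.0)
    print(name, round(best, 3), "vs mean", round(mean, 3))
```

Both minimizers should land on the mean to within the grid spacing, even though the exponential member penalizes over- and under-prediction differently.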

Now consider $E[y|F] \Rightarrow B(y,\hat{y})$. It's a simple task of reverse-engineering. We need the f.o.c. to be of the form
$$\text{const} \times \left( E[y|F] - \hat{y} \right) = 0,$$

so that the optimal forecast is the conditional mean, $E[y|F]$. Inspection reveals that $B(y,\hat{y})$ (and only $B(y,\hat{y})$) does the trick.
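A quick contrast illustrates (though it does not prove) the "only" part: a loss outside the Bregman family need not deliver the mean. In the sketch below, absolute-error loss, which is not of the form $B(y,\hat{y})$, is minimized at the median of a hypothetical right-skewed distribution, while squared error (the $\phi(y)=y^2$ member) is minimized at the mean. The distribution and helper names are assumptions chosen for illustration.

```python
ys = [0.0, 1.0, 4.0]   # hypothetical, right-skewed support
ps = [0.6, 0.3, 0.1]   # hypothetical probabilities: median 0, mean 0.7


def expected_loss(loss, y_hat):
    """Expected loss of forecast y_hat under the discrete distribution."""
    return sum(p * loss(y, y_hat) for y, p in zip(ys, ps))


def grid_argmin(f, lo=-2.0, hi=5.0, n=7001):
    """Return the grid point in [lo, hi] with the smallest value of f."""
    step = (hi - lo) / (n - 1)
    return min((lo + i * step for i in range(n)), key=f)


def squared(y, y_hat):
    # Bregman loss with phi(y) = y^2.
    return (y - y_hat) ** 2


def absolute(y, y_hat):
    # Not a Bregman loss; its optimum is a median, not the mean.
    return abs(y - y_hat)


mean = sum(p * y for y, p in zip(ys, ps))
sq_opt = grid_argmin(lambda yh: expected_loss(squared, yh))
abs_opt = grid_argmin(lambda yh: expected_loss(absolute, yh))
print("squared-error optimum:", round(sq_opt, 3))
print("absolute-error optimum:", round(abs_opt, 3))
print("mean:", round(mean, 3))
```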

One might still want more intuition for the optimality of the conditional mean under Bregman loss, despite its asymmetry. The answer, I conjecture, is that the Bregman family is not asymmetric! At least not for an appropriate definition of asymmetry in the general $L(y,\hat{y})$ case, which is more complicated and subtle than the $L(e)$ case. Asymmetric loss plots like those in Patton (2014), on which I reported last week, are for fixed $y$ (in Patton's case, $y = 2$), whereas for a complete treatment we need to look across all $y$. More on that soon.
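To see what a fixed-$y$ plot picks up, the sketch below holds $y = 2$ and compares the loss from under-predicting and over-predicting by the same distance $d$. The choice $\phi(y)=e^y$ is an illustrative assumption and is not claimed to be the loss Patton plots. For fixed $y$ the loss is clearly asymmetric in $\hat{y}$; for example, $B(2,3) = e^2$ while $B(2,1) = e^2 - 2e$. This is exactly the fixed-$y$ view the paragraph above argues is incomplete.

```python
import math


def bregman_exp(y: float, y_hat: float) -> float:
    """Bregman loss with phi(y) = exp(y), an illustrative choice only."""
    return math.exp(y) - math.exp(y_hat) - math.exp(y_hat) * (y - y_hat)


y = 2.0  # fixed realization, as in a fixed-y loss plot
for d in [0.5, 1.0]:
    under = bregman_exp(y, y - d)   # forecast below y
    over = bregman_exp(y, y + d)    # forecast above y
    print(f"d={d}: under-prediction loss {under:.3f}, "
          f"over-prediction loss {over:.3f}")
```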

[I would like to thank -- without implicating -- Minchul Shin for helpful discussions.]
