Monday, August 1, 2016

On the Superiority of Observed Information

Earlier I claimed that "Efron-Hinkley holds up -- observed information dominates estimated expected information for finite-sample MLE inference." Several of you have asked for elaboration.

The earlier post grew from a 6 AM Hong Kong breakfast conversation with Per Mykland (with both of us suffering from 12-hour jet lag), so I wanted to get some detail from him before elaborating, to avoid erroneous recollections. But it's basically as I recalled -- mostly coming from the good large-deviation properties of the likelihood ratio. The following is adapted from that conversation and a subsequent email exchange. (Any errors or omissions are entirely mine.)

There was quite a bit of work on this question in the 1980s and 1990s. It was kicked off by Efron and Hinkley (1978). The main message is in their plot on p. 460, suggesting that the observed information is a more accurate estimator of the variance of the MLE. Research gradually focused on the behavior of the likelihood ratio (\(LR\)) statistic and its signed square root \(R=\mathrm{sgn}(\hat{\theta} - \theta)\sqrt{LR}\), which was seen to have good conditionality properties, local sufficiency, and most crucially, good large-deviation properties. (For details see Mykland (1999), Mykland (2001), and the references there.)

The large-deviation situation is as follows.  Most statistics have cumulant behavior as in Mykland (1999) eq. (2.1).  In contrast, \(R\) has cumulant behavior as in Mykland (1999) eq. (2.2), which yields the large deviation properties of Mykland (1999) Theorem 1. (Also see Theorems 1 and 2 of Mykland (2001).)
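To make the observed/expected distinction concrete, here is a minimal numerical sketch (my own illustration, not from the papers above) using the Cauchy location model, a standard textbook case where the two information measures differ sample by sample. It computes the MLE by Newton's method, the observed information (minus the second derivative of the log-likelihood at the MLE), the expected Fisher information (which is \(n/2\) for Cauchy location with unit scale), and the signed root \(R\) as defined above:

```python
import math
import random

def cauchy_loglik(theta, xs):
    """Log-likelihood of a Cauchy location model (scale fixed at 1)."""
    return sum(-math.log(math.pi) - math.log(1.0 + (x - theta) ** 2) for x in xs)

def score(theta, xs):
    """First derivative of the log-likelihood in theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def observed_information(theta, xs):
    """Observed information: minus the second derivative of the log-likelihood."""
    total = 0.0
    for x in xs:
        u = x - theta
        total += 2.0 * (1.0 - u * u) / (1.0 + u * u) ** 2
    return total

def mle(xs, tol=1e-10):
    """Newton iterations from the sample median (a usual, safe starting value)."""
    theta = sorted(xs)[len(xs) // 2]
    for _ in range(100):
        info = observed_information(theta, xs)
        # Safeguard: if the curvature is not positive, take a small gradient step.
        step = score(theta, xs) / info if info > 0 else 0.1 * score(theta, xs)
        theta += step
        if abs(step) < tol:
            break
    return theta

random.seed(42)
theta0 = 0.0  # true location
# Standard Cauchy draws via the inverse-CDF transform tan(pi*(U - 1/2)).
xs = [theta0 + math.tan(math.pi * (random.random() - 0.5)) for _ in range(200)]

theta_hat = mle(xs)
obs_info = observed_information(theta_hat, xs)  # varies with the realized sample
exp_info = len(xs) / 2.0                        # Fisher information: n/2 for Cauchy location

# LR statistic and its signed root R, as defined in the post.
lr = 2.0 * (cauchy_loglik(theta_hat, xs) - cauchy_loglik(theta0, xs))
r = math.copysign(math.sqrt(max(lr, 0.0)), theta_hat - theta0)

print(obs_info, exp_info, r)
```

The point of the example is only that \(J(\hat{\theta})\) (observed) fluctuates with the data while \(I(\theta)\) (expected) does not; the large-deviation results cited above are what elevate \(R\), and with it the observed information, over plug-in expected information.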