Friday, August 30, 2013

Econometrica, Heteroskedasticity and the Greeks

Over at Leisure of the Theory Class, Ricky Vohra asks why Econometrica is spelled with a "c" rather than a "k." As he notes:
The journal Econometrica is spelt as I just wrote. The journal Biometrika, however, has a "k" in place of the letter "c." Biometrika has it right, and Pearson, its founder, credits Edgeworth for this: "like a good German he knew that the Greek 'k' is not a modern 'c', and, if any of you at any time wonder where the 'k' in Biometrika comes from, I will frankly confess that I stole it from Edgeworth. Whenever you see that 'k' call to mind dear old Edgeworth."
Ricky (and Edgeworth and Pearson) are correct. Interestingly, Hu McCulloch has a long-ago note on the same topic, in Hu's case dealing with heteroscedasticity vs. heteroskedasticity. It's cutely titled "Heteros*edasticity," and he makes clear that "*" should be "k," for the same Greek root reasons. McCulloch's note is little-known, or perhaps well-known but widely-ignored by philistine econometricians, as "c" is still often used despite the note's publication in one of the profession's most elite and visible journals. You guessed it -- such superb irony -- Econometrica!

Check it out: McCulloch, J. Huston, "On Heteros*edasticity," Econometrica, 53, 1985, p. 483. It's less than one page, authoritative and also humorous, beginning with "The most pressing issue in econometric orthography today is whether heteros*edasticity should be spelled with a 'c' or with a 'k'."

Sunday, August 25, 2013

Exponential Smoothing Again: Structural Change

Here's another fascinating example of the ongoing and surprisingly modern magic of exponential smoothing (ES).

In my last post I asked you to read the latest from Neil Shephard, on stochastic volatility and exponential smoothing. Now read the latest from Hashem Pesaran, Andreas Pick and Mikhail Pranovich (P^3), "Optimal Forecasting in the Presence of Structural Breaks" (forthcoming, Journal of Econometrics), on structural change and, yes, again, exponential smoothing.

Let's strip things to a starkly simple stylized case. (The basic idea generalizes to much richer environments.) Consider a time-series forecasting situation with a one-shot structural break in all model parameters at a known past time. Should you simply discard the pre-break data when estimating your model? Your first reaction might be yes, as the pre-break regime is irrelevant moving forward, and your goal is forecasting.

But the correct answer is "not necessarily." Using the full sample will, of course, produce a mongrel blend of pre-break and post-break parameters -- that is, biased estimates of the relevant post-break parameters. But using the full sample may also greatly reduce variance, so the estimation mean-squared error of the post-break parameters may be lower, perhaps much lower, with full-sample estimation, which in turn may translate into lower out-of-sample mean-squared prediction error (MSPE) when forecasting with the full-sample estimated parameters.

Suppose, to take a stark example, that the break is minuscule, and that it's near the end of a very long sample. The cost of full-sample estimation is then injection of minuscule bias in estimates of the relevant post-break parameters, whereas the benefit is massive variance reduction. That's a very favorable tradeoff under quadratic loss.

Fine. Good insight. (Interesting historical note: Hashem mentions that it originated in a conversation with the late Benoit Mandelbrot.)  But there's much more, and here's where it gets really interesting. Just as it's sub-optimal simply to discard the pre-break data, it's also sub-optimal simply to keep it. It turns out, quite intuitively, that you want to keep the pre-break data but downweight it, and the MSPE-optimal weight-decay scheme turns out to be exponential! In less-rigid forecasting environments involving continuous structural evolution (also considered by P^3), that basically amounts to exponential smoothing. Very, very cool.
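The bias-variance tradeoff, and the value of downweighting rather than discarding, is easy to see in a small Monte Carlo sketch. Below is a minimal, hypothetical illustration (my own toy setup, not P^3's): estimating a post-break mean when a small break occurs near the end of a long sample. The post-break sample mean is unbiased but noisy; the full-sample mean is biased but low-variance; an exponentially-downweighted mean (with an arbitrary illustrative decay rate) sits in between.

```python
import numpy as np

rng = np.random.default_rng(0)

T, T_break = 500, 480   # break near the end of a long sample
delta = 0.05            # small break in the mean
reps = 2000
decay = 0.99            # illustrative exponential downweighting of old data

errs = {"post-break": [], "full-sample": [], "exp-weighted": []}
for _ in range(reps):
    y = rng.normal(0.0, 1.0, T)
    y[T_break:] += delta                    # post-break mean is delta
    w = decay ** np.arange(T - 1, -1, -1)   # more weight on recent data
    errs["post-break"].append(y[T_break:].mean() - delta)
    errs["full-sample"].append(y.mean() - delta)
    errs["exp-weighted"].append(np.average(y, weights=w) - delta)

mse = {k: np.mean(np.square(v)) for k, v in errs.items()}
print(mse)  # full-sample MSE is far below post-break MSE here
```

With these (hypothetical) numbers the full-sample estimator wins decisively; shrink the sample or grow the break and the ranking can flip, which is exactly why the optimal weights depend on the break's size and location.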

Strangely, P^3 don't attempt to connect to the well-known and clearly-related work of Mike Clements and David Hendry (CH), which P^3 clarify and extend significantly. In several papers, CH take ES as an exogenously existing method and ask why it's often hard to beat and, closely related, why the martingale "no change" forecast is often hard to beat. See, e.g., section 7 of their survey, "Forecasting with Breaks," in Elliott, Granger and Timmermann (eds.), Handbook of Economic Forecasting, 2005, Elsevier. (Sorry, not even a working paper version available free online, thanks to Elsevier's greed.) The CH answer is that breaks happen often in economics, and that ES performs well because it adapts to breaks quickly. P^3 instead ask what approach is optimal in various break environments and arrive endogenously at ES. Moreover, the P^3 results are interestingly nuanced. For example, ES is closest to optimality in situations of continuous structural change, not in situations of discrete breaks as emphasized by CH. In any event, the P^3 and CH results are marvelously complementary.

Exponential smoothing is again alive and well, from yet another, and very different, perspective.

Monday, August 19, 2013

Exponential Smoothing and Stochastic Volatility

Exponential smoothing is alive and well, and evolving. For the latest, check out Neil Shephard's important 2013 working paper, "Martingale Unobserved Component Models." (Fortunately for North America, the link to Neil's home page will soon be outdated -- he's moving to Harvard as I write this. Congratulations to Harvard's Departments of Economics and Statistics, and of course to Neil as well.)

In part the paper is interesting because it provides useful perspective on state-space modeling, filtering and estimation from the early linear/Gaussian days of Kalman filtering to the recent nonlinear/non-Gaussian days of particle filtering. There's also some interesting personal reflection. (Background: the paper is for a forthcoming Andrew Harvey Festschrift, and Neil was Andrew's student.)

But the paper's original contribution is even more interesting. It puts exponential smoothing in fresh and fascinating perspective, by considering it in a stochastic volatility (SV) environment.

As is well known, exponential smoothing (ES) is closely related to state-space models of unobserved components. In particular, ES is the MSE-optimal filter when the data-generating process is a latent random walk signal buried in white noise measurement error. The optimal smoothing parameter, moreover, depends only on the signal / noise ratio (that is, the random walk error variance relative to the measurement error variance).
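Concretely, for the local-level model y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t, the steady-state Kalman gain, which is precisely the ES smoothing parameter, has a closed form in the signal/noise ratio q = var(eta)/var(eps). Here's a quick sketch (standard state-space algebra, nothing specific to any particular paper), checking the closed form against direct iteration of the Riccati recursion:

```python
import numpy as np

def es_weight(q):
    """Steady-state Kalman gain (= ES smoothing parameter) for the
    local-level model, as a function of the signal/noise ratio q.
    Solves f^2 - q*f - q = 0 for the predictive state variance f."""
    f = (q + np.sqrt(q * q + 4.0 * q)) / 2.0
    return f / (f + 1.0)

def es_weight_by_iteration(q, n_iter=500):
    """Same gain, obtained by iterating the filtering recursions
    (measurement-noise variance normalized to 1) to convergence."""
    p = 0.0
    for _ in range(n_iter):
        f = p + q                 # predict
        k = f / (f + 1.0)         # gain
        p = (1.0 - k) * f         # update
    return k

for q in (0.1, 1.0, 10.0):
    print(q, es_weight(q), es_weight_by_iteration(q))
```

Note how the gain rises toward one as the signal/noise ratio grows: a volatile underlying level calls for fast-adapting smoothing, a stable one for heavy averaging.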

Neil endows the errors with SV, in which case the signal / noise ratio and hence the optimal smoothing parameter are time-varying. The particle filter facilitates both optimal parameter estimation and optimal tracking of the time-varying volatility, making for real-time ES with an optimally time-varying smoothing parameter.  Very cool, both in principle and practice!
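To see the mechanism in a deliberately simplified form (with the volatility path treated as known, rather than tracked by a particle filter as in Neil's paper, and with made-up illustrative parameter values), one can run the time-varying Kalman recursions for a local-level model whose signal variance follows a stylized stochastic-volatility process. The gain, i.e. the effective smoothing parameter, then moves with volatility:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000

# Log signal variance follows a persistent AR(1): a stylized SV process.
h = np.zeros(T)
for t in range(1, T):
    h[t] = 0.98 * h[t - 1] + 0.2 * rng.normal()
q = np.exp(h)                      # time-varying signal/noise ratio

# Simulate the latent random-walk level and noisy observations
# (measurement-noise variance normalized to 1).
mu = np.cumsum(np.sqrt(q) * rng.normal(size=T))
y = mu + rng.normal(size=T)

# Kalman filter with time-varying state-innovation variance q[t].
a, p = 0.0, 10.0                   # diffuse-ish initialization
gains = np.empty(T)
for t in range(T):
    f = p + q[t]                   # predictive state variance
    k = f / (f + 1.0)              # time-varying gain = smoothing parameter
    a = a + k * (y[t] - a)         # filtered level (adaptive ES recursion)
    p = (1.0 - k) * f
    gains[t] = k

print(np.corrcoef(h, gains)[0, 1])  # gain co-moves with volatility
```

When volatility is high the filter trusts recent observations and the gain rises; when volatility is low it averages over a long history. Neil's contribution is doing this optimally when the volatility path is unobserved and must itself be tracked.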

More generally, it's interesting that ES remains alive and useful and still the focus of important research, some half-century after its introduction. Seemingly-naive methods sometimes reveal themselves to be sophisticated and adaptable.

Monday, August 12, 2013

Krugman's "Very Serious Person" (VSP)

Paul Krugman's term "VSP" is simply wonderful: so concise and apt, capturing a personage previously vaguely sensed but never fully grasped. And of course it's funny too. Hence it's even better than classics from decades past, like WASP (coined, by the way, by the late great Penn sociologist, E. Digby Baltzell), which was concise and apt but not funny. I give Krugman a sociological gold star just for coining the term.

For me the term resonates broadly, describing generic inside-the-beltway types, especially economist types, as in Krugman's usage referring to Larry Summers: "He’s been carefully cultivating an image as a Very Serious Person" (31 July 2013).

Sadly, the Washington VSPdom sucks away some of the finest scientific talent in economics. The reason is misguided professorial benefit-cost comparisons that naively inflate the benefits of "helping the world" and deflate the costs of abandoning research. A top researcher doing path-breaking research is helping the world! And who's helping the world more -- a top researcher doing path-breaking research, or that same researcher transformed into a dark-suited VSP roaming the halls of the Old Executive Office Building, jockeying for Very Serious opportunities to attend Very Serious meetings to discuss Very Serious things?

I'm grateful to Krugman for doing his benefit-cost calculations correctly, refusing to let VSPdom suck him away. His punditry -- like it or hate it -- is immensely more socially valuable than whatever he might contribute as Big Kahuna at the Department of Whatever. Here's to more top academics joining the resistance.

Sunday, August 11, 2013

Universities, Parking, and Bonding

Ya gotta love Bill Barnett's email sig:

"A university is 'a series of individual faculty entrepreneurs held together by a common grievance over parking.'

Clark Kerr, President, University of California, 1958-1967."

I refuse to drive to work anymore, for a variety of reasons, but still I relate.

Just a little too long to Tweet.  Sorry...

Saturday, August 10, 2013

Congrats to Marc Nerlove, AEA Distinguished Fellow

Congratulations to Marc Nerlove, American Economic Association Distinguished Fellow, as formally announced in the current issue of the American Economic Review. Marc was my Penn Ph.D. advisor in the 1980s and Penn colleague in the 1990s.

The American Economic Association's blurb is actually quite good. I've reproduced it below, with a few typos corrected. For interesting additional information, see Marc's Econometric Theory interview by Eric Ghysels, also Marc's student.

In a career spanning 58 years and counting, Marc Nerlove developed widely used econometric methods in the course of addressing important empirical problems. In early research, he developed dynamic models of producer supply that enabled economists to distinguish and to quantify lags due to costs of adjustment and lags due to expectations of future events. In a series of influential papers, he applied these tools to the dynamics of agricultural supply and created a template that continues to be used on a wide scale in studies around the world. His framework made it possible to identify both short-run and long-run elasticities of supply in response to product price. 

Nerlove pioneered the development of modern time series methods including the application of spectral analysis to aggregate economic time series and the development of unobserved components and time series factor models that formalized the Burns-Mitchell decompositions into trend, cycle, and irregular components. This research stimulated the time series index models by Sargent, Sims, Geweke, Engle, Stock, Watson, and others. 

Nerlove’s research on the electricity industry in the early 1960s was the first application of duality theory to estimate production functions. He estimated cost functions and from them obtained estimates of firm technology. His magisterial book, Estimation and Identification of Cobb-Douglas Production Functions, helped to introduce the concept of partial identification into econometrics and is a prototype for synthesizing economics and statistics to address important economic questions. 

Nerlove pioneered the analysis of panel data in econometrics. His fundamental work with Balestra and his subsequent solo research developed widely used frameworks for analyzing dynamic models for panel data in the presence of individual-specific temporally persistent unobservables. The research arose from a practical problem in analyzing and interpreting estimates of the demand for durable goods. 

Nerlove, with Razin, has also done basic research on economic demography and life cycle fertility in dynamic equilibrium settings with overlapping generations.

Throughout his long and distinguished career, Nerlove has exemplified the best in applied economics. He brings rigor to the study of important economic problems. He developed empirically relevant econometric tools and showed by example the importance of using economics and econometrics to analyze economic data. Marc Nerlove’s appointment as Distinguished Fellow of the American Economic Association recognizes his outstanding contributions to economics and econometrics.

Monday, August 5, 2013

Still More on the Strange American Estimator: Indirect Inference, MLE and the Particle Filter

In my last post I praised indirect inference (IE) for its ease of use: just simulate the model and fit a simple auxiliary model to the simulated and real-world data, after which evaluation of the objective is immediate. In contrast, likelihood analysis and MLE can be challenging, as the likelihood may be difficult to derive and evaluate.
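To make the recipe concrete, here is a toy, hypothetical indirect-inference sketch (my own example, not from the post): the structural model is an MA(1), the auxiliary statistic is the first-order sample autocorrelation, and we pick the structural parameter whose simulated autocorrelation best matches the data's, reusing fixed simulation shocks (common random numbers) so the objective is smooth in the parameter.

```python
import numpy as np

def ma1(theta, e):
    """Structural model: MA(1), y_t = e_t + theta * e_{t-1}."""
    return e[1:] + theta * e[:-1]

def acf1(y):
    """Auxiliary statistic: first-order sample autocorrelation."""
    y = y - y.mean()
    return np.dot(y[1:], y[:-1]) / np.dot(y, y)

rng = np.random.default_rng(42)
theta_true, T = 0.5, 5000

# "Real-world" data and its auxiliary statistic.
y_data = ma1(theta_true, rng.normal(size=T + 1))
stat_data = acf1(y_data)

# Common random numbers: fixed simulation shocks reused for every theta.
e_sim = rng.normal(size=(10, T + 1))

def objective(theta):
    sims = [acf1(ma1(theta, e)) for e in e_sim]
    return (np.mean(sims) - stat_data) ** 2

# Crude grid-search minimization of the indirect-inference objective
# (theta restricted to [0, 0.95] so the binding function is one-to-one).
grid = np.arange(0.0, 0.96, 0.01)
theta_hat = grid[np.argmin([objective(th) for th in grid])]
print(theta_hat)  # should land near theta_true = 0.5
```

In practice one matches a richer auxiliary model (e.g., several AR coefficients) with a weighting matrix, but the structure, simulate, fit auxiliary, match, is exactly this.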

Some might wonder whether that’s a completely fair assessment in modern time-series contexts. In particular, one might claim that evaluation of the likelihood is now as trivial as simulating. As Andrew Harvey and others have emphasized for decades, for any linear model cast in finite-dimensional state-space form one can simply run the Kalman filter and then evaluate the Gaussian likelihood via a prediction-error decomposition. And much more recently, thanks to path-breaking work by Arnaud Doucet and others (e.g., JRSS B, 2010, 1-33), filtering now also provides full likelihood analysis in general non-linear / non-Gaussian environments. In particular, so-called "particle MCMC" -- a simulation method! -- does the trick. So it would seem that likelihood analysis is made trivial by simulation, just as IE is made trivial by simulation. 
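The prediction-error decomposition is worth seeing once in code. For the local-level model with known initial state, the Kalman filter's one-step-ahead prediction errors v_t and their variances F_t deliver the exact Gaussian log-likelihood, which can be verified against the brute-force multivariate-normal calculation. (A generic textbook sketch with made-up parameter values, not tied to any of the papers mentioned.)

```python
import numpy as np

def loglik_kalman(y, q):
    """Gaussian log-likelihood of the local-level model
    (mu_0 = 0 known, state-noise variance q, measurement variance 1)
    via the Kalman prediction-error decomposition."""
    a, p, ll = 0.0, 0.0, 0.0
    for yt in y:
        f = p + q + 1.0                  # one-step forecast variance F_t
        v = yt - a                       # prediction error v_t
        ll += -0.5 * (np.log(2 * np.pi) + np.log(f) + v * v / f)
        k = (p + q) / f                  # Kalman gain
        a += k * v
        p = (1.0 - k) * (p + q)
    return ll

def loglik_direct(y, q):
    """Same likelihood by brute force: y ~ N(0, q*M M' + I), where M is
    the lower-triangular ones (cumulative-sum) matrix."""
    T = len(y)
    M = np.tril(np.ones((T, T)))
    S = q * M @ M.T + np.eye(T)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (T * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(S, y))

rng = np.random.default_rng(7)
q = 0.5
y = np.cumsum(np.sqrt(q) * rng.normal(size=50)) + rng.normal(size=50)
print(loglik_kalman(y, q), loglik_direct(y, q))  # identical up to rounding
```

The filter version costs O(T) rather than O(T^3), which is why the prediction-error decomposition, and its particle-filter generalizations, made simulation-free and simulation-based likelihood evaluation practical in the first place.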

Hence we can dispense with comparatively inefficient IE, right?

Whoaaa…not so fast. The points I made in an earlier post remain valid.

First, IE simulation is good-old “model simulation,” typically simple and always a good check of model understanding. Successful particle MCMC, in contrast, is a different and often-recalcitrant simulation beast.

Second, even if particle MCMC does make MLE as mechanical as simple model simulation (and again, that’s not at all clear), desirable consistency properties under misspecification are generally more easily achieved for IE. Under misspecification, the necessity of thinking hard about which moments to match, or which auxiliary model to use, is a good thing.