Sunday, August 20, 2017

Bayesian Random Projection (More on Terabytes of Economic Data)

Some additional thoughts related to Serena Ng's World Congress piece (earlier post here, with a link to her paper):

The key newish dimensionality-reduction strategies that Serena emphasizes are random projection and leverage score sampling. In a regression context, both are methods for optimally approximating an NxK "X matrix" with an Nxk X matrix, where k<<K. They are very different, and each raises many issues. Random projection delivers a smaller X matrix whose columns are linear combinations of those of the original X matrix (as, for example, with principal-component regression), which can make interpretation difficult. Leverage score sampling, in contrast, delivers a smaller X matrix whose columns are simply a subset of those of the original X matrix, which feels cleaner but has issues of its own.
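A minimal NumPy sketch of the contrast follows. The specific choices are illustrative assumptions, not the algorithms in Serena's paper: a dense Gaussian projection matrix for random projection, and, for the column-subset approach, sampling columns with probability proportional to their leverage scores relative to the top-k right singular vectors of X. The function names `random_projection` and `leverage_column_sample` are hypothetical.

```python
import numpy as np

def random_projection(X, k, rng):
    """Compress the K columns of X into k new columns, X @ Phi.
    Assumption: Phi is a dense K x k Gaussian matrix (other designs exist)."""
    K = X.shape[1]
    Phi = rng.normal(0.0, 1.0 / np.sqrt(k), size=(K, k))
    return X @ Phi  # each new column is a linear combination of ALL original columns

def leverage_column_sample(X, k, rng):
    """Keep k ORIGINAL columns of X, sampled with probability proportional to
    their leverage scores w.r.t. the top-k right singular subspace (an assumed variant)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    lev = np.sum(Vt[:k, :] ** 2, axis=0)  # one score per column; scores sum to k
    p = lev / lev.sum()
    cols = np.sort(rng.choice(X.shape[1], size=k, replace=False, p=p))
    return X[:, cols], cols  # columns are untouched originals, so easy to interpret

rng = np.random.default_rng(0)
N, K, k = 200, 500, 10
X = rng.normal(size=(N, K))
Xrp = random_projection(X, k, rng)
Xlev, cols = leverage_column_sample(X, k, rng)
print(Xrp.shape, Xlev.shape)  # (200, 10) (200, 10)
```

Both routes deliver an Nxk matrix; the difference is that `Xlev` can be read off as named original regressors (`cols`), while each column of `Xrp` mixes all K of them.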

Anyway, a crucial observation is that successful predictive modeling doesn't require deep interpretation, so random projection is potentially just fine: if it works, it works, and that's an empirical matter. Econometric extensions (e.g., to VARs) and evidence (e.g., on macro forecasting) are just now emerging, and the results appear encouraging. An important recent contribution in that regard is Koop, Korobilis, and Pettenuzzo (in press), which significantly extends and applies earlier work of Guhaniyogi and Dunson (2015) on Bayesian random projection ("compression"). Bayesian compression fits beautifully in an MCMC framework (again see Koop et al.), including model averaging across multiple random projections, which attaches greater weight to projections that forecast well. Very exciting!
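A stripped-down sketch of Bayesian compressed regression with averaging across projections might look like the following. Several simplifications here are assumptions for illustration, not the specifications of Koop et al. or Guhaniyogi and Dunson: a static regression rather than a VAR, dense Gaussian projections, a Zellner g-prior with g = N and p(sigma^2) proportional to 1/sigma^2 so that each compressed model has a closed-form marginal likelihood (no MCMC needed in this special case), and weights proportional to that marginal likelihood. The function names `g_prior_fit` and `compressed_bma_forecast` are hypothetical.

```python
import numpy as np

def g_prior_fit(y, Z, g):
    """Posterior mean of beta and log marginal likelihood (up to a constant shared
    by all models with the same N and k) under Zellner's g-prior, no intercept."""
    N, k = Z.shape
    beta_ols, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted_ss = y @ (Z @ beta_ols)                # y' P_Z y
    s = y @ y - g / (1.0 + g) * fitted_ss         # strictly positive
    log_ml = -0.5 * k * np.log1p(g) - 0.5 * N * np.log(s)
    beta_post = g / (1.0 + g) * beta_ols          # shrinkage toward zero
    return beta_post, log_ml

def compressed_bma_forecast(X, y, x_new, k, n_proj, rng, g=None):
    """Average forecasts over n_proj random projections of X, weighting each
    projection by its marginal likelihood (an assumed weighting scheme)."""
    N, K = X.shape
    g = float(N) if g is None else g              # unit-information prior: an assumption
    log_mls, preds = [], []
    for _ in range(n_proj):
        Phi = rng.normal(0.0, 1.0 / np.sqrt(k), size=(K, k))  # assumed Gaussian design
        beta, log_ml = g_prior_fit(y, X @ Phi, g)
        log_mls.append(log_ml)
        preds.append((x_new @ Phi) @ beta)        # forecast from compressed model
    log_mls = np.array(log_mls)
    w = np.exp(log_mls - log_mls.max())           # subtract max for numerical stability
    w /= w.sum()
    return float(w @ np.array(preds)), w

rng = np.random.default_rng(1)
N, K, k = 100, 300, 5
X = rng.normal(size=(N, K))
beta_true = np.zeros(K)
beta_true[:10] = 1.0
y = X @ beta_true + rng.normal(size=N)
x_new = rng.normal(size=K)
yhat, w = compressed_bma_forecast(X, y, x_new, k=k, n_proj=50, rng=rng)
print(round(yhat, 3), round(w.max(), 3))
```

Marginal-likelihood weights are one natural choice; weighting by out-of-sample predictive likelihood would track "projections that forecast well" more literally, at the cost of a recursive evaluation loop.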