Monday, February 19, 2018

More on Neural Nets and ML

I earlier mentioned Matt Taddy's "The Technological Elements of Artificial Intelligence" (ungated version here).

Among other things, the paper has good perspective on the past and present of neural nets. (Read: his views mostly, if not exactly, match mine...)

Here's my personal take on some of the history vis-à-vis econometrics:

Econometricians lost interest in NN's in the 1990's. The celebrated Hal White et al. proof of NN non-parametric consistency as NN width (number of hidden neurons) grows with sample size at an appropriate rate was ultimately underwhelming, insofar as it merely established for NN's what had been known for decades for various other non-parametric estimators (kernel, series, nearest-neighbor, trees, splines, etc.). That is, it seemed that there was nothing special about NN's, so why bother?
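
For concreteness, here is a sketch of the single-hidden-layer network that width-based consistency results of this kind concern. The notation is chosen here for illustration and is not taken from White's papers: ψ is a fixed activation function (e.g., logistic), x is the input vector, and J is the width.

$$ f_J(x) = \beta_0 + \sum_{j=1}^{J} \beta_j \, \psi\!\left(\gamma_{0j} + \gamma_j^{\top} x\right) $$

Consistency results of this type let J = J_n grow with the sample size n, slowly enough relative to n; the precise rate conditions are in the original work and are not reproduced here. Replacing each ψ(γ_{0j} + γ_j'x) with a fixed basis function φ_j(x) gives an ordinary series estimator, which is one way to see why the result felt like old news.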

But the non-parametric consistency focus was all on NN width; no one thought or cared much about NN depth. Then, more recently, people noticed that adding NN depth (more hidden layers) could be seriously helpful, and the "deep learning" boom took off. 
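
Depth, by contrast, stacks hidden layers. As a sketch in notation chosen here for illustration, with L hidden layers, weight matrices W_ℓ, intercept vectors b_ℓ, and an activation ψ applied elementwise:

$$ h^{(0)} = x, \qquad h^{(\ell)} = \psi\!\left(b_\ell + W_\ell \, h^{(\ell-1)}\right), \quad \ell = 1, \dots, L, \qquad f(x) = \beta_0 + \beta^{\top} h^{(L)} $$

Setting L = 1 gives a single-hidden-layer network whose width is the number of rows of W_1; the width-based consistency results discussed above concern only that case.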

Here are some questions/observations on the new "deep learning":

1. Adding NN depth often seems helpful, insofar as deep learning often seems to "work" in various engineering applications, but where/what are the theorems? What can be said rigorously about depth?

2. Taddy emphasizes what might be called two-step deep learning. In the first step, "pre-trained" hidden layer nodes are obtained based on unsupervised learning (e.g., principal components (PC)) from various sets of variables. And then the second step proceeds as usual. That's very similar to the age-old idea of PC regression. Or, in multivariate dynamic environments and econometrics language, "factor-augmented vector autoregression" (FAVAR), as in Bernanke et al. (2005). So, are modern implementations of deep NN's effectively just nonlinear FAVAR's? If so, doesn't that also seem underwhelming, in the sense of -- dare I say it -- there being nothing really new about deep NN's?

3. Moreover, PC regressions and FAVAR's have issues of their own relative to one-step procedures like ridge or LASSO. See this and this.
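
To make points 2 and 3 concrete, here is a minimal NumPy sketch contrasting two-step PC regression with one-step ridge. It is linear throughout (no nonlinear hidden layers), so it illustrates PC regression itself, not a deep NN or a full FAVAR. The function names, the simulated two-factor data, and the choices k = 2 and lam = 10 are all hypothetical, for illustration only. The sketch relies on a standard fact: written in terms of the principal directions of the centered X, PC regression keeps the first k directions at full weight and drops the rest, whereas ridge shrinks every direction smoothly by d_j^2 / (d_j^2 + lam), where d_j is the j-th singular value. Note also that the PC step never looks at y.

```python
import numpy as np

def pc_regression(X, y, k):
    """Two-step principal-components regression (illustrative sketch).

    Step 1 (unsupervised): extract the first k principal components of X.
    Step 2 (supervised): regress y on those components by OLS.
    Returns (intercept, beta) in terms of the original columns of X.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                      # p x k principal directions (loadings)
    F = Xc @ V_k                        # n x k estimated components ("factors")
    Z = np.column_stack([np.ones(len(y)), F])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    beta = V_k @ coef[1:]               # map factor coefficients back to X
    return coef[0] - mu @ beta, beta

def ridge(X, y, lam):
    """One-step ridge regression on centered data; intercept is not penalized."""
    mu, ybar = X.mean(axis=0), y.mean()
    Xc = X - mu
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - ybar))
    return ybar - mu @ beta, beta

def shrinkage_factors(X, k, lam):
    """Weight each method places on the j-th principal direction of centered X."""
    d = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    pcr = (np.arange(len(d)) < k).astype(float)  # keep first k, drop the rest
    rdg = d**2 / (d**2 + lam)                    # shrink every direction smoothly
    return pcr, rdg

# Hypothetical simulated data with a two-factor structure.
rng = np.random.default_rng(0)
n, p = 500, 20
f = rng.normal(size=(n, 2))
X = f @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = f @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=n)

a_pc, b_pc = pc_regression(X, y, k=2)
a_rr, b_rr = ridge(X, y, lam=10.0)
pcr_w, ridge_w = shrinkage_factors(X, k=2, lam=10.0)
print("PC regression weights:", np.round(pcr_w[:4], 3))
print("ridge weights:        ", np.round(ridge_w[:4], 3))
```

As a sanity check on the design, setting k equal to the number of columns of X makes pc_regression reproduce plain OLS, since nothing is discarded; and letting lam go to zero does the same for ridge. The two methods differ only in how they weight the principal directions between those extremes.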