Thursday, February 3, 2022

Mind-Boggling Machine Learning for Portfolio Allocation

If you do nothing else today, check out "The Virtue of Complexity in Machine Learning Portfolios," by Kelly, Malamud, and Zhou, along with some of the related recent statistics literature now percolating, like Hastie et al. (2020). As the authors say in the intro:

In this paper, we study behavior of portfolios in the high model complexity regime where the number of predictors exceeds the number of observations (P > T). In this case, standard regression logic no longer holds because the regressor inverse covariance matrix is not defined. However, the pseudo-inverse is defined, and it corresponds to a limiting ridge regression with infinitesimal shrinkage, or the “ridgeless” limit. An emergent statistics and machine learning literature shows that, in the high complexity regime, ridgeless regression can achieve accurate out-of-sample forecasts despite fitting the training data perfectly. This seemingly counterintuitive phenomenon is sometimes called “benign overfit” (Bartlett et al., 2020; Tsigler and Bartlett, 2020). 

We analyze related phenomena in the context of machine learning portfolios. We establish the striking theoretical result that market timing strategies based on ridgeless least squares predictions generate positive Sharpe ratio improvements for arbitrarily high levels of model complexity. Stated more plainly, when the true data generating process (DGP) is highly complex—i.e., it has many more parameters than there are training data observations—one might think that a timing strategy based on ridgeless regression is bound to fail. After all, it exactly fits the training data with zero error. Surprisingly, this intuition is wrong. We show that strategies based on extremely high-dimensional models can thrive out-of-sample, even with minimal ridge regularization.

If the paper had been written by almost anyone other than Kelly et al., I might have stopped reading and tossed it in the trash, given its seemingly preposterous claims. But they turn out to be true! Or so it seems, buttressed by solid theory, extensive Monte Carlo simulation, and real empirical work. Amazing!
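To see the "ridgeless" mechanics concretely, here is a minimal sketch (my own toy illustration, not the authors' code or data) using NumPy. With P > T predictors, the minimum-norm least-squares solution obtained from the Moore-Penrose pseudo-inverse interpolates the training data exactly, yet can still carry real out-of-sample signal when the true DGP is dense and complex:

```python
import numpy as np

rng = np.random.default_rng(0)

T, P = 50, 500  # high-complexity regime: far more predictors than observations
beta = rng.normal(size=P) / np.sqrt(P)  # dense, complex "true" DGP (toy assumption)

X_train = rng.normal(size=(T, P))
y_train = X_train @ beta + 0.1 * rng.normal(size=T)

# Ridgeless (minimum-norm) least squares via the pseudo-inverse.
# With P > T, infinitely many coefficient vectors fit the training data
# exactly; pinv selects the one with smallest L2 norm, which coincides
# with ridge regression in the limit of infinitesimal shrinkage.
beta_hat = np.linalg.pinv(X_train) @ y_train

# The fit interpolates: zero in-sample error ...
assert np.allclose(X_train @ beta_hat, y_train)

# ... yet out-of-sample forecasts remain positively correlated with outcomes.
X_test = rng.normal(size=(1000, P))
y_test = X_test @ beta + 0.1 * rng.normal(size=1000)
corr = np.corrcoef(X_test @ beta_hat, y_test)[0, 1]
print(f"out-of-sample forecast/outcome correlation: {corr:.2f}")
```

The pseudo-inverse fit recovers only the projection of the true coefficient vector onto the T-dimensional row space of the training design, so the out-of-sample correlation is modest but reliably positive, which is exactly the "benign overfit" flavor of the result.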
