Below are the slides from my discussion of Helene Rey et al., "Answering the Queen: Machine Learning and Financial Crises", which I gave a few days ago at a fine NBER IFM meeting (program and clickable papers here). I also discussed the paper in June at the BIS annual research meeting in Zurich. The key development since the earlier mid-summer draft is that they have now implemented a genuinely real-time financial crisis prediction analysis for France using vintage data, as opposed to a quasi-real-time analysis using final-revised data. Moving to real time of course degrades the quasi-real-time results somewhat, but they largely hold up. Very impressive. Nevertheless, I offer suggestions for improving evaluation credibility in the remaining cases where vintage datasets are not yet available, and I note how subtle but important look-ahead biases can creep in even when vintage data are available and used. I conclude that the only fully convincing evaluation involves implementing their approach moving forward, recording the results, and building up a true track record.
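To fix ideas on what "real time using vintage data" means operationally, here is a minimal, purely illustrative sketch (my own, not the authors' code; the vintages container and the fit_and_predict routine are hypothetical placeholders). The essential point is that at each forecast date the model sees only the data vintage that was actually available on that date, which is what rules out look-ahead bias from subsequent data revisions.

```python
# Illustrative sketch of a real-time (vintage-based) evaluation loop.
# Hypothetical setup: `vintages` maps each forecast date to the data snapshot
# that was actually observable on that date, and `fit_and_predict` stands in
# for whatever crisis-prediction model one cares to use.

import pandas as pd

def real_time_evaluation(vintages, outcomes, fit_and_predict):
    """For each forecast date, train and predict using only the data vintage
    available at that date; later revisions are invisible to the model."""
    records = []
    for forecast_date, vintage_df in sorted(vintages.items()):
        prediction = fit_and_predict(vintage_df, forecast_date)
        records.append({"date": forecast_date, "prediction": prediction})
    results = pd.DataFrame(records).set_index("date")
    # Realized crisis outcomes are merged in only after the fact, for scoring.
    return results.join(outcomes.rename("realized"), how="left")
```

Even a loop like this can leak future information through the back door, for example (my examples here, not necessarily those in the slides) via hyperparameters tuned on the full sample or crisis chronologies defined with the benefit of hindsight, which is the sort of subtle look-ahead bias mentioned above.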
Sunday, October 27, 2019
Online Learning vs. TVP Forecast Combination
[This post is based on the first slide (below) of a discussion of Helene Rey et al., which I gave a few days ago at a fine NBER IFM meeting (program and clickable papers here). The paper is fascinating and impressive, and I'll blog on it separately next time. But the slide below is more of a side rant on general issues, and I skipped it in the discussion of Rey et al. to be sure to have time to address their particular issues.]
Quite a while ago I blogged here on the ex ante expected loss minimization that underlies traditional econometric/statistical forecast combination, vs. the ex post regret minimization that underlies "online learning" and related "machine learning" methods. Nothing has changed. That is, as regards ex post regret minimization, I'm still intrigued, but I'm still not persuaded.
And there's another thing that bothers me. As implemented, ML-style online learning and traditional econometric-style forecast combination with time-varying parameters (TVPs) are almost identical: just projection (regression) of realizations on forecasts, reading off the combining weights as the regression coefficients. OF COURSE we can generalize to allow for time-varying combining weights, non-linear combinations, regularization in high dimensions, etc., and hundreds of econometrics papers have addressed and explored those issues. Yet the ML types seem to think they invented everything, and too many economists are buying it. Rey et al., for example, don't so much as mention the econometric forecast combination literature, which by now occupies large chapters of leading textbooks, like Elliott and Timmermann at the bottom of the slide below.
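To make the "just projection of realizations on forecasts" point concrete, here is a minimal sketch of the traditional regression-based combination, with a rolling window as one crude way of letting the combining weights drift. It is purely illustrative (the function names and the window choice are mine), not anyone's production method.

```python
# Minimal sketch: forecast combination as a regression of realizations on
# forecasts, with a rolling window as one crude way to allow time-varying
# combining weights. Illustrative only.

import numpy as np

def rolling_combining_weights(y, F, window=60):
    """y: (T,) realizations; F: (T, k) matrix of k competing forecasts.
    Returns a (T, k+1) array of [intercept, weights], estimated on each
    trailing window, with NaN rows before the first full window."""
    T, k = F.shape
    X = np.column_stack([np.ones(T), F])   # prepend an intercept
    W = np.full((T, k + 1), np.nan)
    for t in range(window, T + 1):
        Xw, yw = X[t - window:t], y[t - window:t]
        W[t - 1], *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return W

def combine_next(f_next, latest_weights):
    """Combined forecast for the next period from the latest weight vector."""
    return latest_weights[0] + f_next @ latest_weights[1:]
```

The econometric literature referenced in the slide (e.g., the Elliott and Timmermann textbook treatments) goes far beyond this, with shrinkage toward equal weights, regularization in high dimensions, and genuinely time-varying-parameter estimation; the sketch is only meant to show how close the basic mechanics are to ML-style online weight updating.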
Thursday, October 24, 2019
Volatility and Risk Institute
NYU's Volatility Institute is expanding into the Volatility and Risk Institute (VRI). The four key initiatives are Climate Risk (run by Johannes Stroebel), Cyber Risk (run by Randal Milch), Financial Risk (run by Viral Acharya), and Geopolitical Risk (run by Thomas Philippon). Details here. This is a big deal. Great to see climate given such obvious and appropriate prominence. And notice how interconnected are climate, financial, and geopolitical risks.
The following is adapted from an email from Rob Engle and Dick Berner:
The Volatility Institute and its V-lab have, for the past decade, assessed risk through the lens of financial volatility, providing real-time measurement and forecasts of volatility and correlations for a wide spectrum of financial assets, and SRISK, a powerful measure of the resilience of the global financial system. Adopting an interdisciplinary approach, the VRI will build on that foundation to better assess newly emerging nonfinancial and financial risks facing today’s business leaders and policymakers, including climate-related, cyber/operational and geopolitical risks, as well as the interplay among them.
The VRI will be co-directed by two NYU Stern faculty: Nobel Laureate Robert Engle, Michael Armellino Professor of Management and Financial Services and creator of the V-lab; and Richard Berner, Professor of Management Practice and former Director of the Office of Financial Research, established by the Dodd–Frank Wall Street Reform and Consumer Protection Act to help promote financial stability by delivering high-quality financial data, standards and analysis to policymakers and the public.
The VRI will serve as the designated hub to facilitate, support and promote risk-related research, and external and internal engagement among scholars, practitioners and policymakers. To realize its interdisciplinary potential, the VRI will engage the expertise of faculty across New York University, including at the Courant Institute of Mathematical Sciences, Law School, Tandon School of Engineering, Wagner School of Public Policy and Wilf Family Department of Politics in the Faculty of Arts & Science.
Saturday, October 19, 2019
Missing Data in Factor Models
Serena Ng's site is back. Her new paper with Jushan Bai, "Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data", which I blogged about earlier, is now up on arXiv, here. Closely related, Marcus Pelger just sent me his new paper with Ruoxuan Xiong, "Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference", which I look forward to reading. One is dated Oct 15 and one is dated Oct 16. Science is rushing forward!
Saturday, October 12, 2019
Interval Prediction
Last time I blogged on Serena's amazing presentation from Per's Chicago meeting,
https://fxdiebold.blogspot.com/2019/10/large-dimensional-factor-analysis-with.html
But I was equally blown away by Rina's amazing "Predictive inference with the jackknife+".
Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani
https://arxiv.org/pdf/1905.02928.pdf
Correctly calibrated prediction intervals despite arbitrary model misspecification!
Of course I'm left with lots of questions. They have nice correct-coverage theorems. What about length? I would like theorems (not just simulations) as regards shortest-length intervals with guaranteed correct coverage. Their results seem to require iid or similar exchangeability environments. What about heteroskedastic environments where prediction error variance depends on covariates? What about time series environments?
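For readers who have not yet seen the paper, here is a minimal sketch of the jackknife+ interval construction as I read it; the fit argument is a placeholder for any regression routine that returns a prediction function, and this is an illustration rather than the authors' reference implementation.

```python
# Minimal sketch of the jackknife+ prediction interval (Barber, Candes,
# Ramdas, and Tibshirani 2019), as I read the paper. `fit(X, y)` is any
# user-supplied routine returning a prediction function; illustrative only.

import numpy as np

def jackknife_plus_interval(X, y, x_test, fit, alpha=0.1):
    """Prediction interval for a single test point x_test (1-D array).
    The construction carries a coverage guarantee of at least 1 - 2*alpha."""
    n = len(y)
    lo_vals, hi_vals = np.empty(n), np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        predict_i = fit(X[mask], y[mask])              # leave-one-out fit
        r_i = abs(y[i] - predict_i(X[i:i + 1])[0])     # LOO residual
        mu_i = predict_i(x_test.reshape(1, -1))[0]     # LOO prediction at x_test
        lo_vals[i] = mu_i - r_i
        hi_vals[i] = mu_i + r_i
    # Finite-sample (n+1)-corrected empirical quantiles over the n LOO values;
    # tiny-sample edge cases (where the paper uses infinite endpoints) are
    # simply clamped here for brevity.
    k_lo = int(np.floor(alpha * (n + 1)))
    k_hi = int(np.ceil((1 - alpha) * (n + 1)))
    lower = np.sort(lo_vals)[max(k_lo - 1, 0)]
    upper = np.sort(hi_vals)[min(k_hi - 1, n - 1)]
    return lower, upper
```

For instance, fit could simply wrap an off-the-shelf regressor and return its predict method; nothing in the construction requires the model to be correctly specified, which is the whole point.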
Then, quite amazingly, "Distributional conformal prediction" by Victor Chernozhukov et al. arrived in my mailbox.
https://arxiv.org/pdf/1909.07889.pdf
It is similarly motivated and may address some of my questions.
Anyway, great developments for interval prediction!
Monday, October 7, 2019
Carbon Offsets
At the end of a recently-received request that I submit my receipts from a conference trip last week:
... Let me know if you'd like us to purchase carbon offsets for your miles, and deduct the amount (or any amount) from your reimbursement. We’ll do it on your behalf. Thank You! For example: NYC-Chicago round trip = $5.72. Oxford-Chicago round trip = $35.77. Nantes-Chicago round trip = $37.08. Bergen-Chicago round trip = $35.44.
A first for me! Ironically, I spoke on some of my new climate econometrics work with Glenn Rudebusch. (It was not a climate conference per se, and mine was the only climate paper.)
Sunday, October 6, 2019
Large Dimensional Factor Analysis with Missing Data
Back from the very strong Stevanovich meeting. Program and abstracts here. One among many highlights was:
Large Dimensional Factor Analysis with Missing Data
Presented by Serena Ng (Columbia, Dept. of Economics)
Abstract:
This paper introduces two factor-based imputation procedures that will fill missing values with consistent estimates of the common component. The first method is applicable when the missing data are bunched. The second method is appropriate when the data are missing in a staggered or disorganized manner. Under the strong factor assumption, it is shown that the low rank component can be consistently estimated but there will be at least four convergence rates, and for some entries, re-estimation can accelerate convergence. We provide a complete characterization of the sampling error without requiring regularization or imposing the missing at random assumption as in the machine learning literature. The methodology can be used in a wide range of applications, including estimation of covariances and counterfactuals.
This paper just blew me away. Re-arrange the X columns to get all the "complete cases across people" (tall block) in the leftmost columns, and re-arrange the X rows to get all the "complete cases across variables" (wide block) in the topmost rows. The intersection is the "balanced" block in the upper left. Then iterate on the tall and wide blocks to impute the missing data in the bottom right "missing data" block. The key figure that illustrates the procedure provided a real "eureka moment" for me. Plus they have a full asymptotic theory as opposed to just worst-case bounds.
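Here is my schematic, one-pass reading of the block logic just described; it is purely illustrative (the function and argument names are mine), and it omits the paper's re-estimation/iteration step and, of course, all of its theory.

```python
# Schematic one-pass sketch of the tall/wide block logic described above.
# Not the Bai-Ng estimator itself. Assumes X has already been rearranged so
# that columns 0:N0 are fully observed (tall block), rows 0:T0 are fully
# observed (wide block), and missing entries sit in the bottom-right block;
# r is the assumed number of factors.

import numpy as np

def tall_wide_impute_once(X, T0, N0, r):
    X = X.copy()
    miss = np.isnan(X)
    # Factors from the tall block (all rows, the N0 complete columns).
    tall = X[:, :N0]
    U, S, _ = np.linalg.svd(tall, full_matrices=False)
    F = U[:, :r] * S[:r]                                # T x r factor estimates
    # Loadings for every column from the wide block (the T0 complete rows).
    Lam, *_ = np.linalg.lstsq(F[:T0], X[:T0, :], rcond=None)   # r x N loadings
    # Fill only the originally missing entries with the common component.
    common = F @ Lam
    X[miss] = common[miss]
    return X
```

The re-estimation step that the abstract says can accelerate convergence for some entries is exactly what this one-pass sketch leaves out.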
Kudos!
I'm not sure whether the paper is circulating yet, and Serena's web site vanished recently (not her fault -- evidently Google made a massive error), but you'll soon be able to get the paper one way or another.