Wednesday, April 30, 2014
Student Advice II: How to Give a Seminar
Check out Jesse Shapiro's view. He calls it "How to Give an Applied Micro Talk," but it's basically relevant everywhere. Of course reasonable people might quibble with a few things said, or regret the absence of a few things unsaid, but overall it's apt and witty. I love the "No pressure though" (you'll have to find it for yourself).
Monday, April 28, 2014
More on Kaggle Forecasting Competitions: Performance Assessment and Forecast Combination
Here are a few more thoughts on Kaggle competitions, continuing my earlier Kaggle post.
It's a shame that Kaggle doesn't make available (post-competition) the test-sample data and the set of test-sample forecasts submitted. If they did, then lots of interesting things could be explored. For example:
(1) Absolute aspects of performance. What sorts of out-of-sample accuracy are attainable across different fields of application? How are the forecast errors distributed, both over time and across forecasters within a competition, and also across competitions -- Gaussian, fat-tailed, skewed? Do the various forecasts pass Mincer-Zarnowitz tests?
(2) Relative aspects of performance. Within and across competitions, what is the distribution of accuracy across forecasters? Are accuracy differences across forecasters statistically significant? Related, is the winner statistically significantly more accurate than a simple benchmark?
(3) Combination. What about combining forecasts? Can one regularly find combinations that outperform all or most individual forecasts? What combining methods perform best? (Simple averages or medians could be explored instantly. Exploration of "optimal" combinations would require estimating weights from part of the test sample.) A rough code sketch touching on all three explorations appears below.
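Here is a minimal sketch of explorations (1)-(3), assuming only that Kaggle released a vector of test-sample realizations and a panel of submitted forecasts. Everything in it is a placeholder: the data are simulated, the loss is squared error, the HAC lag truncation is arbitrary, and statsmodels is just one convenient tool. The point is how little machinery the exercise would require, not a prescription.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated stand-ins for what Kaggle could release post-competition:
# T test-sample realizations y, and M submitted forecasts (columns of F).
T, M = 200, 5
y = rng.standard_normal(T).cumsum()
F = y[:, None] + rng.standard_normal((T, M)) * np.linspace(0.8, 2.0, M)

# (1) Absolute performance: Mincer-Zarnowitz regression y_t = a + b*f_t + u_t,
#     jointly testing a = 0, b = 1 for each forecaster.
for j in range(M):
    mz = sm.OLS(y, sm.add_constant(F[:, j])).fit()
    p = float(mz.f_test('const = 0, x1 = 1').pvalue)
    print(f"forecaster {j}: MZ joint p-value = {p:.3f}")

# (2) Relative performance: Diebold-Mariano-type test of forecaster 0 vs. 1
#     under squared-error loss, with a HAC (Newey-West) standard error.
d = (y - F[:, 0]) ** 2 - (y - F[:, 1]) ** 2          # loss differential
dm = sm.OLS(d, np.ones(T)).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(f"DM t-statistic (0 vs. 1): {float(dm.tvalues[0]):.2f}")

# (3) Combination: does the equal-weight average beat the individuals?
rmse = lambda f: np.sqrt(np.mean((y - f) ** 2))
print("individual RMSEs:", [round(rmse(F[:, j]), 3) for j in range(M)])
print("equal-weight combination RMSE:", round(rmse(F.mean(axis=1)), 3))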
Friday, April 25, 2014
Yield Curve Modeling Update
An earlier post, DNS/AFNS Yield Curve Modeling FAQs, ended with:
"What next? Job 1 is flexible incorporation of stochastic volatility, moving from \(A_0(N)\) to \(A_x(N)\) for \(x>0\), as bond yields are most definitely conditionally heteroskedastic. Doing so is important for everything from estimating time-varying risk premia to forming correctly-calibrated interval and density forecasts. Work along those lines is starting to appear. Christensen-Lopez-Rudebusch (2010), Creal-Wu (2013) and Mauabbi (2013) are good recent examples."
Good news. Creal-Wu (2013) is now Creal-Wu (2014), revised and extended to allow both spanned and unspanned stochastic volatility. Really nice stuff.
"What next? Job 1 is flexible incorporation of stochastic volatility, moving from \(A_0(N)\) to \(A_x(N)\) for \(x>0\), as bond yields are most definitely conditionally heteroskedastic. Doing so is important for everything from estimating time-varying risk premia to forming correctly-calibrated interval and density forecasts. Work along those lines is starting to appear. Christensen-Lopez-Rudebusch (2010), Creal-Wu (2013) and Mauabbi (2013) are good recent examples."
Good news. Creal-Wu (2013) is now Creal-Wu (2014), revised and extended to allow both spanned and unspanned stochastic volatility. Really nice stuff.
Monday, April 21, 2014
On Kaggle Forecasting Competitions, Part 1: The Hold-Out Sample(s)
Kaggle competitions are potentially pretty cool. Kaggle supplies in-sample data ("training data"), and you build a model and forecast out-of-sample data that they withhold ("test data"). The winner gets a significant prize, often $100,000 or more. Kaggle typically runs several such competitions simultaneously.
The Kaggle paradigm is clever because it effectively removes modelers' ability to peek at the test data, and such peeking is a key criticism of model-selection procedures that claim to insure against finite-sample over-fitting by use of split samples. (See my earlier post, Comparing Predictive Accuracy, Twenty Years Later, and the associated paper of the same name.)
Well, sort of. Actually, Kaggle reveals part of the test data along the way. Before a competition's deadline, participants are typically allowed to submit one forecast per day, which Kaggle scores against part of the test data. Then, when the deadline arrives, forecasts are scored against the remaining test data. Suppose, for example, that there are 100 observations in total. Kaggle gives you 1, ..., 60 (training) and holds out 61, ..., 100 (test). But each day before the deadline, you can submit a forecast for 61, ..., 75, which they score against the held-out realizations of 61, ..., 75 and use to update the "leaderboard." Then, when the deadline arrives, you submit your forecast for 61, ..., 100, but they score it only against the truly held-out realizations 76, ..., 100. So honesty is enforced for 76, ..., 100 (good), but convoluted games are played with 61, ..., 75 (bad). Is having a leaderboard really that important? Why not cut the games? Simply give people 1, ..., 75 and ask them to forecast 76, ..., 100.
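In code, the split in that example looks roughly as follows. The index convention, the naive benchmark submission, and the RMSE scoring rule are illustrative assumptions, not Kaggle's actual implementation.

import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(100)   # stand-in for the 100 observations in the example

train        = y[:60]          # observations 1..60: released as training data
public_test  = y[60:75]        # observations 61..75: score the daily leaderboard
private_test = y[75:]          # observations 76..100: score the final ranking

def rmse(forecast, actual):
    return np.sqrt(np.mean((forecast - actual) ** 2))

# A naive benchmark submission covering observations 61..100:
submission = np.full(40, train.mean())

leaderboard_score = rmse(submission[:15], public_test)   # visible before the deadline
final_score       = rmse(submission[15:], private_test)  # determines the final ranking

print(leaderboard_score, final_score)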
To be continued.
Monday, April 14, 2014
Frequentists vs. Bayesians on the Exploding Sun
Time for something light. Check out xkcd.com, "A webcomic of romance, sarcasm, math, and language," written by a literate former NASA engineer. Really fine stuff. Thanks to my student M.D. for introducing me to it. Here's one on Fisher vs. Bayes:
Monday, April 7, 2014
Point Forecast Accuracy Evaluation
Here's a new one for your reading pleasure. Interesting history. Minchul and I went in trying to escape the expected loss minimization paradigm. We came out realizing that we hadn't escaped, but simultaneously, that not all loss functions are created equal. In particular, there's a direct and natural connection between our stochastic error divergence (SED) and absolute-error loss, elevating the status of absolute-error loss in our minds and perhaps now making it our default benchmark of choice. Put differently, "quadratic loss is for squares." (Thanks to Roger Koenker for the cute mantra.)
Diebold, F.X. and Shin, M. (2014), "Assessing Point Forecast Accuracy by Stochastic Divergence from Zero," PIER Working Paper 14-011, Department of Economics, University of Pennsylvania.
Abstract: We propose point forecast accuracy measures based directly on the divergence of the forecast-error c.d.f. F(e) from the unit step function at 0, and we explore several variations on the basic theme. We also provide a precise characterization of the relationship between our approach of stochastic error divergence (SED) minimization and the conventional approach of expected loss minimization. The results reveal a particularly strong connection between SED and absolute-error loss and generalizations such as the "check function" loss that underlies quantile regression.
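A quick back-of-the-envelope way to see why the absolute-error connection is so strong (this is my sketch of the L1 variant of the divergence, not necessarily the paper's exact development): for a forecast error \(e\) with c.d.f. \(F\) and \(E|e| < \infty\),
\[
\int_{-\infty}^{\infty} \bigl| F(e) - \mathbf{1}\{e \ge 0\} \bigr|\, de
= \int_{-\infty}^{0} F(e)\, de + \int_{0}^{\infty} \bigl(1 - F(e)\bigr)\, de
= E|e| ,
\]
so minimizing the L1 divergence of the error c.d.f. from the unit step at zero is minimizing expected absolute error.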