Here are a few more thoughts on Kaggle competitions, continuing my earlier Kaggle post.
It's a shame that Kaggle doesn't make available (post-competition) the test-sample data and the set of test-sample forecasts submitted. If they did, then lots of interesting things could be explored. For example:
(1) Absolute aspects of performance. What sorts of out-of-sample accuracy are attainable across different fields of application? How are the forecast errors distributed, both over time and across forecasters within a competition, and also across competitions -- Gaussian, fat-tailed, skewed? Do the various forecasts pass Mincer-Zarnowitz tests? (A code sketch of such a test follows this list.)
(2) Relative aspects of performance. Within and across competitions, what is the distribution of accuracy across forecasters? Are accuracy differences across forecasters statistically significant? Related, is the winner statistically significantly more accurate than a simple benchmark? (Again, a sketch of one such significance test appears below.)
(3) Combination. What about combining forecasts? Can one regularly find combinations that outperform all or most individual forecasts? Which combining methods perform best? (Simple averages or medians could be explored instantly. Exploration of "optimal" combinations would require estimating weights from part of the test sample. A combination sketch also appears below.)
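To make (1) concrete, here is a minimal sketch of a Mincer-Zarnowitz test in Python. It is purely illustrative and not from the post: the function name is mine, and it assumes a vector y of test-sample realizations and a vector yhat of one competitor's forecasts.

```python
import numpy as np
from scipy import stats

def mincer_zarnowitz(y, yhat):
    """Mincer-Zarnowitz regression y_t = a + b*yhat_t + e_t,
    with a joint F-test of H0: a = 0, b = 1 (forecast unbiasedness/efficiency)."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    T = len(y)
    X = np.column_stack([np.ones(T), yhat])          # regressors: constant and forecast
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS estimates (a, b)
    resid = y - X @ beta
    sigma2 = resid @ resid / (T - 2)                 # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)       # OLS covariance matrix of (a, b)
    deviation = beta - np.array([0.0, 1.0])          # departure from H0: (a, b) = (0, 1)
    F = deviation @ np.linalg.inv(cov_beta) @ deviation / 2
    p_value = 1 - stats.f.cdf(F, 2, T - 2)
    return {"a": beta[0], "b": beta[1], "F": F, "p_value": p_value}
```

A forecast that is unbiased and efficient should not reject H0; rejection points to systematic, correctable errors.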
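For (2), one natural candidate for testing whether accuracy differences are statistically significant is a Diebold-Mariano-type test of equal predictive accuracy. The sketch below is again only an illustration under assumed inputs; it uses squared-error loss and a simple rectangular-window long-run variance, taking two forecast-error series as given.

```python
import numpy as np
from scipy import stats

def dm_test(e1, e2, h=1):
    """Diebold-Mariano test of equal predictive accuracy under squared-error loss.
    e1, e2: forecast-error series from two competing forecasts; h: forecast horizon,
    which sets the truncation lag for the long-run variance of the loss differential."""
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    d = e1**2 - e2**2                                # loss differential
    T = len(d)
    dbar = d.mean()
    dc = d - dbar
    lrv = dc @ dc / T                                # lag-0 variance
    for k in range(1, h):                            # add autocovariances up to lag h-1
        lrv += 2 * (dc[k:] @ dc[:-k]) / T
    dm_stat = dbar / np.sqrt(lrv / T)
    p_value = 2 * (1 - stats.norm.cdf(abs(dm_stat)))
    return dm_stat, p_value
```

The same statistic could be used to compare the winning entry with a simple benchmark, addressing the last question in (2).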
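Finally, for (3), simple and "optimal" combinations take only a few lines each. The sketch below assumes a T-by-N matrix of individual test-sample forecasts (one column per forecaster); the OLS-weighted combination estimates its weights on a hold-out split of the test sample, in the spirit of the parenthetical remark above. The function and argument names are hypothetical.

```python
import numpy as np

def combine_forecasts(F_test, y_holdout=None, F_holdout=None):
    """F_test: (T, N) array, column j holding forecaster j's test-sample forecasts.
    Returns mean and median combinations; if a hold-out split (y_holdout, F_holdout)
    is supplied, also returns an OLS-weighted ("optimal") combination."""
    F_test = np.asarray(F_test, dtype=float)
    combos = {
        "mean": F_test.mean(axis=1),
        "median": np.median(F_test, axis=1),
    }
    if y_holdout is not None and F_holdout is not None:
        X = np.column_stack([np.ones(len(y_holdout)), F_holdout])   # constant + forecasts
        w, *_ = np.linalg.lstsq(X, np.asarray(y_holdout, dtype=float), rcond=None)
        combos["ols"] = w[0] + F_test @ w[1:]                       # apply estimated weights
    return combos
```

Each combination's accuracy could then be compared with the best individual entries using the significance-test sketch above.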