Sunday, February 26, 2017
[Notice that I changed the title from "Machine Learning vs. Econometrics" to "Machine Learning and Econometrics": the two are complements, not competitors, as this post will begin to emphasize. But I've kept the numbering, so this is number five. For others, click on Machine Learning at right.]
Thanks for the overwhelming response to my last post, on Angrist-Pischke (AP). I'll have more to say on AP a few posts from now, but first I need to set the stage.
A key observation is that statistical machine learning (ML) and time-series econometrics/statistics (TS) are largely about modeling, and they largely have the same foundational perspective. Some of the key ingredients are:
-- George Box got it right: "All models are false; some are useful", so search for good approximating models, not "truth".
-- Be explicit about the loss function, that is, about what defines a "good approximating model" (e.g., 1-step-ahead out-of-sample mean-squared forecast error).
-- Respect and optimize that loss function in model selection (e.g., BIC).
-- Respect and optimize that loss function in estimation (e.g., least squares).
-- Respect and optimize that loss function in forecast construction (e.g., Wiener-Kolmogorov-Kalman).
-- Respect and optimize that loss function in forecast evaluation, comparison, and combination (e.g., Mincer-Zarnowitz evaluations, Diebold-Mariano comparisons, Granger-Ramanathan combinations). (A small code sketch after this list makes the discipline concrete.)
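To make the loss-function discipline concrete, here is a minimal sketch, entirely my own toy example (simulated data and made-up settings, not anything from the post or the work it cites): simulate an AR(2), estimate candidate AR(p) models by least squares -- the estimator matched to squared-error loss -- and then select the lag order by 1-step-ahead out-of-sample mean-squared forecast error, the very loss named above.

```python
# A toy sketch (my own simulated example, not from the post): the same
# squared-error loss drives estimation (least squares) and model selection
# (1-step-ahead out-of-sample MSE); BIC is the in-sample counterpart.
import numpy as np

rng = np.random.default_rng(0)
T = 600
y = np.zeros(T)
for t in range(2, T):                                   # AR(2) data-generating process
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal()

def fit_ar(series, p):
    """Least-squares AR(p): regress y_t on (1, y_{t-1}, ..., y_{t-p})."""
    Y = series[p:]
    X = np.column_stack([np.ones(len(Y))]
                        + [series[p - j:-j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta

split = 400                                             # estimation sample vs. holdout
for p in range(1, 5):
    beta = fit_ar(y[:split], p)
    errs = [y[t] - np.concatenate(([1.0], y[t - 1:t - p - 1:-1])) @ beta
            for t in range(split, T)]                   # 1-step-ahead holdout errors
    print(f"AR({p}): out-of-sample MSE = {np.mean(np.square(errs)):.3f}")
```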
So time-series econometrics should embrace ML -- and it is. Just look at recent work like this.
Sunday, February 19, 2017
Econometrics: Angrist and Pischke are at it Again
Check out the new Angrist-Pischke (AP), "Undergraduate Econometrics Instruction: Through Our Classes, Darkly".
I guess I have no choice but to weigh in. The issues are important, and my earlier AP post, "Mostly Harmless Econometrics?", is my all-time most popular.
Basically AP want all econometrics texts to look a lot more like theirs. But their books and their new essay unfortunately miss (read: dismiss) half of econometrics.
Here's what AP get right:
(Goal G1) One of the major goals in econometrics is predicting the effects of exogenous "treatments" or "interventions" or "policies". Phrased in the language of estimation, the question is "If I intervene and give someone a certain treatment \(\partial x\), \(x \in X\), what is my minimum-MSE estimate of her \(\partial y\)?" So we are estimating the partial derivative \(\partial y / \partial x\).
AP argue the virtues and trumpet the successes of a "design-based" approach to G1. In my view they make many good points as regards G1: discontinuity designs, diff-in-diff designs, and other clever modern approaches for approximating random experiments indeed take us far beyond "Stones'-age" approaches to G1. (AP sure turn a great phrase...). And the econometric simplicity of the design-based approach is intoxicating: it's mostly just linear regression of \(y\) on \(x\) and a few cleverly chosen control variables -- you don't need a full model -- with White-washed standard errors. Nice work if you can get it. And yes, moving forward, any good text should feature a solid chapter on those methods.
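For concreteness, here is a minimal sketch of that recipe, with entirely made-up data and variable names: OLS of \(y\) on a treatment indicator plus a couple of controls, with heteroskedasticity-robust (White, HC0) standard errors computed from the usual sandwich formula.

```python
# A toy sketch of the "regression plus White standard errors" recipe.
# All data and names here are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
controls = rng.normal(size=(n, 2))                     # the "cleverly chosen controls" (made up)
treat = (rng.uniform(size=n) < 0.5).astype(float)      # as-if-random treatment assignment
y = (1.0 + 2.0 * treat + controls @ np.array([0.5, -0.3])
     + (1.0 + treat) * rng.normal(size=n))             # deliberately heteroskedastic errors

X = np.column_stack([np.ones(n), treat, controls])
beta = np.linalg.solve(X.T @ X, X.T @ y)               # OLS coefficients
u = y - X @ beta                                       # residuals
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * (u ** 2)[:, None])                   # sum_i u_i^2 x_i x_i'
se = np.sqrt(np.diag(bread @ meat @ bread))            # White (HC0) standard errors
print(f"estimated treatment effect: {beta[1]:.3f}  (White s.e. {se[1]:.3f})")
```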
Here's what AP miss/dismiss:
(Goal G2) The other major goal in econometrics is predicting \(y\). In the language of estimation, the question is "If a new person \(i\) arrives with covariates \(X_i\), what is my minimum-MSE estimate of her \(y_i\)?" So we are estimating a conditional mean \(E(y|X)\), which in general is very different from estimating a partial derivative \(\partial y / \partial x\).
The problem with the AP paradigm is that it doesn't work for goal G2. Modeling nonlinear functional form is important, as the conditional mean function \(E(y|X)\) may be highly nonlinear in \(X\); systematic model selection is important, as it's not clear a priori what subset of \(X\) (i.e., what model) might be most useful for approximating \(E(y|X)\); detecting and modeling heteroskedasticity is important (in both cross sections and time series), as it's the key to accurate interval and density prediction; detecting and modeling serial correlation is crucially important in time-series contexts, as "the past" is the key conditioning information for predicting "the future"; etc., etc.
(Notice how often "model" and "modeling" appear in the above paragraph. That's precisely what AP dismiss, even in their abstract, which very precisely, and incorrectly, declares that "Applied econometrics ...[now prioritizes]... the estimation of specific causal effects and empirical policy analysis over general models of outcome determination".)
The AP approach to goal G2 is to ignore it, in a thinly-veiled attempt to equate econometrics exclusively with G1, which nicely feathers the AP nest. Sorry guys, but no one's buying it. That's why the textbooks continue to feature G2 tools and techniques so prominently, as well they should.
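To see the G1/G2 contrast in the simplest possible terms, here is a toy sketch of my own (nothing in it comes from AP, and every number is made up): the coefficient on one regressor in a linear fit answers a partial-derivative-style question, but the same linear fit can be a poor approximation to \(E(y|X)\), which is what G2 requires -- so candidate conditional-mean models are compared on holdout mean-squared prediction error.

```python
# A toy sketch (my own example): a coefficient answers a G1-style question,
# but goal G2 is approximating E(y|X), which here is nonlinear in X, so
# candidate models are judged by out-of-sample (holdout) MSE.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 2))
y = 1.0 + 0.5 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(size=n)   # E(y|X) nonlinear in X

train, test = slice(0, 1500), slice(1500, n)

def design(Z, nonlinear):
    """Regressor matrix: intercept and levels, plus squares/interaction if requested."""
    cols = [np.ones(len(Z)), Z[:, 0], Z[:, 1]]
    if nonlinear:
        cols += [Z[:, 0] ** 2, Z[:, 1] ** 2, Z[:, 0] * Z[:, 1]]
    return np.column_stack(cols)

for label, nonlinear in [("linear", False), ("quadratic", True)]:
    beta, *_ = np.linalg.lstsq(design(X[train], nonlinear), y[train], rcond=None)
    mse = np.mean((y[test] - design(X[test], nonlinear) @ beta) ** 2)
    print(f"{label:9s} model: holdout MSE = {mse:.2f}")
# The linear fit still delivers a sensible slope on X[:, 0] (a G1-style object),
# but it approximates E(y|X) poorly and therefore predicts poorly (the G2 goal).
```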
Monday, February 13, 2017
Predictive Loss vs. Predictive Regret
It's interesting to contrast two prediction paradigms.
A. The universal statistical/econometric approach to prediction:
Take a stand on a loss function and find/use a predictor that minimizes conditionally expected loss. Note that this is an absolute standard. We minimize loss, not some sort of relative loss.
B. An alternative approach to prediction, common in certain communities/literatures:
Take a stand on a loss function and find/use a predictor that minimizes regret. Note that this is a relative standard. Regret minimization is relative loss minimization, i.e., striving to do no worse than others.
Approach A strikes me as natural and appropriate, whereas B strikes me as quirky and "behavioral". That is, it seems to me that we generally want tools that perform well, not tools that merely perform no worse than others.
There's also another issue, the ex ante nature of A (standing in the present, conditioning on available information, looking forward) vs. the ex post nature of B (standing in the future, looking backward). Approach A again seems more natural and appropriate.
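A tiny numerical sketch (toy forecasters of my own, purely illustrative) puts the two standards side by side under squared-error loss: criterion A is a forecaster's own average loss, an absolute target, while criterion B is regret, the gap between its cumulative loss and that of the best competitor chosen in hindsight.

```python
# A toy sketch (my own made-up forecasters): criterion A scores a predictor by
# its own average loss; criterion B (regret) scores it only relative to the
# best competitor selected ex post -- a relative, backward-looking standard.
import numpy as np

rng = np.random.default_rng(3)
T = 500
y = np.zeros(T)
for t in range(1, T):                        # a simple AR(1) series to forecast
    y[t] = 0.7 * y[t - 1] + rng.normal()

# 1-step-ahead forecasts of y[t] made with information through t-1
forecasts = {
    "zero":      np.zeros(T - 1),            # always predict 0
    "no-change": y[:-1],                     # predict y[t-1]
    "shrunken":  0.7 * y[:-1],               # the conditional mean here
}
cum_loss = {name: np.sum((y[1:] - f) ** 2) for name, f in forecasts.items()}

ours = "no-change"
avg_loss = cum_loss[ours] / (T - 1)                    # criterion A: absolute standard
regret = cum_loss[ours] - min(cum_loss.values())       # criterion B: relative, ex post
print(f"A (average loss): {avg_loss:.3f}")
print(f"B (regret vs. best-in-hindsight): {regret:.1f}")
```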
Sunday, February 5, 2017
Data for the People
Data for the People, by Andreas Weigend, is coming out this week, or maybe it came out last week. Andreas is a leading technologist (at least that's the most accurate one-word description I can think of), and I have valued his insights ever since we were colleagues at NYU almost twenty years ago. Since then he's moved on to many other things; see http://www.weigend.com.
Andreas challenges prevailing views about data creation and "data privacy". Rather than perpetuating a romanticized view of data privacy, he argues that we need increased data transparency, combined with increased data literacy, so that people can take command of their own data. Drawing on his work with numerous firms, he proposes six "data rights":
-- The right to access data
-- The right to amend data
-- The right to blur data
-- The right to port data
-- The right to inspect data refineries
-- The right to experiment with data refineries
Check out Data for the People at http://ourdata.com.
[Acknowledgment: Parts of this post were adapted from the book's web site.]