Sunday, February 19, 2017

Econometrics: Angrist and Pischke are at it Again

Check out the new Angrist-Pischke (AP), "Undergraduate Econometrics Instruction: Through Our Classes, Darkly".

I guess I have no choice but to weigh in. The issues are important, and my earlier AP post, "Mostly Harmless Econometrics?", is my all-time most popular.

Basically AP want all econometrics texts to look a lot more like theirs. But their books and their new essay unfortunately miss (read: dismiss) half of econometrics.

Here's what AP get right:

(G1) One of the major goals in econometrics is predicting the effects of exogenous "treatments" or "interventions" or "policies". Phrased in the language of estimation, the question is "If I intervene and give someone a certain treatment \(\partial x\), \(x \in X\), what is my minimum-MSE estimate of \(\partial y\)?" So we are estimating the partial derivative \(\partial y / \partial x\).

AP argue the virtues and trumpet the successes of a "design-based" approach to G1. In my view they make many good points as regards G1: discontinuity designs, dif-in-dif designs, and other clever modern approaches for approximating random experiments indeed take us far beyond "Stones'-age" approaches to G1. (AP sure turn a great phrase...). And the econometric simplicity of the design-based approach is intoxicating: it's mostly just linear regression of \(y\) on \(x\) and a few cleverly-chosen control variables -- you don't need a full model -- with White-washed standard errors. Nice work if you can get it. And yes, moving forward, any good text should feature a solid chapter on those methods.

Here's what AP miss/dismiss:

(G2) The other major goal in econometrics is predicting \(y\). In the language of estimation, the question is "If a new person \(i\) arrives with covariates \(X_i\), what is my minimum-MSE estimate of her \(y_i\)?" So we are estimating a conditional mean \(E(y|X)\), which in general is very different from estimating a partial derivative \(\partial y / \partial x\).

The problem with the AP paradigm is that it doesn't work for goal G2. Modeling nonlinear functional form is important, as the conditional mean function \(E(y|X)\) may be highly nonlinear in \(X\); systematic model selection is important, as it's not clear a priori what subset of \(X\) (i.e., what model) might be most important for approximating \(E(y|X)\); detecting and modeling heteroskedasticity is important (in both cross sections and time series), as it's the key to accurate interval and density prediction; detecting and modeling serial correlation is crucially important in time-series contexts, as "the past" is the key conditioning information for predicting "the future"; etc., etc.


(Notice how often "model" and "modeling" appear in the above paragraph. That's precisely what AP dismiss, even in their abstract, which very precisely, and incorrectly, declares that "Applied econometrics ...[now prioritizes]... the estimation of specific causal effects and empirical policy analysis over general models of outcome determination".)
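To make the G2 point concrete, here is a minimal simulation sketch (the data-generating process and model choices are entirely my own illustrative assumptions, not anything AP or the texts propose): when \(E(y|X)\) is nonlinear and the noise is heteroskedastic, a simple linear projection of \(y\) on \(X\) leaves predictable variation on the table, while a flexible conditional-mean model predicts much better out of sample.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-2, 2, size=(n, 3))

# Nonlinear, heteroskedastic DGP (my assumption): E(y|X) = sin(x1) + x2^2,
# with noise scale depending on x3.
mu = np.sin(X[:, 0]) + X[:, 1] ** 2
y = mu + np.abs(X[:, 2]) * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Linear projection of y on X: fine for summarizing average partial effects,
# but it ignores the nonlinearity in E(y|X).
lin = LinearRegression().fit(X_tr, y_tr)

# Flexible conditional-mean model aimed squarely at out-of-sample prediction of y.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("linear model, out-of-sample MSE:", mean_squared_error(y_te, lin.predict(X_te)))
print("flexible model, out-of-sample MSE:", mean_squared_error(y_te, rf.predict(X_te)))
```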

The AP approach to goal G2 is to ignore it, in a thinly-veiled attempt to equate econometrics exclusively with G1. Sorry guys, but no one's buying it. That's why the textbooks continue to feature G2 tools and techniques so prominently, as well they should.

Monday, February 13, 2017

Predictive Loss vs. Predictive Regret

It's interesting to contrast two prediction paradigms.

A.  The universal statistical/econometric approach to prediction:  
Take a stand on a loss function and find/use a predictor that minimizes conditionally expected loss.  Note that this is an absolute standard: we minimize expected loss itself, not loss relative to some benchmark.

B.  An alternative approach to prediction, common in certain communities/literatures:
Take a stand on a loss function and find/use a predictor that minimizes regret.  Note that this is a relative standard: regret minimization is relative loss minimization, i.e., striving to do no worse than the best of a set of competitors, evaluated after the fact.

Approach A strikes me as natural and appropriate, whereas B strikes me as quirky and "behavioral".  That is, it seems to me that we generally want tools that perform well, not tools that merely perform no worse than others.

There's also a second issue: the ex ante nature of A (standing in the present, conditioning on available information, looking forward) vs. the ex post nature of B (standing in the future, looking backward).  Approach A again seems more natural and appropriate.
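To make the distinction concrete, here's a toy numerical sketch under quadratic loss (the simulated AR(1) and the two forecast rules are purely my own illustrative assumptions): average loss is an absolute yardstick, while regret measures each rule's loss relative to the best rule in hindsight.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000

# Simulated AR(1): the conditional mean of y_t given the past is 0.8 * y_{t-1}.
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + rng.normal()

actual = y[1:]
f_a = 0.8 * y[:-1]   # approach A's choice under quadratic loss: the conditional mean
f_b = y[:-1]         # a naive no-change competitor

loss_a = np.mean((actual - f_a) ** 2)   # absolute standard: average quadratic loss
loss_b = np.mean((actual - f_b) ** 2)

# Relative standard: regret of each rule versus the best rule in hindsight.
best = min(loss_a, loss_b)
print("avg loss A:", loss_a, "  regret A:", loss_a - best)
print("avg loss B:", loss_b, "  regret B:", loss_b - best)
```

The real regret-minimization literatures (online learning, prediction with expert advice) of course study much richer sequential versions of B; the sketch is only meant to make the absolute-vs.-relative distinction concrete.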

Sunday, February 5, 2017

Data for the People

Data for the People, by Andreas Weigend, is coming out this week, or maybe it came out last week. Andreas is a leading technologist (at least that's the most accurate one-word description I can think of), and I have valued his insights ever since we were colleagues at NYU almost twenty years ago. Since then he's moved on to many other things; see http://www.weigend.com.

Andreas challenges prevailing views about data creation and "data privacy". Rather than perpetuating a romanticized view of data privacy, he argues that we need increased data transparency, combined with increased data literacy, so that people can take command of their own data. Drawing on his work with numerous firms, he proposes six "data rights":

-- The right to access data
-- The right to amend data
-- The right to blur data
-- The right to port data
-- The right to inspect data refineries
-- The right to experiment with data refineries

Check out Data for the People at http://ourdata.com.


[Acknowledgment: Parts of this post were adapted from the book's web site.]

Monday, January 30, 2017

Randomization Tests for Regime Switching

I have always been fascinated by distribution-free non-parametric tests, or randomization tests, or Monte Carlo tests -- whatever you want to call them.  (For example, I used some in ancient work like Diebold-Rudebusch 1992.)  They seem almost too good to be true: exact finite-sample tests without distributional assumptions!  They also still seem curiously underutilized in econometrics, notwithstanding, for example, the path-breaking and well-known contributions over many decades by Jean-Marie Dufour, Marc Hallin, and others.

For the latest, see the fascinating new contribution by Jean-Marie Dufour and Richard Luger. They show how to use randomization to perform simple tests of the null of linearity against the alternative of Markov switching in dynamic environments.  That's a very hard problem (nuisance parameters not identified under the null, singular information matrix under the null), and several top researchers have wrestled with it (e.g., Garcia, Hansen, Carrasco-Hu-Ploberger). Randomization delivers tests that are exact, distribution-free, and simple. And power looks pretty good too.
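For readers who haven't seen randomization tests in action, here's a generic permutation-test sketch -- emphatically not the Dufour-Luger Markov-switching procedure, just a toy two-sample location example of my own, included to convey the exact, distribution-free flavor.

```python
import numpy as np

def permutation_test_mean_diff(x, y, n_perm=9999, seed=0):
    """Monte Carlo p-value for equality of means, obtained by randomly
    permuting the pooled sample rather than invoking a distributional
    assumption."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = np.mean(x) - np.mean(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = np.mean(pooled[:n_x]) - np.mean(pooled[n_x:])
        if abs(stat) >= abs(observed):
            exceed += 1
    # The +1's deliver a valid Monte Carlo p-value under the null.
    return (exceed + 1) / (n_perm + 1)

x = np.random.default_rng(1).normal(0.0, 1.0, size=50)
y = np.random.default_rng(2).normal(0.5, 1.0, size=50)
print("permutation p-value:", permutation_test_mean_diff(x, y))
```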

Monday, January 23, 2017

Bayes Stifling Creativity?

Some twenty years ago, a leading Bayesian econometrician startled me during an office visit at Penn. We were discussing Bayesian vs. frequentist approaches to a few things, when all of a sudden he declared that "There must be something about Bayesian analysis that stifles creativity.  It seems that frequentists invent all the great stuff, and Bayesians just trail behind, telling them how to do it right".

His characterization rings true in certain significant respects, which is why it's so funny.  But the intellectually interesting thing is that it doesn't have to be that way.  As Chris Sims notes in a recent communication: 
... frequentists are in the habit of inventing easily computed, intuitively appealing estimators and then deriving their properties without insisting that the method whose properties they derive is optimal.  ... Bayesians are more likely to go from model to optimal inference, [but] they don't have to, and [they] ought to work more on Bayesian analysis of methods based on conveniently calculated statistics.

See Chris' thought-provoking unpublished paper draft, "Understanding Non-Bayesians". 

[As noted on Chris' web site, he wrote that paper for the Oxford University Press Handbook of Bayesian Econometrics, but he "withheld [it] from publication there because of the Draconian copyright agreement that OUP insisted on --- forbidding posting even a late draft like this one on a personal web site."] 

Monday, January 16, 2017

Impulse Responses From Smooth Local Projections

Check out Barnichon-Brownlees (2017) (BB).  Following the approach proposed and developed in Jorda (2005), they estimate impulse-response functions (IRF's) directly by projecting outcomes on estimates of structural shocks at various horizons, as opposed to inverting a fitted autoregression.  The BB enhancement relative to Jorda is the effective incorporation of a smoothness prior in IRF estimation.  (Notice that the traditional approach of inverting a low-order autoregression automatically promotes IRF smoothness.)  In my view, smoothness is a natural IRF shrinkage direction, and BB convincingly show that it's likely to enhance estimation efficiency relative to Jorda's original approach. I always liked the idea of going after IRF's directly, and Jorda/BB seems appealing.
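For intuition, here's a bare-bones Jorda-style local projection -- no BB smoothness prior, and a simulated shock/outcome setup that is purely my own illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 500, 12

# Simulated setup (my assumption): an observed structural shock and an outcome
# whose true impulse response at horizon h is 0.9**h.
shock = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.9 * y[t - 1] + shock[t]

irf = np.zeros(H + 1)
for h in range(H + 1):
    # Local projection at horizon h: regress y_{t+h} on shock_t (plus a constant)
    # and read the slope as the horizon-h impulse response.
    lhs = y[h:]
    rhs = np.column_stack([np.ones(T - h), shock[: T - h]])
    beta, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
    irf[h] = beta[1]

print(np.round(irf, 2))   # roughly 1, 0.9, 0.81, ...
```

BB's contribution is then to shrink this sequence of horizon-by-horizon slopes toward a smooth function of the horizon; the sketch stops at the raw, unsmoothed Jorda estimates.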

Friday, January 13, 2017

Math Rendering Problem Fixed

The problem with math rendering in the recent post, "All of Machine Learning in One Expression", is now fixed (I hope).  That is, the math should now look like math, not LaTeX code, on all devices. 

Monday, January 9, 2017

All of Machine Learning in One Expression

Sendhil Mullainathan gave an entertaining plenary talk on machine learning (ML) in finance, in Chicago last Saturday at the annual American Finance Association (AFA) meeting. (Many hundreds of people, standing room only -- great to see.) Not much new relative to the posts here, for example, but he wasn't trying to deliver new results. Rather he was trying to introduce mainstream AFA financial economists to the ML perspective. 

[Of course ML perspective and methods have featured prominently in time-series econometrics for many decades, but many of the recent econometric converts to ML (and audience members at the AFA talk) are cross-section types, not used to thinking much about things like out-of-sample predictive accuracy, etc.]

Anyway, one cute and memorable thing -- good for teaching -- was Sendhil's suggestion that one can use the canonical penalized estimation problem as a taxonomy for much of ML.  Here's my quick attempt at fleshing out that suggestion.

Consider estimating a parameter vector \( \theta \) by solving the penalized estimation problem,

\( \hat{\theta} = \arg \min_{\theta} \sum_{i} L(y_i - f(x_i, \theta)) ~~ \text{s.t.} ~~ \gamma(\theta) \le c , \)

or equivalently in Lagrange multiplier form,

\( \hat{\theta} = \arg \min_{\theta} \sum_{i} L(y_i - f(x_i, \theta)) + \lambda \gamma(\theta) . \)

(1) \( f(x_i, \theta) \) is about the modeling strategy (linear, parametric non-linear, non-parametric non-linear (series, trees, nearest-neighbor, kernel, ...)).

(2) \( \gamma(\theta) \) is about the type of regularization. (Concave penalty functions non-differentiable at the origin produce selection to zero, smooth convex penalties produce shrinkage toward 0, the LASSO penalty is both concave and convex, so it both selects and shrinks, ...)

(3) \( \lambda \) is about the strength of regularization.

(4) \( L(y_i - f(x_i, \theta) ) \) is about predictive loss (quadratic, absolute, asymmetric, ...).

Many ML schemes emerge as special cases. To take just one well-known example, linear regression with regularization by LASSO and regularization strength chosen to optimize out-of-sample predictive MSE corresponds to (1) \( f(x_i, \theta)\) linear, (2) \( \gamma(\theta) = \sum_j |\theta_j| \), (3) \( \lambda \) cross-validated, and (4) \( L(y_i - f(x_i, \theta) ) = (y_i - f(x_i, \theta) )^2 \).
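As a quick illustration of that special case, here's a minimal scikit-learn sketch; the simulated data and sparse "true" coefficients are my own assumptions, included only to make the example self-contained.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))

# Sparse "true" coefficient vector (my assumption, just to make the example run).
theta_true = np.zeros(p)
theta_true[:5] = [3.0, -2.0, 1.5, 1.0, 0.5]
y = X @ theta_true + rng.normal(size=n)

# (1) f linear; (2) L1 penalty; (3) lambda chosen by cross-validation;
# (4) quadratic loss.
model = LassoCV(cv=5, random_state=0).fit(X, y)

print("cross-validated regularization strength:", model.alpha_)
print("number of nonzero coefficients:", int(np.sum(model.coef_ != 0)))
```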


Tuesday, January 3, 2017

Torpedoing Econometric Randomized Controlled Trials

A very Happy New Year to all!

I get no pleasure from torpedoing anything, and "torpedoing" is likely exaggerated, but nevertheless take a look at "A Torpedo Aimed Straight at HMS Randomista". It argues that many econometric randomized controlled trials (RCT's) are seriously flawed -- not even internally valid -- due to their failure to use double-blind randomization. At first the non-double-blind critique may sound cheap and obvious, inviting you to roll your eyes and say "get over it". But ultimately it's not.

Note the interesting situation. Everyone these days is worried about external validity (extensibility), under the implicit assumption that internal validity has been achieved (e.g., see this earlier post). But the non-double-blind critique makes clear that even internal validity may be dubious in econometric RCT's as typically implemented.

The underlying research paper, "Behavioural Responses and the Impact of New Agricultural Technologies: Evidence from a Double-Blind Field Experiment in Tanzania", by Bulte et al., was published in 2014 in the American Journal of Agricultural Economics. Quite an eye-opener.

Here's the abstract:

Randomized controlled trials in the social sciences are typically not double-blind, so participants know they are “treated” and will adjust their behavior accordingly. Such effort responses complicate the assessment of impact. To gauge the potential magnitude of effort responses we implement an open RCT and double-blind trial in rural Tanzania, and randomly allocate modern and traditional cowpea seed-varieties to a sample of farmers. Effort responses can be quantitatively important––for our case they explain the entire “treatment effect on the treated” as measured in a conventional economic RCT. Specifically, harvests are the same for people who know they received the modern seeds and for people who did not know what type of seeds they got, but people who knew they received the traditional seeds did much worse. We also find that most of the behavioral response is unobserved by the analyst, or at least not readily captured using coarse, standard controls.

Sunday, December 18, 2016

Holiday Haze

Your dedicated blogger is about to vanish in the holiday haze, returning early in the new year. Meanwhile, all best wishes for the holidays.  If you're at ASSA Chicago, I hope you'll come to the Penn Economics party, Sat. Jan. 7, 6:00-8:00, Sheraton Grand Chicago, Mayfair Room.  Thanks so much for your past, present and future support.



[Photo credit:  Public domain, by Marcus Quigmire, from Florida, USA (Happy Holidays  Uploaded by Princess Mérida) [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons]