tag:blogger.com,1999:blog-44559727330119454412017-03-23T12:13:57.663-07:00No HesitationsFrancis X. Diebold's BlogFrancis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.comBlogger277125tag:blogger.com,1999:blog-4455972733011945441.post-72942558904658369912017-03-21T07:00:00.000-07:002017-03-23T12:13:57.673-07:00Forecasting and "As-If" Discounting<span style="font-size: large;">Check out the fascinating and creative new paper, "<a href="http://www.nber.org/papers/w23254.pdf">Myopia and Discounting</a>", by Xavier Gabaix and David Laibson.</span><br /><span style="font-size: large;"><br /></span> <span style="font-size: large;">From their abstract (slightly edited):</span><br /><blockquote class="tr_bq"><span style="font-size: large;">We assume that perfectly patient agents estimate the value of future events by generating noisy, unbiased simulations and combining those signals with priors to form posteriors. These posterior expectations exhibit as-if discounting: agents make choices as if they were maximizing a stream of known utils weighted by a discount function. This as-if discount function reflects the fact that estimated utils are a combination of signals and priors, so average expectations are optimally shaded toward the mean of the prior distribution, generating behavior that partially mimics the properties of classical time preferences. When the simulation noise has variance that is linear in the event's horizon, the as-if discount function is hyperbolic.</span></blockquote><span style="font-size: large;">Among other things, then, they provide a rational foundation for the "myopia" associated with hyperbolic discounting.</span><br /><span style="font-size: large;"><br /></span> <span style="font-size: large;">Note that in the Gabaix-Laibson environment everything depends on how forecast error variance behaves as a function of forecast horizon \(h\). But we know a lot about that. 
For example, in linear covariance-stationary \(I(0)\) environments, optimal forecast error variance grows with \(h\) at a decreasing rate, approaching the unconditional </span><span style="font-size: large;">variance from below. Hence it cannot grow linearly with \(h\), which is what produces hyperbolic as-if discounting. In contrast, in non-stationary \(I(1)\) environments, optimal forecast error variance <i>does</i> eventually grow linearly with \(h\). In a random walk, for example, \(h\)-step-ahead optimal forecast error variance is just \(h \sigma^2\), where \( \sigma^2\) is the innovation variance. </span><span style="font-size: large;">It would be fascinating to put people in \(I(1)\) vs. \(I(0)\) laboratory environments and see if hyperbolic as-if discounting arises in \(I(1)\) cases but not in \(I(0)\) cases.</span><br /><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-71093442258277265362017-03-19T15:08:00.000-07:002017-03-20T10:35:24.377-07:00ML and Metrics VIII: The New Predictive Econometric Modeling<span style="font-size: large;">[Click on "Machine Learning" at right for earlier "Machine Learning and Econometrics" posts.]</span><span style="font-size: large;"><br style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;" /><br /> We econometricians need -- and have always had -- cross section and time series ("micro econometrics" and "macro/financial econometrics"), causal estimation and predictive modeling, structural and non-structural. And all continue to thrive.<br /><br />But there's a new twist, happening now, making this an unusually exciting time in econometrics. 
P</span><span style="font-size: large;">redictive</span><span style="font-size: large;"> e</span><span style="font-size: large;">conometric modeling is not only alive and well, but also blossoming anew, this time at the interface of micro-econometrics and machine learning. A fine example is the new Kleinberg, Lakkaraju, Leskovec, Ludwig and <a href="http://scholar.harvard.edu/sendhil">Mullainathan</a> paper, “Human Decisions and Machine Predictions”, <a href="http://scholar.harvard.edu/files/sendhil/files/w23180.pdf">NBER Working Paper 23180</a> (February 2017)</span><span style="font-size: large;">.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Good predictions promote good decisions, and </span><span style="font-size: large;">e</span><span style="font-size: large;">conometrics is ultimately about helping people to make good decisions. Hence the new developments, driven by advances in machine learning, are most welcome contributions to a long and distinguished </span><span style="font-size: large;">predictive econometric modeling</span><span style="font-size: large;"> </span><span style="font-size: large;">tradition.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-53320733275930700342017-03-13T05:17:00.000-07:002017-03-13T05:17:15.747-07:00ML and Metrics VII: Cross-Section Non-Linearities<span style="font-size: large;">[Click on "Machine Learning" at right for earlier "Machine Learning and Econometrics" posts.]</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Th</span><span style="font-size: large;">e predictive modeling perspective needs not only to be </span><span style="font-size: large;">respected and </span><span style="font-size: large;">embraced in econometrics (as <a href="http://fxdiebold.blogspot.com/2017/02/econometrics-angrist-and-pischke-are-at.html">it routinely <i>is</i></a>, 
notwithstanding the <a href="http://www.nber.org/papers/w23144?utm_campaign=ntw&utm_medium=email&utm_source=ntw">Angrist-Pischke revisionist agenda</a>), but also to be <i>enhanced </i>by incorporating elements of statistical machine learning (ML). This is particularly true for cross-section econometrics, since time-series econometrics is already well ahead in that regard. </span><span style="font-size: large;"> For example, </span><span style="font-size: large;">although <a href="http://fxdiebold.blogspot.com/2017/03/machine-learning-and-econometrics-vi.html">flexible non-parametric ML approaches to estimating conditional-mean functions don't add much to time-series econometrics</a>, they may add lots to cross-section econometric regression and classification analyses, where conditional mean functions may be highly nonlinear for a variety of reasons. Of course econometricians are well aware of traditional non-parametric issues/approaches, especially kernel and series methods, and they have made many contributions, but there's still much more to be learned from ML.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-10242011694030504662017-03-06T12:39:00.002-08:002017-03-10T07:27:01.146-08:00ML and Metrics VI: A Key Difference Between ML and TS Econometrics<span style="font-size: large;">[Click on "Machine Learning" at right for earlier "Machine Learning and Econometrics" posts.]<br /><br /> Continuing:<br /><br /> So then, statistical machine learning (ML) and </span><span style="font-size: large;">time series econometrics (TS) </span><span style="font-size: large;">have lots in common. But there's also an interesting difference: ML's emphasis on flexible nonparametric modeling of conditional-mean nonlinearity doesn't play a big role in TS. 
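For concreteness, here's a little simulation sketch of that claim. Everything in it is my own illustrative choice (an AR(1) data-generating process, scikit-learn's random forest as the flexible nonparametric competitor): fit a linear autoregression and a random forest to the same linear Gaussian series and compare out-of-sample mean-squared forecast error.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulate a linear, covariance-stationary AR(1): y_t = 0.8 y_{t-1} + e_t
T = 1000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()

# One-step-ahead design: predict y_t from y_{t-1}
X, target = y[:-1].reshape(-1, 1), y[1:]
split = 700
X_tr, X_te, y_tr, y_te = X[:split], X[split:], target[:split], target[split:]

# Linear AR(1) fit by least squares
phi_hat = (X_tr.ravel() @ y_tr) / (X_tr.ravel() @ X_tr.ravel())
mse_ar = np.mean((y_te - phi_hat * X_te.ravel()) ** 2)

# Flexible nonparametric alternative: a random forest on the same lag
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
mse_rf = np.mean((y_te - rf.predict(X_te)) ** 2)

print(mse_ar, mse_rf)  # the forest typically buys little or nothing here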
</span><br /><span style="font-size: large;"><br /> Of course there are the traditional TS conditional-mean nonlinearities: smooth non-linear trends, seasonal shifts, and so on. But there's very little evidence of important conditional-mean nonlinearity in the covariance-stationary (de-trended, de-seasonalized) dynamics of most economic time series. Not that people haven't tried hard -- really hard -- to find it, with nearest neighbors, neural nets, random forests, and lots more. </span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">So it's no accident that things like linear autoregressions remain overwhelmingly dominant in TS. Indeed I can think of only one type of conditional-mean nonlinearity that has emerged as repeatedly important for (at least some) economic time series: <a href="https://www.ssc.wisc.edu/~bhansen/718/Hamilton1989.pdf">Hamilton-style Markov-switching dynamics</a>.<br /><br /> [Of course there's a non-linear elephant in the room: <a href="http://www.econ.uiuc.edu/~econ536/Papers/engle82.pdf">E</a><a href="http://www.econ.uiuc.edu/~econ536/Papers/engle82.pdf">ngle-style GARCH-type dynamics</a>. They're tremendously important in financial econometrics, and sometimes also in macro-econometrics, but they're about conditional variances, not conditional means.]<br /><br /> So there are basically only two important non-linear models in TS, and only one of them speaks to conditional-mean dynamics. And crucially, they're both very tightly parametric, closely tailored to specialized features of economic and financial data.<br /><br /> Now let's step back and assemble things:<br /><br /> ML emphasizes approximating non-linear conditional-mean functions in highly-flexible non-parametric fashion. 
That turns out to be doubly unnecessary in TS: There's just not much conditional-mean non-linearity to worry about, and when there occasionally is, it's typically of a highly-specialized nature best approximated in highly-specialized (tightly-parametric) fashion.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-90919520025890276852017-02-26T10:01:00.001-08:002017-03-03T04:46:11.278-08:00Machine Learning and Econometrics V: Similarities to Time Series<span style="font-size: large;">[Notice that I changed the title from "Machine Learning vs. Econometrics" to "Machine Learning <i>and</i> Econometrics", as the two are complements, not competitors, as this post will begin to emphasize. But I've kept the numbering, so this is number five. For others click on Machine Learning at right.]</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Thanks for the overwhelming response to <a href="http://fxdiebold.blogspot.com/2017/02/econometrics-angrist-and-pischke-are-at.html">my last post</a>, on Angrist-Pischke (AP). I'll have more to say on AP a few posts from now, but first I need to set the stage.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">A key observation is that statistical machine learning (ML) and time-series econometrics/statistics (TS) are largely about modeling, and they largely have the same foundational perspective. 
Some of the key ingredients are:</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">-- George Box got it right: "All models are wrong; some are useful", so search for good approximating models, not "truth".</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">-- Be explicit about the loss function, that is, about what defines a "good approximating model" (e.g., 1-step-ahead out-of-sample mean-squared forecast error).</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">-- Respect and optimize that loss function in model selection (e.g., BIC).</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">-- Respect and optimize that loss function in estimation (e.g., least squares).</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">-- Respect and optimize that loss function in forecast construction (e.g., Wiener-Kolmogorov-Kalman).</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">-- Respect and optimize that loss function in forecast evaluation, comparison, and combination (e.g., Mincer-Zarnowitz evaluations, Diebold-Mariano comparisons, Granger-Ramanathan combinations).</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">So time-series econometrics should <i>embrace</i> ML -- and it <i>is</i>. 
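To make the model-selection bullet concrete, here's a minimal sketch of loss-function-respecting model selection: choosing an autoregressive lag order by explicitly optimizing BIC. The AR(2) data-generating process and all settings are my own, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(2): y_t = 0.6 y_{t-1} + 0.25 y_{t-2} + e_t
T = 1000
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.6 * y[t - 1] + 0.25 * y[t - 2] + rng.standard_normal()

def ar_bic(y, p, p_max=4):
    """BIC for an AR(p) fit by least squares, on a common sample across p."""
    Y = y[p_max:]
    X = np.column_stack([y[p_max - j:-j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    n = len(Y)
    return n * np.log(resid @ resid / n) + p * np.log(n)

bics = {p: ar_bic(y, p) for p in range(1, 5)}
best_p = min(bics, key=bics.get)
print(best_p)
```

With ample data, BIC's consistency means the selected order should settle on the true one; the same skeleton works for any selection criterion that respects the relevant loss.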
Just look at <a href="http://www.ssc.upenn.edu/~fdiebold/Warren2017/Program/Program.pdf">recent work like this</a>.</span><br /><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-88730565590693653232017-02-19T16:16:00.001-08:002017-03-21T07:02:46.393-07:00Econometrics: Angrist and Pischke are at it Again<span style="font-size: large;">Check out the new Angrist-Pischke (AP), "<a href="http://www.nber.org/papers/w23144?utm_campaign=ntw&utm_medium=email&utm_source=ntw">Undergraduate Econometrics Instruction: Through Our Classes, Darkly</a>". <br /><br />I guess I have no choice but to weigh in. The issues are important, and my earlier AP post, "<a href="http://fxdiebold.blogspot.com/2015/01/mostly-harmless-econometrics.html">Mostly Harmless Econometrics?</a>", is my all-time most popular.<br /><br />Basically AP want all econometrics texts to look a lot more like theirs. But their books and their new essay unfortunately miss (read: dismiss) half of econometrics. <br /><br />Here's what AP get right:<br /><br />(Goal G1) One of the major goals in econometrics is predicting the effects of exogenous "treatments" or "interventions" or "policies". Phrased in the language of estimation, the question is "If I intervene and give someone a certain treatment \({\partial x}, x \in X\), what is my minimum-MSE estimate of her \(\ \partial y\)?" So we are estimating the partial derivative \({\partial y / \partial x}\).<br /><br />AP argue the virtues and trumpet the successes of a "design-based" approach to G1. In my view they make many good points as regards G1: discontinuity designs, dif-in-dif designs, and other clever modern approaches for approximating random experiments indeed take us far beyond "Stones'-age" approaches to G1. </span><span style="font-size: large;">(AP sure turn a great phrase...)</span><span style="font-size: large;">. 
And the econometric simplicity of the design-based approach is intoxicating: it's mostly just linear regression of \(y\) on \(x\) and a few cleverly-chosen control variables -- you don't need a full model -- with White-washed standard errors. Nice work if you can get it. And yes, moving forward, any good text should feature a solid chapter on those methods.</span><br /><span style="font-size: large;"><br />Here's what AP miss/dismiss:<br /><br />(Goal G2) The other major goal in econometrics is predicting \(y\). In the language of estimation, the question is "If a new person \(i\) arrives with covariates \(X_i\), what is my minimum-MSE estimate of her \(y_i\)?" So we are estimating a conditional mean \(E(y | X) \), which in general is very different from estimating a partial derivative \({\partial y / \partial x}\).<br /><br />The problem with the AP paradigm is that it doesn't work for goal G2. Modeling nonlinear functional form is important, as the conditional mean function \(E(y | X) \) may be highly nonlinear in \(X\); systematic model selection is important, as it's not clear a priori what subset of \(X\) (i.e., what model) might be most important for approximating \(E(y | X) \); detecting and modeling heteroskedasticity is important (in both cross sections and time series), as it's the key to accurate interval and density prediction; detecting and modeling serial correlation is crucially important in time-series contexts, as "the past" is the key conditioning information for predicting "the future"; etc., etc., ... </span><br /><div><span style="font-size: large;"><br /></span></div><div><span style="font-size: large;">(Notice how often "model" and "modeling" appear in the above paragraph. That's precisely what AP dismiss, even in their abstract, which very precisely, and incorrectly, declares that "Applied econometrics ...[now prioritizes]... 
the estimation of specific causal effects and empirical policy analysis over general models of outcome determination".)<br /><br />The AP approach to goal G2 is to ignore it, in a thinly-veiled attempt to equate econometrics exclusively with G1. Sorry guys, but no one's buying it. That's why the textbooks continue to feature G2 tools and techniques so prominently, as well they should.</span></div><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br /><br /><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-20463757957031506512017-02-13T06:23:00.001-08:002017-02-15T05:11:59.570-08:00Predictive Loss vs. Predictive Regret<span style="font-size: large;">It's interesting to contrast two prediction paradigms.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">A. The universal statistical/econometric approach to prediction: </span><br /><span style="font-size: large;">Take a stand on a loss function and find/use a predictor that minimizes conditionally expected loss. Note that this is an <i>absolute </i>standard. We minimize loss, not some sort of relative loss.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">B. An alternative approach to prediction, common in certain communities/literatures:</span><br /><span style="font-size: large;">Take a stand on a loss function and find/use a predictor that minimizes regret. Note that this is a <i>relative </i>standard. Regret minimization is relative loss minimization, i.e., striving to do no worse than others.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Approach A strikes me as natural and appropriate, whereas B strikes me as quirky and "behavioral". 
That is, it seems to me that we generally want tools that perform well, not tools that merely perform no worse than others.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">There's also another issue, the <i>ex ante</i> nature of A (standing in the present, conditioning on available information, looking forward) vs. the <i>ex post</i> nature of B (standing in the future, looking backward). Approach A again seems more natural and appropriate.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-74933111159306944322017-02-05T10:49:00.000-08:002017-02-15T05:08:58.294-08:00Data for the People<span style="font-size: large;"><i><a href="http://ourdata.squarespace.com/">Data for the People</a></i>, by Andreas Weigend, is coming out this week, or maybe it came out last week. Andreas is a leading technologist (at least that's the most accurate one-word description I can think of), and I have valued his insights ever since we were colleagues at NYU almost twenty years ago. Since then he's moved on to many other things; see <a href="http://www.weigend.com/">http://www.weigend.com</a>. </span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Andreas challenges prevailing views about data creation and "data privacy". Rather than perpetuating a romanticized view of data privacy, he argues that we need increased data transparency, combined with increased data literacy, so that people can take command of their own data. 
Drawing on his work with numerous firms, he proposes six "data rights":</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">-- The right to access data</span><br /><span style="font-size: large;">-- The right to amend data</span><br /><span style="font-size: large;">-- The right to blur data</span><br /><span style="font-size: large;">-- The right to port data</span><br /><span style="font-size: large;">-- The right to inspect data refineries</span><br /><span style="font-size: large;">-- The right to experiment with data refineries</span><br /><div><br /></div><span style="font-size: large;">Check out <i>Data for the People</i> at <a href="http://ourdata.com/">http://ourdata.com</a>.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;"><br /></span><span style="font-size: large;">[Acknowledgment: Parts of this post were adapted from the book's web site.]</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-52084362822939192472017-01-30T05:34:00.000-08:002017-02-15T05:08:12.273-08:00Randomization Tests for Regime Switching<span style="font-size: large;">I have always been fascinated by distribution-free non-parametric tests, or randomization tests, or Monte Carlo tests -- whatever you want to call them. (For example, I used some in ancient work like <a href="http://www.ssc.upenn.edu/~fdiebold/papers2/Diebold-Rudebusch%20(1992).pdf">Diebold-Rudebusch 1992</a>.) They seem almost too good to be true: exact finite-sample tests without distributional assumptions! 
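For readers new to the idea, here's a minimal sketch of randomization testing in its very simplest form -- a permutation test of the equality of two means. The simulated data and all settings are my own, purely for illustration; the dynamic Markov-switching problem treated below is of course far harder.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two simulated samples; under the null, group labels are exchangeable
x = rng.normal(0.0, 1.0, 30)
z = rng.normal(1.5, 1.0, 30)

observed = z.mean() - x.mean()
pooled = np.concatenate([x, z])

# Reference distribution: re-randomize the labels many times
n_perm = 9999
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    diff = shuffled[30:].mean() - shuffled[:30].mean()
    if abs(diff) >= abs(observed):
        count += 1

# Randomization p-value: finite-sample valid, no distributional assumptions
p_value = (count + 1) / (n_perm + 1)
print(p_value)
```

The test conditions only on exchangeability under the null -- no normality, no asymptotics -- which is exactly the "too good to be true" flavor.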
They also still seem curiously underutilized in econometrics, notwithstanding, for example, the path-breaking and well-known contributions over many decades by <a href="https://dl.dropboxusercontent.com/u/11900540/Web_Site_JMDufour/dufour.html">Jean-Marie Dufour</a>, <a href="http://ecares.ulb.ac.be/index.php?option=com_comprofiler&task=userProfile&user=114&Itemid=263">Marc Hallin</a>, and others.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">For the latest, see the <a href="http://www.cireqmontreal.com/wp-content/uploads/cahiers/15-2016-cah.pdf">fascinating new contribution by Jean-Marie Dufour and Richard Luger</a>. They show how to use randomization to perform simple tests of the null of linearity against the alternative of Markov switching in dynamic environments. That's a very hard problem (nuisance parameters not identified under the null, singular information matrix under the null), and several top researchers have wrestled with it (e.g., <a href="http://econpapers.repec.org/article/ieriecrev/v_3a39_3ay_3a1998_3ai_3a3_3ap_3a763-88.htm">Garcia</a>, <a href="http://www.ssc.wisc.edu/~bhansen/papers/jae_92.pdf">Hansen</a>, <a href="http://onlinelibrary.wiley.com/doi/10.3982/ECTA8609/abstract">Carasco-Hu-Ploberger</a>). Randomization delivers tests that are exact, distribution-free, and <i>simple</i>. And power looks pretty good too. </span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-50893845230278469442017-01-23T04:43:00.000-08:002017-02-15T04:58:21.149-08:00Bayes Stifling Creativity?<span style="font-size: large;">Some twenty years ago, a leading Bayesian econometrician startled me during an office visit at Penn. We were discussing Bayesian vs. frequentist approaches to a few things, when all of a sudden he declared that "There must be something about Bayesian analysis that stifles creativity. 
It seems that frequentists invent all the great stuff, and Bayesians just trail behind, telling them how to do it right".</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">His </span><span style="font-size: large;">characterization rings true in certain significant respects, which is why it's so funny. But the intellectually interesting thing is that it doesn't have to be that way. As Chris Sims notes in a recent communication: </span><br /><blockquote class="tr_bq"><span style="font-size: large;">...</span><span style="font-size: large;"> frequentists are in the habit of inventing easily computed, intuitively appealing estimators and then deriving their properties without insisting that the method whose properties they derive is optimal. ... Bayesians are more likely to go from model to optimal inference, [but] they don't have to, and [they] ought to work more on Bayesian analysis of methods based on conveniently calculated statistics.</span></blockquote><span style="font-size: large;"><br /></span><span style="font-size: large;">See Chris' thought-provoking unpublished paper draft, "<a href="http://sims.princeton.edu/yftp/UndrstndgNnBsns/GewekeBookChpter.pdf">Understanding Non-Bayesians</a>". 
</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">[As noted on <a href="http://www.princeton.edu/~sims/">Chris' web site</a>, he wrote that paper for </span><span style="font-size: large;">the Oxford University Press <i>Handbook of Bayesian Econometrics</i>, but he "withheld [it] from publication there because of the Draconian copyright agreement that OUP insisted on --- forbidding posting even a late draft like this one on a personal web site."] </span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-86513355031212118632017-01-16T08:39:00.000-08:002017-02-15T05:04:45.635-08:00Impulse Responses From Smooth Local Projections<span style="font-size: large;">Check out <a href="https://papers.ssrn.com/sol3/papers2.cfm?abstract_id=2892508">Barnichon-Brownlees (2017)</a> (BB). As proposed and developed in <a href="https://www.aeaweb.org/articles?id=10.1257/0002828053828518">Jorda (2005)</a>, they estimate impulse-response functions (IRF's) directly by projecting outcomes on estimates of structural shocks at various horizons, as opposed to inverting a fitted autoregression. The BB enhancement relative to Jorda is the effective incorporation of a smoothness prior in IRF estimation. (Notice that the traditional approach of inverting a low-ordered autoregression automatically promotes IRF smoothness.) In my view, smoothness is a natural IRF shrinkage direction, and BB convincingly show that it's likely to enhance estimation efficiency relative to Jorda's original approach. 
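To fix ideas, here's a minimal sketch of plain (unsmoothed) Jorda-style local projections on simulated AR(1) data, with the structural shocks treated as observed. The BB smoothness prior is omitted, and the data-generating process and all settings are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate an AR(1) with observed structural shocks e_t
T, phi = 2000, 0.7
e = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + e[t]

# Local projections: for each horizon h, regress y_{t+h} on e_t
H = 6
irf = []
for h in range(H + 1):
    yh = y[h:]          # y_{t+h}
    eh = e[: T - h]     # e_t
    irf.append((eh @ yh) / (eh @ eh))  # univariate OLS slope

print(np.round(irf, 2))  # should be close to the true IRF, phi**h
```

Each horizon's response is estimated by a separate regression, with no autoregression to invert; shrinking the sequence of slopes toward a smooth function of \(h\) is the BB enhancement.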
I always liked the idea of attempting to go after IRF's directly, and Jorda/BB seems appealing.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-69423797027523341472017-01-13T05:22:00.003-08:002017-01-13T05:26:34.113-08:00Math Rendering Problem Fixed<span style="font-size: large;">The problem with math rendering in the recent post, <a href="http://fxdiebold.blogspot.com/2017/01/all-of-machine-learning-in-one.html">"All of Machine Learning in One Expression"</a>, is now fixed (I hope). That is, the math should now look like math, not LaTeX code, on all devices. </span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-73025042786134546212017-01-09T07:23:00.001-08:002017-03-13T03:56:51.406-07:00All of Machine Learning in One Expression<span style="font-size: large;"><a href="http://scholar.harvard.edu/sendhil">Sendhil Mullainathan</a> gave an entertaining plenary talk on </span><span style="font-size: large;">machine learning (ML) in finance,</span><span style="font-size: large;"> in Chicago last Saturday at the annual American Finance Association (AFA) meeting. (Many hundreds of people, standing room only -- great to see.) Not much new relative to the posts </span><a href="http://fxdiebold.blogspot.com/search/label/Machine%20learning" style="font-size: x-large;">here</a><span style="font-size: large;">, for example, but he wasn't trying to deliver new results. Rather he was trying to introduce mainstream AFA financial economists to the ML perspective. 
</span><br /><span style="font-size: large;"><br />[Of course ML perspective and methods have featured prominently in time-series econometrics for many decades, but many of the recent econometric converts to ML (and audience members at the AFA talk) are cross-section types, not used to thinking much about things like out-of-sample predictive accuracy, etc.]<br /><br />Anyway, one cute and memorable thing -- good for teaching -- was Sendhil's suggestion that one can use the canonical penalized estimation problem as a taxonomy for much of ML. Here's my quick attempt at fleshing out that suggestion.<br /><br />Consider estimating a parameter vector \( \theta \) by solving the penalized estimation problem,<br /><br />\( \hat{\theta} = argmin_{\theta} \sum_{i} L (y_i - f(x_i, \theta) ) ~~s.t.~~ \gamma(\theta) \le c , \)<br /><br />or equivalently in Lagrange multiplier form,<br /><br />\( \hat{\theta} = argmin_{\theta} \sum_{i} L (y_i - f(x_i, \theta) ) + \lambda \gamma(\theta) . \)<br /><br />(1) \( f(x_i, \theta) \) is about the modeling strategy (linear, parametric non-linear, non-parametric non-linear (series, trees, nearest-neighbor, kernel, ...)).<br /><br />(2) \( \gamma(\theta) \) is about the type of regularization. (Concave penalty functions non-differentiable at the origin produce selection to zero, smooth convex penalties produce shrinkage toward 0, the LASSO penalty is both concave and convex, so it both selects and shrinks, ...)<br /><br />(3) \( \lambda \) is about the strength of regularization.<br /><br />(4) \( L(y_i - f(x_i, \theta) ) \) is about predictive loss (quadratic, absolute, asymmetric, ...).<br /><br />Many ML schemes emerge as special cases. 
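As a concrete instance of the general problem, here's a minimal sketch in which each of the four ingredients is instantiated (assuming scikit-learn; the sparse linear data-generating process is mine, purely for illustration).

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)

# Illustrative sparse linear DGP: only the first 3 of 20 regressors matter
n, k = 200, 20
X = rng.standard_normal((n, k))
theta = np.zeros(k)
theta[:3] = [3.0, -2.0, 1.5]
y = X @ theta + rng.standard_normal(n)

# (1) f linear, (2) gamma(theta) = sum_j |theta_j| (LASSO),
# (3) lambda chosen by cross-validation, (4) squared-error loss
fit = LassoCV(cv=5, random_state=0).fit(X, y)

print(fit.alpha_)                # the cross-validated regularization strength
print(np.nonzero(fit.coef_)[0])  # LASSO both shrinks and selects
```

Swapping any one slot -- a tree ensemble for (1), a ridge penalty for (2), a fixed lambda for (3), an absolute-error criterion for (4) -- moves you to a different point in the same taxonomy.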
To take just one well-known example, linear regression with regularization by LASSO and regularization strength chosen to optimize out-of-sample predictive MSE corresponds to (1) \( f(x_i, \theta)\) linear, (2) \( \gamma(\theta) = \sum_j |\theta_j| \), (3) \( \lambda \) cross-validated, and (4) \( L(y_i - f(x_i, \theta) ) = (y_i - f(x_i, \theta) )^2 \).</span><br /><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-68613431361412722952017-01-03T05:46:00.001-08:002017-02-15T05:00:43.739-08:00Torpedoing Econometric Randomized Controlled Trials<span style="font-size: large;">A very Happy New Year to all! <br /><br />I get no pleasure from torpedoing anything, and "torpedoing" is likely exaggerated, but nevertheless take a look at <a href="https://boringdevelopment.wordpress.com/2014/04/09/a-torpedo-aimed-straight-at-h-m-s-randomista/">"A Torpedo Aimed Straight at HMS Randomista"</a>. It argues that many econometric randomized controlled trials (RCT's) are seriously flawed -- not even <i>internally </i>valid -- due to their failure to use double-blind randomization. At first the non-double-blind critique may sound cheap and obvious, inviting you to roll your eyes and say "get over it". But ultimately it's not.<br /><br /> Note the interesting situation. Everyone these days is worried about <i>external </i>validity (extensibility), under the implicit <i>assumption </i>that internal validity has been achieved (e.g., see this <a href="http://fxdiebold.blogspot.com/2016/12/varieties-of-rct-extensibility.html">earlier post</a>). 
But the </span><span style="font-size: large;">non-double-blind</span><span style="font-size: large;"> critique makes clear that </span><span style="font-size: large;">e</span><span style="font-size: large;">ven internal validity may be dubious in econometric RCT's as typically implemented.</span><br /><span style="font-size: large;"><br />The underlying research paper, "<a href="https://www.aae.wisc.edu/events/papers/DeptSem/2013/di%20falco.01.25.pdf">Behavioural Responses and the Impact of New Agricultural Technologies: Evidence from a Double-Blind Field Experiment in Tanzania</a>", by Bulte <i>et al</i>., was published in 2014 in the <i>American Journal of Agricultural Economics</i>. Quite an eye-opener</span><span style="font-size: large;">.</span><br /><span style="font-size: large;"><br />Here's the abstract:<br /><br />Randomized controlled trials in the social sciences are typically not double-blind, so participants know they are “treated” and will adjust their behavior accordingly. Such effort responses complicate the assessment of impact. To gauge the potential magnitude of effort responses we implement an open RCT and double-blind trial in rural Tanzania, and randomly allocate modern and traditional cowpea seed-varieties to a sample of farmers. Effort responses can be quantitatively important––for our case they explain the entire “treatment effect on the treated” as measured in a conventional economic RCT. Specifically, harvests are the same for people who know they received the modern seeds and for people who did not know what type of seeds they got, but people who knew they received the traditional seeds did much worse. 
We also find that most of the behavioral response is unobserved by the analyst, or at least not readily captured using coarse, standard controls.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-2052738795235839652016-12-18T05:24:00.001-08:002016-12-18T05:27:34.148-08:00Holiday Haze<div style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img alt="File:Happy Holidays (5318408861).jpg" height="265" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Happy_Holidays_%285318408861%29.jpg/800px-Happy_Holidays_%285318408861%29.jpg" width="400" /></div><span style="font-size: large;">Your dedicated blogger is about to vanish in the holiday haze, returning early in the new year. Meanwhile, all best wishes for the holidays. If you're at ASSA Chicago, I hope you'll come to the Penn Economics party, Sat. Jan. 7, 6:00-8:00, Sheraton Grand Chicago, Mayfair Room. Thanks so much for your past, present and future support.</span><br /><br /><br /><br />[Photo credit: Public domain, by Marcus Quigmire, from Florida, USA (Happy Holidays Uploaded by Princess Mérida) [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons]<br /><br /><br /><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-83064914608384184862016-12-11T11:27:00.002-08:002017-01-03T05:35:30.705-08:00Varieties of RCT Extensibility<span style="font-size: large;">Even internally-valid RCT's have issues. They reveal the treatment effect only for the precise experiment performed and situation studied. Consider, for example, a study of the effects of fertilizer on crop yield, done for region X during a heat wave. Even if internally valid, the estimated treatment effect is that of fertilizer on crop yield in region X <i>during a heat wave</i>. 
The results do not necessarily generalize -- and in this example surely do not generalize -- to times of "normal" weather, even in region X. And of course, for a variety of reasons, they may not generalize to regions other than X, even in heat waves.<br /><br /> Note the interesting <i>time-series</i> dimension to the failure of external validity (extensibility) in the example above. (The estimate is obtained during this year's heat wave, but next year may be "normal", or "cool". And this despite the lack of any true structural change. But of course there could be true structural change, which would only make matters worse.) This contrasts with the usual <i>cross-sectional</i> focus of extensibility discussions (e.g., we get effect e in region X, but what effect would we get in region Z?) <br /><br /> In essence, we'd like panel data, to account both for cross-section effects and time-series effects, but most RCT's unfortunately have only a single cross section.<br /><br /> Mark Rosenzweig and Chris Udry have a fascinating new paper, "<a href="http://www.econ.yale.edu/~cru2/pdf/evsw.pdf">External Validity in a Stochastic World</a>", that grapples with some of the time-series extensibility issues raised above.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-13250029154430434232016-12-05T07:17:00.000-08:002016-12-06T04:35:31.389-08:00Exogenous vs. Endogenous Volatility Dynamics<span style="font-size: large;">I always thought putting exogenous volatility dynamics in macro-model shocks was a cop-out. Somehow it seemed more satisfying for volatility to be determined endogenously, in equilibrium. Then I came around: We allow for shocks with exogenous conditional-mean dynamics (e.g., AR(1)), so why shouldn't we allow for shocks with exogenous conditional-volatility dynamics? 
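For concreteness, here is a minimal simulation sketch of a single shock process carrying both kinds of exogenous dynamics at once: an AR(1) conditional mean together with GARCH(1,1)-style conditional volatility. All parameter values are purely illustrative assumptions, not taken from any paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters -- assumptions for the sketch only.
phi = 0.8                            # AR(1) conditional-mean persistence
omega, alpha, beta = 0.1, 0.1, 0.8   # GARCH(1,1) conditional-volatility dynamics

T = 10_000
x = np.zeros(T)                      # the shock process
h = np.zeros(T)                      # its conditional variance
h[0] = omega / (1 - alpha - beta)    # start at the unconditional variance (= 1 here)
eps = 0.0

for t in range(1, T):
    h[t] = omega + alpha * eps**2 + beta * h[t - 1]  # exogenous volatility dynamics
    eps = np.sqrt(h[t]) * rng.standard_normal()      # heteroskedastic innovation
    x[t] = phi * x[t - 1] + eps                      # exogenous conditional-mean dynamics
```

Nothing in the sketch takes a stand on the endogeneity question; it simply shows that exogenous mean dynamics and exogenous volatility dynamics enter the shock on exactly the same footing.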
Now I might shift back, at least in part, thanks to new work by Sydney Ludvigson, Sai Ma, and Serena Ng, "Uncertainty and Business Cycles: Exogenous Impulse or Endogenous Response?", which attempts to sort things out. The October 2016 version is <a href="https://static1.squarespace.com/static/54397369e4b0446f66937a73/t/58040df56b8f5b14de4793d1/1476660726359/ucc.pdf">here</a>. It turns out that real (macro) volatility appears largely endogenous, whereas nominal (financial market) volatility appears largely exogenous. </span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-17960475241021843812016-11-28T04:20:00.000-08:002016-12-01T04:31:56.772-08:00Gary Gorton, Harald Uhlig, and the Great Crisis<span style="font-size: large;"><a href="https://www.amazon.com/Slapped-Invisible-Hand-Management-Association/dp/0199734151/ref=asap_bc?ie=UTF8">Gary Gorton has made clear that the financial crisis of 2007 was in essence a traditional banking panic</a>, not unlike those of the nineteenth century. A key corollary is that the root cause of the Panic of 2007 can't be something relatively new, like "Too Big to Fail". (See <a href="http://fxdiebold.blogspot.com/2015/08/on-great-financial-panic-of-2007.html">this</a>.) Lots of people blame residential mortgage-backed securities (RMBS's), but they're also too new. Interestingly, <a href="http://economics.sas.upenn.edu/events/tba-money-macro-workshop-17">in new work Juan Ospina and Harald Uhlig examine RMBS's directly</a>. 
Sure enough, and contrary to popular impression, they performed quite well through the crisis.</span><br /><span style="font-size: large;"><br /></span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-37572953351335408682016-11-20T18:09:00.005-08:002016-11-22T04:57:49.940-08:00Dense Data for Long Memory<span style="font-size: large;">From <a href="http://fxdiebold.blogspot.com/2016/11/big-data-for-volatility-vstrend.html">the last post</a>, you might think that efficient learning about low-frequency phenomena requires tall data. Certainly efficient estimation of trend, as stressed in the last post, <i>does</i> require tall data. But it turns out that efficient estimation of other aspects of low-frequency dynamics sometimes requires only dense data. In particular, consider a pure long memory, or "fractionally integrated", process, \( (1-L)^d x_t = \epsilon_t \), \( 0 < d < 1/2 \). (See, for example, <a href="http://www.sciencedirect.com/science/article/pii/0304407695017321">this</a> or <a href="http://www.springer.com/us/book/9783642355110">this</a>.) In a general \( I(d) \) process, \(d\) governs only low-frequency behavior (the rate of decay of long-lag autocorrelations toward zero, or equivalently, the rate of explosion of low-frequency spectra toward infinity), so tall data are needed for efficient estimation of \(d\). But in a pure long-memory process, one parameter (\(d\)) governs behavior at <i>all</i> frequencies, including arbitrarily low frequencies, due to the self-similarity ("scaling law") of pure long memory. Hence for pure long memory a short but dense sample can be as informative about \(d\) as a tall sample. 
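To see the scaling law at work, here is a small numerical sketch (my own illustration, with \( d = 0.4 \) an arbitrary choice): the MA(\( \infty \)) weights of \( (1-L)^{-d} \) follow the same hyperbolic power law \( \psi_k \approx k^{d-1} / \Gamma(d) \) at every lag, which is precisely the self-similarity behind the dense-sample result.

```python
import math

d = 0.4  # illustrative long-memory parameter, 0 < d < 1/2

# MA(infinity) weights of (1 - L)^(-d): psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k
K = 400
psi = [1.0]
for k in range(1, K + 1):
    psi.append(psi[-1] * (k - 1 + d) / k)

# Self-similar hyperbolic decay: psi_k is close to k^(d-1) / Gamma(d) at every lag,
# so doubling the lag always scales the weight by (roughly) 2^(d-1),
# regardless of whether the lags are "short" or "long".
ratio = psi[200] / psi[100]
print(ratio, 2 ** (d - 1))
```

Because one power law governs all lags, short dense stretches of data pin down the exponent \( d-1 \), and hence \( d \), just as well as long spans do.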
(And pure long memory often appears to be a highly-accurate approximation to financial asset return volatilities, as for example in <a href="http://www.ssc.upenn.edu/~fdiebold/papers/paper43/abdl4.pdf">ABDL</a>.)</span><br /><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-30430349232399056992016-11-07T06:04:00.003-08:002016-11-08T11:09:25.725-08:00Big Data for Volatility vs. Trend<span style="font-size: large;">Although largely uninformative for some purposes, <a href="http://fxdiebold.blogspot.com/2016/04/big-data-tall-wide-and-dense.html">dense data</a> (high-frequency sampling) are highly informative for others. The massive example of recent decades is volatility estimation. The basic insight traces at least to <a href="https://en.wikipedia.org/wiki/Robert_C._Merton">Robert Merton's</a> early work. Roughly put, as we sample returns arbitrarily finely, we can infer underlying volatility (quadratic variation) arbitrarily well.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">So, what is it for which dense data are "largely <i>un</i>informative"? The massive example of recent decades is long-term trend. 
Again roughly put and assuming linearity, long-term trend is effectively a line segment drawn between a sample's first and last observations, so for efficient estimation we need <a href="http://fxdiebold.blogspot.com/2016/04/big-data-tall-wide-and-dense.html">tall data</a> (long calendar span), not dense data.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Assembling everything, for estimating yesterday's stock-market volatility you'd love to have yesterday's 1-minute intra-day returns, but for estimating the expected return on the stock market (the slope of a linear log-price trend)</span><span style="font-size: large;"> you'd much rather have 100 years of annual returns, despite the fact that a naive count would say that 1 day of 1-minute returns is a much "bigger" sample.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">So different aspects of Big Data -- in this case dense vs. tall -- are of different value for different things. Dense data promote accurate volatility estimation, and tall data promote accurate trend estimation.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-10015601929226541542016-11-03T10:01:00.002-07:002016-11-08T13:30:00.712-08:00StatPrize<span style="font-size: large;">Check out this new prize, <a href="http://statprize.org/">http://statprize.org/</a> (Thanks, Dave Giles, for informing me via your tweet.) It should be USD 1 Million, ahead of the Nobel, as statistics is a key part (arguably <i>the </i>key part) of the foundation on which every science builds. <br /><br /> And obviously check out <a href="https://www.stats.ox.ac.uk/people/associate_staff/david_cox">David Cox</a>, the first winner. Every time I've given an Oxford econometrics seminar, he has shown up. It's humbling that <i>he </i>evidently thinks he might have something to learn from <i>me</i>. 
What an amazing scientist, and what an amazing gentleman.<br /><br /> And also obviously, the new StatPrize can't help but remind me of Ted Anderson's recent passing, not to mention the earlier but recent passings, for example, of Herman Wold, Edmond Malinvaud, and Arnold Zellner. Wow -- sometimes the Stockholm gears just grind too slowly. Moving forward, StatPrize will presumably make such econometric recognition failures less likely.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-91884835799783076592016-10-31T08:31:00.001-07:002016-11-03T11:31:25.539-07:00Econometric Analysis of Recurrent Events<br /><div style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><span style="font-size: large;"><a href="http://press.princeton.edu/titles/10744.html"><img alt="bookjacket" height="320" src="https://press.princeton.edu/images/j10744.gif" width="206" /></a></span></div><span style="font-size: large;">Don Harding and Adrian Pagan have a <a href="http://press.princeton.edu/titles/10744.html">fascinating new book</a> (HP) that just arrived in the snail mail. Partly HP has a retro feel (think: <a href="http://www.nber.org/chapters/c2148.pdf">Bry-Boschan (BB)</a>) and partly it has a futurist feel (think: taking BB to wildly new places). Notwithstanding the assertion in the conclusion of HP's first chapter (<a href="http://press.princeton.edu/chapters/s10744.pdf">here</a>), I remain of the <a href="http://press.princeton.edu/TOCs/c6636.html">Diebold-Rudebusch view</a> that <a href="http://econweb.ucsd.edu/~jhamilto/">Hamilton-style</a> Markov switching remains the most compelling way to think about nonlinear business-cycle events like "expansions" and "recessions" and "peaks" and "troughs". 
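For readers new to the approach, a bare-bones Hamilton-style two-state switching process is easy to simulate; the regime means and transition probabilities below are illustrative assumptions, not estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two latent regimes: state 0 = "expansion", state 1 = "recession" (labels assumed).
mu = np.array([0.8, -0.5])      # regime-dependent mean growth rates (illustrative)
P = np.array([[0.95, 0.05],     # P[i, j] = Prob(s_{t+1} = j | s_t = i):
              [0.20, 0.80]])    # expansions much more persistent than recessions

T = 5_000
s = np.zeros(T, dtype=int)
for t in range(1, T):
    s[t] = rng.choice(2, p=P[s[t - 1]])

y = mu[s] + 0.5 * rng.standard_normal(T)  # observed growth: switching mean plus noise

# Ergodic recession probability is 0.05 / (0.05 + 0.20) = 0.2
print("fraction of time in recession:", s.mean())
```

Peaks and troughs then fall out naturally as the dates at which the latent state switches, in contrast to rule-based dating of the BB variety.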
At the very least, however, HP has significantly heightened my awareness and appreciation of alternative approaches. Definitely worth a very serious read.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-62336304135383577842016-10-24T05:45:00.001-07:002016-11-08T05:15:59.530-08:00Machine Learning vs. Econometrics, IV<span style="font-size: large;">Some of my recent posts on this topic emphasized that (1) machine learning (ML) tends to focus on non-causal prediction, whereas econometrics and statistics (E/S) has both non-causal and causal parts, and (2) E/S tends to be more concerned with probabilistic assessment of forecast uncertainty. Here are some related thoughts.</span><br /><div><span style="font-size: large;"><br /></span></div><div><span style="font-size: large;">As for (1), it's wonderful to see the ML and E/S literatures beginning to cross-fertilize, driven in significant part by E/S. Names like Athey, Chernozhukov, and Imbens come immediately to mind. See, for example, the material <a href="https://people.stanford.edu/athey/research#econometric">here</a> under "Econometric Theory and Machine Learning", and <a href="http://web.mit.edu/~vchern/www/#veryhighpredict">here</a> under "Big Data: Post-Selection Inference for Causal Effects" and "Big Data: Prediction Methods". </span><br /><div><span style="font-size: large;"><br /></span></div></div><div><span style="font-size: large;">As for (2) but staying with causal prediction, note that the traditional econometric approach treats causal prediction as an estimation problem (whether by instrumental variables, fully-structural modeling, or whatever...) and focuses not only on point estimates, but also on inference (standard errors, etc.) and hence implicitly on interval prediction of causal effects (by inverting the test statistics). 
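As a toy numerical illustration of that point (entirely my own, with made-up numbers): estimate a treatment effect by OLS on a simulated randomized sample, and the standard error immediately delivers an interval prediction of the causal effect, not just a point.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated experiment with a true causal effect of 2.0 (an assumed value).
n = 1_000
treat = rng.integers(0, 2, n)                    # random assignment
y = 1.0 + 2.0 * treat + rng.standard_normal(n)   # outcome

# OLS of y on an intercept and the treatment dummy
X = np.column_stack([np.ones(n), treat])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
s2 = resid @ resid / (n - 2)                     # residual variance
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])  # standard error of the effect

# Point estimate plus a 95% interval: inference, not certainty-equivalent plug-in
lo, hi = beta[1] - 1.96 * se, beta[1] + 1.96 * se
print(f"effect {beta[1]:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

The interval, not the point estimate alone, is what the traditional econometric approach reports.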
Similarly, <a href="http://www1.american.edu/academic.depts/ksb/finance_realestate/rhauswald/fin673/673mat/MacKinlay%20(1997),%20Event%20Studies%20in%20Economics%20and%20Finance.pdf">the financial-econometric "event study" approach</a>, which directly compares forecasts of what would have happened in the absence of an intervention to what happened <i>with</i> the intervention, also focuses on inference for the treatment effect, and hence implicitly on interval prediction.</span></div>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-27734816210543660522016-10-16T15:22:00.001-07:002016-10-18T08:54:29.967-07:00Machine Learning vs. Econometrics, III<span style="font-size: large;">I emphasized <a href="http://fxdiebold.blogspot.com/2016/10/machine-learning-vs-econometrics-i.html">here</a> that both machine learning (ML) and econometrics (E) prominently feature <i>prediction</i>, one distinction being that ML tends to focus on non-causal prediction, whereas a significant part of E focuses on causal prediction. So they're both focused on prediction, but there's a non-causal vs. causal distinction. [Alternatively, as Dean Foster notes, you can think of both ML and E as focused on <i>estimation</i>, but with different estimands. ML tends to focus on estimating conditional expectations, whereas the causal part of E focuses on estimating partial derivatives.]</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">In any event, there's another key distinction between much of ML and Econometrics/Statistics (E/S): E/S tends to be more concerned with <i>probabilistic assessment of uncertainty</i>. 
Whereas ML is often satisfied with <i>point</i> forecasts, E/S often wants <i>interval</i>, and ultimately <i>density</i>, forecasts.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">There are at least two classes of reasons for the difference. </span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">First, E/S recognizes that uncertainty is often of intrinsic economic interest. Think market risk, credit risk, counter-party risk, systemic risk, inflation risk, business cycle risk, etc.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Second, E/S is evidently uncomfortable with ML's implicit certainty-equivalence approach of simply plugging point forecasts into decision rules obtained under perfect foresight. Evidently the linear-quadratic-Gaussian world in which certainty equivalence holds resonates less than completely with E/S types. That sounds right to me. [By the way, see my earlier piece on <a href="http://fxdiebold.blogspot.com/2014/08/musings-on-optimal-prediction-under.html">optimal prediction under asymmetric loss</a>.]</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-19772012804688258442016-10-10T04:40:00.001-07:002016-12-07T05:21:45.105-08:00Machine Learning vs. Econometrics, II<span style="font-size: large;"><a href="http://fxdiebold.blogspot.com/2016/10/machine-learning-vs-econometrics-i.html">My last post</a> focused on one key distinction between machine learning (ML) and econometrics (E): non-causal ML prediction vs. causal E prediction. I promised later to highlight another, even more important, distinction. I'll get there in the next post.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">But first let me note a key <i>similarity</i>. ML vs. E in terms of non-causal vs. 
causal prediction is really only comparing ML to "half" of E (the causal part). The other part of E (and of course statistics, so let's call it E/S), going back a century or so, focuses on non-causal prediction, just like ML. The leading example is time-series E/S. Just take a look at an E/S text like Elliott and Timmermann (contents and first chapter <a href="http://press.princeton.edu/titles/10740.html#reviews">here</a>; index <a href="https://www.amazon.com/Economic-Forecasting-Graham-Elliott/dp/0691140138/ref=sr_1_1?ie=UTF8&qid=1475237551&sr=8-1&keywords=elliott+timmermann">here</a>). A lot of it looks like parts of ML. But it's not "E/S people chasing ML ideas"; rather, E/S has been in the game for decades, often well ahead of ML.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">For this reason the E/S crowd sometimes wonders whether "ML" and "data science" are just the same old wine in a new bottle. (The joke goes, Q: What is a "data scientist"? A: A statistician who lives in San Francisco.) ML/DataScience is <i>not </i>the same old wine, but it's a blend, and a significant part of the blend is indeed E/S.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">To be continued...</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0