tag:blogger.com,1999:blog-44559727330119454412017-01-22T03:39:32.054-08:00No HesitationsFrancis X. Diebold's BlogFrancis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.comBlogger267125tag:blogger.com,1999:blog-4455972733011945441.post-86513355031212118632017-01-16T08:39:00.000-08:002017-01-19T07:41:24.886-08:00Impulse Responses From Smooth Local Projections<span style="font-size: large;">Check out <a href="https://papers.ssrn.com/sol3/papers2.cfm?abstract_id=2892508">Barnichon-Brownlees (2017)</a> (BB). As proposed and developed in <a href="https://www.aeaweb.org/articles?id=10.1257/0002828053828518">Jorda (2005)</a>, they estimate impulse-response functions (IRF's) directly by projecting outcomes on estimates of structural shocks at various horizons, as opposed to inverting a fitted autoregression. The BB enhancement relative to Jorda is the effective incorporation of a smoothness prior in IRF estimation. (Notice that the traditional approach of inverting a low-ordered autoregression automatically promotes IRF smoothness.) In my view, smoothness is a natural IRF shrinkage direction, and BB convincingly show that it's likely to enhance estimation efficiency relative to Jorda's original approach. I always liked the idea of attempting to go after IRF's directly, and Jorda/BB seems appealing.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-69423797027523341472017-01-13T05:22:00.003-08:002017-01-13T05:26:34.113-08:00Math Rendering Problem Fixed<span style="font-size: large;">The problem with math rendering in the recent post, <a href="http://fxdiebold.blogspot.com/2017/01/all-of-machine-learning-in-one.html">"All of Machine Learning in One Expression"</a>, is now fixed (I hope). That is, the math should now look like math, not LaTeX code, on all devices. 
</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-73025042786134546212017-01-09T07:23:00.001-08:002017-01-20T05:54:37.208-08:00All of Machine Learning in One Expression<span style="font-size: large;"><a href="http://scholar.harvard.edu/sendhil">Sendhil Mullainathan</a> gave an entertaining plenary talk on </span><span style="font-size: large;">machine learning (ML) in finance,</span><span style="font-size: large;"> in Chicago last Saturday at the annual American Finance Association (AFA) meeting. (Many hundreds of people, standing room only -- great to see.) Not much new relative to the posts </span><a href="http://fxdiebold.blogspot.com/search/label/Machine%20learning" style="font-size: x-large;">here</a><span style="font-size: large;">, for example, but he wasn't trying to deliver new results. Rather he was trying to introduce mainstream AFA financial economists to the ML perspective. </span><br /><span style="font-size: large;"><br />[Of course ML perspective and methods have featured prominently in time-series econometrics for many decades, but many of the recent econometric converts to ML (and audience members at the AFA talk) are cross-section types, not used to thinking much about things like out-of-sample predictive accuracy, etc.]<br /><br />Anyway, one cute and memorable thing -- good for teaching -- was Sendhil's suggestion that one can use the canonical penalized estimation problem as a taxonomy for much of ML. Here's my quick attempt at fleshing out that suggestion.<br /><br />Consider estimating a parameter vector \( \theta \) by solving the penalized estimation problem,<br /><br />\( \hat{\theta} = argmin_{\theta} \sum_{i} L (y_i - f(x_i, \theta) ) ~~s.t.~~ \gamma(\theta) \le c , \)<br /><br />or equivalently in Lagrange multiplier form,<br /><br />\( \hat{\theta} = argmin_{\theta} \sum_{i} L (y_i - f(x_i, \theta) ) + \lambda \gamma(\theta) . 
\)<br /><br />(1) \( f(x_i, \theta) \) is about the modeling strategy (linear, parametric non-linear, non-parametric non-linear (series, trees, nearest-neighbor, kernel, ...)).<br /><br />(2) \( \gamma(\theta) \) is about the type of regularization. (Penalty functions non-differentiable at the origin produce selection to zero, smooth convex penalties produce shrinkage toward 0, the LASSO penalty is convex yet non-differentiable at the origin, so it both selects and shrinks, ...)<br /><br />(3) \( \lambda \) is about the strength of regularization.<br /><br />(4) \( L(y_i - f(x_i, \theta) ) \) is about predictive loss (quadratic, absolute, asymmetric, ...).<br /><br />Many ML schemes emerge as special cases. To take just one well-known example, linear regression with regularization by LASSO and regularization strength chosen to optimize out-of-sample predictive MSE corresponds to (1) \( f(x_i, \theta)\) linear, (2) \( \gamma(\theta) = \sum_j |\theta_j| \), (3) \( \lambda \) cross-validated, and (4) \( L(y_i - f(x_i, \theta) ) = (y_i - f(x_i, \theta) )^2 \).</span><br /><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-68613431361412722952017-01-03T05:46:00.001-08:002017-01-04T04:50:44.570-08:00Torpedoing Econometric Randomized Controlled Trials<span style="font-size: large;">A very Happy New Year to all! <br /><br />I get no pleasure from torpedoing anything, and "torpedoing" is likely exaggerated, but nevertheless take a look at <a href="https://boringdevelopment.wordpress.com/2014/04/09/a-torpedo-aimed-straight-at-h-m-s-randomista/">"A Torpedo Aimed Straight at HMS Randomista"</a>. It argues that many econometric randomized controlled trials (RCT's) are seriously flawed -- not even <i>internally </i>valid -- due to their failure to use double-blind randomization. 
At first the non-double-blind critique may sound cheap and obvious, inviting you to roll your eyes and say "get over it". But ultimately it's not.<br /><br /> Note the interesting situation. Everyone these days is worried about <i>external </i>validity (extensibility), under the implicit <i>assumption </i>that internal validity has been achieved (e.g., see this <a href="http://fxdiebold.blogspot.com/2016/12/varieties-of-rct-extensibility.html">earlier post</a>). But the </span><span style="font-size: large;">non-double-blind</span><span style="font-size: large;"> critique makes clear that </span><span style="font-size: large;">e</span><span style="font-size: large;">ven internal validity may be dubious in econometric RCT's as typically implemented.</span><br /><span style="font-size: large;"><br />The underlying research paper, "<a href="https://www.aae.wisc.edu/events/papers/DeptSem/2013/di%20falco.01.25.pdf">Behavioural Responses and the Impact of New Agricultural Technologies: Evidence from a Double-Blind Field Experiment in Tanzania</a>", by Bulte <i>et al</i>., was published in 2014 in the <i>American Journal of Agricultural Economics</i>. Quite an eye-opener</span><span style="font-size: large;">.</span><br /><span style="font-size: large;"><br />Here's the abstract:<br /><br />Randomized controlled trials in the social sciences are typically not double-blind, so participants know they are “treated” and will adjust their behavior accordingly. Such effort responses complicate the assessment of impact. To gauge the potential magnitude of effort responses we implement an open RCT and double-blind trial in rural Tanzania, and randomly allocate modern and traditional cowpea seed-varieties to a sample of farmers. Effort responses can be quantitatively important––for our case they explain the entire “treatment effect on the treated” as measured in a conventional economic RCT. 
Specifically, harvests are the same for people who know they received the modern seeds and for people who did not know what type of seeds they got, but people who knew they received the traditional seeds did much worse. We also find that most of the behavioral response is unobserved by the analyst, or at least not readily captured using coarse, standard controls.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-2052738795235839652016-12-18T05:24:00.001-08:002016-12-18T05:27:34.148-08:00Holiday Haze<div style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img alt="File:Happy Holidays (5318408861).jpg" height="265" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Happy_Holidays_%285318408861%29.jpg/800px-Happy_Holidays_%285318408861%29.jpg" width="400" /></div><span style="font-size: large;">Your dedicated blogger is about to vanish in the holiday haze, returning early in the new year. Meanwhile, all best wishes for the holidays. If you're at ASSA Chicago, I hope you'll come to the Penn Economics party, Sat. Jan. 7, 6:00-8:00, Sheraton Grand Chicago, Mayfair Room. Thanks so much for your past, present and future support.</span><br /><br /><br /><br />[Photo credit: Public domain, by Marcus Quigmire, from Florida, USA (Happy Holidays Uploaded by Princess Mérida) [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons]<br /><br /><br /><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-83064914608384184862016-12-11T11:27:00.002-08:002017-01-03T05:35:30.705-08:00Varieties of RCT Extensibility<span style="font-size: large;">Even internally-valid RCT's have issues. They reveal the treatment effect only for the precise experiment performed and situation studied. 
Consider, for example, a study of the effects of fertilizer on crop yield, done for region X during a heat wave. Even if internally valid, the estimated treatment effect is that of fertilizer on crop yield in region X <i>during a heat wave</i>. The results do not necessarily generalize -- and in this example surely do not generalize -- to times of "normal" weather, even in region X. And of course, for a variety of reasons, they may not generalize to regions other than X, even in heat waves.<br /><br /> Note the interesting <i>time-series</i> dimension to the failure of external validity (extensibility) in the example above. (The estimate is obtained during this year's heat wave, but next year may be "normal", or "cool". And this despite the lack of any true structural change. But of course there could be true structural change, which would only make matters worse.) This contrasts with the usual <i>cross-sectional</i> focus of extensibility discussions (e.g., we get effect e in region X, but what effect would we get in region Z?) <br /><br /> In essence, we'd like panel data, to account both for cross-section effects and time-series effects, but most RCT's unfortunately have only a single cross section.<br /><br /> Mark Rosenzweig and Chris Udry have a fascinating new paper, "<a href="http://www.econ.yale.edu/~cru2/pdf/evsw.pdf">External Validity in a Stochastic World</a>", that grapples with some of the time-series extensibility issues raised above.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-13250029154430434232016-12-05T07:17:00.000-08:002016-12-06T04:35:31.389-08:00Exogenous vs. Endogenous Volatility Dynamics<span style="font-size: large;">I always thought putting exogenous volatility dynamics in macro-model shocks was a cop-out. Somehow it seemed more satisfying for volatility to be determined endogenously, in equilibrium. 
Then I came around: We allow for shocks with exogenous conditional-mean dynamics (e.g., AR(1)), so why shouldn't we allow for shocks with exogenous conditional-volatility dynamics? Now I might shift back, at least in part, thanks to new work by Sydney Ludvigson, Sai Ma, and Serena Ng, "Uncertainty and Business Cycles: Exogenous Impulse or Endogenous Response?", which attempts to sort things out. The October 2016 version is <a href="https://static1.squarespace.com/static/54397369e4b0446f66937a73/t/58040df56b8f5b14de4793d1/1476660726359/ucc.pdf">here</a>. It turns out that real (macro) volatility appears largely endogenous, whereas nominal (financial market) volatility appears largely exogenous. </span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-17960475241021843812016-11-28T04:20:00.000-08:002016-12-01T04:31:56.772-08:00Gary Gorton, Harald Uhlig, and the Great Crisis<span style="font-size: large;"><a href="https://www.amazon.com/Slapped-Invisible-Hand-Management-Association/dp/0199734151/ref=asap_bc?ie=UTF8">Gary Gorton has made clear that the financial crisis of 2007 was in essence a traditional banking panic</a>, not unlike those of the nineteenth century. A key corollary is that the root cause of the Panic of 2007 can't be something relatively new, like "Too Big to Fail". (See <a href="http://fxdiebold.blogspot.com/2015/08/on-great-financial-panic-of-2007.html">this</a>.) Lots of people blame residential mortgage-backed securities (RMBS's), but they're also too new. Interestingly, <a href="http://economics.sas.upenn.edu/events/tba-money-macro-workshop-17">in new work Juan Ospina and Harald Uhlig examine RMBS's directly</a>. 
Sure enough, and contrary to popular impression, they performed quite well through the crisis.</span><br /><span style="font-size: large;"><br /></span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-37572953351335408682016-11-20T18:09:00.005-08:002016-11-22T04:57:49.940-08:00Dense Data for Long Memory<span style="font-size: large;">From <a href="http://fxdiebold.blogspot.com/2016/11/big-data-for-volatility-vstrend.html">the last post</a>, you might think that efficient learning about low-frequency phenomena requires tall data. Certainly efficient estimation of trend, as stressed in the last post, <i>does</i> require tall data. But it turns out that efficient estimation of other aspects of low-frequency dynamics sometimes requires only dense data. In particular, consider a pure long memory, or "fractionally integrated", process, \( (1-L)^d x_t = \epsilon_t \), 0 < \( d \) < 1/2. (See, for example, <a href="http://www.sciencedirect.com/science/article/pii/0304407695017321">this</a> or <a href="http://www.springer.com/us/book/9783642355110">this</a>.) In a general \( I(d) \) process, \(d\) governs only low-frequency behavior (the rate of decay of long-lag autocorrelations toward zero, or equivalently, the rate of explosion of low-frequency spectra toward infinity), so tall data are needed for efficient estimation of \(d\). But in a pure long-memory process, one parameter (\(d\)) governs behavior at <i>all</i> frequencies, including arbitrarily low frequencies, due to the self-similarity ("scaling law") of pure long memory. Hence for pure long memory a short but dense sample can be as informative about \(d\) as a tall sample. 
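A small simulation makes the point concrete. This is my own sketch, not from the post: it simulates a pure long-memory ARFIMA(0, d, 0) process via its MA(\(\infty\)) expansion and recovers \(d\) with a Geweke--Porter-Hudak (GPH) log-periodogram regression, using only the lowest Fourier frequencies of a single moderately-sized (dense, not tall) sample. The sample size, true \(d\), and bandwidth choice are illustrative assumptions.

```python
# Sketch (my own illustration): simulate (1-L)^d x_t = eps_t and estimate d
# by GPH log-periodogram regression from one moderately-sized sample.
import numpy as np

def simulate_arfima0d0(n, d, burn=1000, seed=0):
    """Simulate pure long memory via the truncated MA(infinity) expansion:
    psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j."""
    rng = np.random.default_rng(seed)
    m = n + burn
    psi = np.empty(m)
    psi[0] = 1.0
    for j in range(1, m):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    eps = rng.standard_normal(m)
    x = np.convolve(eps, psi)[:m]  # x_t = sum_{j<=t} psi_j * eps_{t-j}
    return x[burn:]                # drop burn-in to soften truncation effects

def gph_estimate(x, power=0.5):
    """GPH estimator: regress the log periodogram on -2*log(2*sin(w/2))
    over the lowest n**power Fourier frequencies; the slope estimates d."""
    n = len(x)
    m = int(n ** power)
    freqs = 2 * np.pi * np.arange(1, m + 1) / n
    fx = np.fft.fft(x - x.mean())
    periodogram = np.abs(fx[1 : m + 1]) ** 2 / (2 * np.pi * n)
    regressor = -2 * np.log(2 * np.sin(freqs / 2))
    X = np.column_stack([np.ones(m), regressor])
    beta, *_ = np.linalg.lstsq(X, np.log(periodogram), rcond=None)
    return beta[1]

x = simulate_arfima0d0(n=4096, d=0.3)
d_hat = gph_estimate(x)
print(round(d_hat, 2))  # noisy, but centered near the true d across seeds
```

Because one parameter governs all frequencies under the scaling law, even this single short sample pins down \(d\) reasonably well; only the bandwidth \(m\) frequencies enter the regression, so the estimate is noisy but consistent.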
(And pure long memory often appears to be a highly-accurate approximation to financial asset return volatilities, as for example in <a href="http://www.ssc.upenn.edu/~fdiebold/papers/paper43/abdl4.pdf">ABDL</a>.)</span><br /><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-30430349232399056992016-11-07T06:04:00.003-08:002016-11-08T11:09:25.725-08:00Big Data for Volatility vs. Trend<span style="font-size: large;">Although largely uninformative for some purposes, <a href="http://fxdiebold.blogspot.com/2016/04/big-data-tall-wide-and-dense.html">dense data</a> (high-frequency sampling) are highly informative for others. The massive example of recent decades is volatility estimation. The basic insight traces at least to <a href="https://en.wikipedia.org/wiki/Robert_C._Merton">Robert Merton's</a> early work. Roughly put, as we sample returns arbitrarily finely, we can infer underlying volatility (quadratic variation) arbitrarily well.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">So, what is it for which dense data are "largely <i>un</i>informative"? The massive example of recent decades is long-term trend. 
Again roughly put and assuming linearity, long-term trend is effectively a line segment drawn between a sample's first and last observations, so for efficient estimation we need <a href="http://fxdiebold.blogspot.com/2016/04/big-data-tall-wide-and-dense.html">tall data</a> (long calendar span), not dense data.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Assembling everything, for estimating yesterday's stock-market volatility you'd love to have yesterday's 1-minute intra-day returns, but for estimating the expected return on the stock market (the slope of a linear log-price trend)</span><span style="font-size: large;"> you'd much rather have 100 years of annual returns, despite the fact that a naive count would say that 1 day of 1-minute returns is a much "bigger" sample.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">So different aspects of Big Data -- in this case dense vs. tall -- are of different value for different things. Dense data promote accurate volatility estimation, and tall data promote accurate trend estimation.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-10015601929226541542016-11-03T10:01:00.002-07:002016-11-08T13:30:00.712-08:00StatPrize<span style="font-size: large;">Check out this new prize, <a href="http://statprize.org/">http://statprize.org/</a> (Thanks, Dave Giles, for informing me via your tweet.) It should be USD 1 Million, ahead of the Nobel, as statistics is a key part (arguably <i>the </i>key part) of the foundation on which every science builds. <br /><br /> And obviously check out <a href="https://www.stats.ox.ac.uk/people/associate_staff/david_cox">David Cox</a>, the first winner. Every time I've given an Oxford econometrics seminar, he has shown up. It's humbling that <i>he </i>evidently thinks he might have something to learn from <i>me</i>. 
What an amazing scientist, and what an amazing gentleman.<br /><br /> And also obviously, the new StatPrize can't help but remind me of Ted Anderson's recent passing, not to mention the earlier but recent passings, for example, of Herman Wold, Edmond Malinvaud, and Arnold Zellner. Wow -- sometimes the Stockholm gears just grind too slowly. Moving forward, StatPrize will presumably make such econometric recognition failures less likely.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-91884835799783076592016-10-31T08:31:00.001-07:002016-11-03T11:31:25.539-07:00Econometric Analysis of Recurrent Events<br /><div style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><span style="font-size: large;"><a href="http://press.princeton.edu/titles/10744.html"><img alt="bookjacket" height="320" src="https://press.princeton.edu/images/j10744.gif" width="206" /></a></span></div><span style="font-size: large;">Don Harding and Adrian Pagan have a <a href="http://press.princeton.edu/titles/10744.html">fascinating new book</a> (HP) that just arrived in the snail mail. Partly HP has a retro feel (think: <a href="http://www.nber.org/chapters/c2148.pdf">Bry-Boschan (BB)</a>) and partly it has a futurist feel (think: taking BB to wildly new places). Notwithstanding the assertion in the conclusion of HP's first chapter (<a href="http://press.princeton.edu/chapters/s10744.pdf">here</a>), I remain of the <a href="http://press.princeton.edu/TOCs/c6636.html">Diebold-Rudebusch view</a></span><span style="font-size: large;"> </span><span style="font-size: large;">that <a href="http://econweb.ucsd.edu/~jhamilto/">Hamilton-style</a> Markov switching remains the most compelling way to think about nonlinear business-cycle events like "expansions" and "recessions" and "peaks" and "troughs". 
At the very least, however, HP has significantly heightened my awareness and appreciation of alternative approaches. Definitely worth a very serious read.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-62336304135383577842016-10-24T05:45:00.001-07:002016-11-08T05:15:59.530-08:00Machine Learning vs. Econometrics, IV<span style="font-size: large;">Some of my recent posts on this topic emphasized that (1) machine learning (ML) tends to focus on non-causal prediction, whereas econometrics and statistics (E/S) has both non-causal and causal parts, and (2) E/S tends to be more concerned with probabilistic assessment of forecast uncertainty. Here are some related thoughts.</span><br /><div><span style="font-size: large;"><br /></span></div><div><span style="font-size: large;">As for (1), it's wonderful to see the ML and E/S literatures beginning to cross-fertilize, driven in significant part by E/S. Names like Athey, Chernozhukov, and Imbens come immediately to mind. See, for example, the material <a href="https://people.stanford.edu/athey/research#econometric">here</a> under "Econometric Theory and Machine Learning", and <a href="http://web.mit.edu/~vchern/www/#veryhighpredict">here</a> under "Big Data: Post-Selection Inference for Causal Effects" and "Big Data: Prediction Methods". </span><br /><div><span style="font-size: large;"><br /></span></div></div><div><span style="font-size: large;">As for (2) but staying with causal prediction, note that the traditional econometric approach treats causal prediction as an estimation problem (whether by instrumental variables, fully-structural modeling, or whatever...) and focuses not only on point estimates, but also on inference (standard errors, etc.) and hence implicitly on interval prediction of causal effects (by inverting the test statistics). 
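That inversion is easy to sketch numerically. The following is entirely my own illustration with made-up simulated data (not from any study the post cites): an OLS treatment-effect point estimate and its standard error, inverted into an implicit 95% interval prediction of the causal effect.

```python
# Sketch (hypothetical simulated data): a point estimate plus standard error
# implicitly delivers an interval prediction of the causal effect,
# obtained by inverting the usual t-test.
import numpy as np

rng = np.random.default_rng(0)
n = 500
treated = rng.integers(0, 2, n).astype(float)      # randomized 0/1 treatment
y = 1.0 + 2.0 * treated + rng.standard_normal(n)   # true causal effect = 2

X = np.column_stack([np.ones(n), treated])         # intercept + treatment
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)                   # residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)              # OLS covariance matrix
se = np.sqrt(cov[1, 1])                            # s.e. of the effect

# Inverting the t-test gives the implicit 95% interval prediction:
lo, hi = beta[1] - 1.96 * se, beta[1] + 1.96 * se
print(f"effect ~ {beta[1]:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

The point is simply that the standard-error machinery already contains the interval forecast; nothing beyond the usual OLS output is needed.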
Similarly, <a href="http://www1.american.edu/academic.depts/ksb/finance_realestate/rhauswald/fin673/673mat/MacKinlay%20(1997),%20Event%20Studies%20in%20Economics%20and%20Finance.pdf">the financial-econometric "event study" approach</a>, which directly compares forecasts of what would have happened in the absence of an intervention to what happened <i>with</i> the intervention, also focuses on inference for the treatment effect, and hence implicitly on interval prediction.</span></div>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-27734816210543660522016-10-16T15:22:00.001-07:002016-10-18T08:54:29.967-07:00Machine Learning vs. Econometrics, III<span style="font-size: large;">I emphasized <a href="http://fxdiebold.blogspot.com/2016/10/machine-learning-vs-econometrics-i.html">here</a> that both machine learning (ML) and econometrics (E) prominently feature <i>prediction</i>, one distinction being that ML tends to focus on non-causal prediction, whereas a significant part of E focuses on causal prediction. So they're both focused on prediction, but there's a non-causal vs. causal distinction. [Alternatively, as Dean Foster notes, you can think of both ML and E as focused on <i>estimation</i>, but with different estimands. ML tends to focus on estimating conditional expectations, whereas the causal part of E focuses on estimating partial derivatives.]</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">In any event, there's another key distinction between much of ML and Econometrics/Statistics (E/S): E/S tends to be more concerned with <i>probabilistic assessment of uncertainty</i>. 
Whereas ML is often satisfied with <i>point</i> forecasts, E/S often wants <i>interval</i>, and ultimately <i>density</i>, forecasts.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">There are at least two classes of reasons for the difference. </span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">First, E/S recognizes that uncertainty is often of intrinsic economic interest. Think market risk, credit risk, counter-party risk, systemic risk, inflation risk, business cycle risk, etc.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Second, E/S is evidently uncomfortable with ML's implicit certainty-equivalence approach of simply plugging point forecasts into decision rules obtained under perfect foresight. Evidently the linear-quadratic-Gaussian world in which certainty equivalence holds resonates less than completely with E/S types. That sounds right to me. [By the way, see my earlier piece on <a href="http://fxdiebold.blogspot.com/2014/08/musings-on-optimal-prediction-under.html">optimal prediction under asymmetric loss</a>.]</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-19772012804688258442016-10-10T04:40:00.001-07:002016-12-07T05:21:45.105-08:00Machine Learning vs. Econometrics, II<span style="font-size: large;"><a href="http://fxdiebold.blogspot.com/2016/10/machine-learning-vs-econometrics-i.html">My last post</a> focused on one key distinction between machine learning (ML) and econometrics (E): non-causal ML prediction vs. causal E prediction. I promised later to highlight another, even more important, distinction. I'll get there in the next post.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">But first let me note a key <i>similarity</i>. ML vs. E in terms of non-causal vs. 
causal prediction is really only comparing ML to "half" of E (the causal part). The other part of E (and of course statistics, so let's call it E/S), going back a century or so, focuses on non-causal prediction, just like ML. The leading example is time-series E/S. Just take a look at an E/S text like Elliott and Timmermann (contents and first chapter <a href="http://press.princeton.edu/titles/10740.html#reviews">here</a>; index <a href="https://www.amazon.com/Economic-Forecasting-Graham-Elliott/dp/0691140138/ref=sr_1_1?ie=UTF8&qid=1475237551&sr=8-1&keywords=elliott+timmermann">here</a>). A lot of it looks like parts of ML. But it's not "E/S people chasing ML ideas"; rather, E/S has been in the game for decades, often well ahead of ML.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">For this reason the E/S crowd sometimes wonders whether "ML" and "data science" are just the same old wine in a new bottle. (The joke goes, Q: What is a "data scientist"? A: A statistician who lives in San Francisco.) ML/DataScience is <i>not </i>the same old wine, but it's a blend, and a significant part of the blend is indeed E/S.</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">To be continued...</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-16846786173017402192016-10-02T16:55:00.001-07:002016-10-13T04:46:41.831-07:00Machine Learning vs. Econometrics, I<span style="font-size: large;">[If you're reading this in email, remember to click through on the title to get the math to render.]</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Machine learning (ML) is almost always centered on prediction; think "\(\hat{y}\)". Econometrics (E) is often, but not always, centered on prediction. 
Instead it's also often interested in estimation and associated inference; think </span><span style="font-size: large;">"\(\hat{\beta}\)"</span><span style="font-size: large;">. <br /><br />Or so the story usually goes. But that misses the real distinction. <i>Both </i>ML and E as described above are centered on prediction. The key difference is that ML focuses on non-causal prediction (if a new person \(i\) arrives with covariates \(X_i\), what is my minimum-MSE guess of her \(y_i\)?), whereas the part of econometrics highlighted above focuses on causal prediction (if I intervene and give person \(i\) a certain treatment, what is my minimum-MSE guess of \(\Delta y_i\)?). </span><span style="font-size: large;">It just happens that, assuming linearity, a "minimum-MSE guess of \(\Delta y_i\)" is the same as a "minimum-MSE estimate of \(\beta_i\)".</span><span style="font-size: large;"><br /><br />So there is an ML vs. E distinction here, but it's not "prediction vs. estimation" -- <i>it's all prediction</i>. Instead, the issue is non-causal prediction vs. causal prediction.</span><span style="font-size: large;"><br /></span><br /><div><span style="font-size: large;"><br /></span> <span style="font-size: large;">But there's another ML vs. E difference that's even more fundamental. 
</span><span style="font-size: large;">TO BE CONTINUED...</span></div><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br />Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-31148701496405757612016-09-26T04:05:00.000-07:002016-11-03T11:32:06.050-07:00Fascinating Conference at Chicago<span style="font-size: large;">I just returned from the University of Chicago conference, "<a href="http://bfi.uchicago.edu/events/machine-learning-what%E2%80%99s-it-economics">Machine Learning: What's in it for Economics?</a>" Lots of cool things percolating. I'm teaching a Penn Ph.D. course later this fall on aspects of the ML/econometrics interface. Feeling really charged.</span><br /><div><span style="font-size: large;"><br /></span></div><div><span style="font-size: large;">By the way, hadn't yet been to the new Chicago economics "cathedral" (<a href="http://architecture.uchicago.edu/locations/department_of_economics_and_becker_friedman_institute/">Saieh Hall for Economics</a>) and <a href="http://bfi.uchicago.edu/">Becker-Friedman Institute</a>. Wow. What an institution, both intellectually and physically.</span></div><div><br /></div>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-42043104134219858852016-09-20T08:27:00.002-07:002016-10-27T06:18:38.000-07:00On "Shorter Papers"<span style="font-size: large;">Journals should not corral shorter papers into sections like "Shorter Papers". Doing so sends a subtle (actually unsubtle) message that shorter papers are </span><span style="font-size: large;">basically second-class citizens, </span><span style="font-size: large;">somehow less good, or less important, or less something -- not just less long -- than longer papers. 
I</span><span style="font-size: large;">f a paper is above the bar, then it's above the bar, and regardless of its length it should then be published simply as a paper, not a "shorter paper", or a "note", or anything else. Many shorter papers are much more important than the vast majority of longer papers.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-57236874513858201972016-09-12T05:03:00.000-07:002016-09-12T05:07:22.082-07:00Time-Series Econometrics and Climate Change<div><span style="font-size: large;">It's exciting to see time series econometrics contributing to the climate change discussion. </span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Check out the upcoming CREATES conference, "Econometric Models of Climate Change", <a href="http://www.climateeconometrics.org/conference/">here</a>.<br /></span><br /><span style="font-size: large;">Here are a few good examples of recent time-series climate research, in chronological order. (There are many more. Look through the reference lists, for example, in the 2016 and 2017 papers below.)</span></div><div><span style="font-size: large;"><br /></span></div><span style="font-size: large;"><a href="http://link.springer.com/article/10.1007%2Fs10584-006-9062-1">Jim Stock et al. (2009)</a> in <i>Climatic Change.</i><br /><br /><a href="http://www.nature.com/ngeo/journal/v6/n12/full/ngeo1999.html">Pierre Perron et al. (2013)</a> in <i>Nature.</i><br /><br /><a href="http://www.nature.com/ngeo/journal/v9/n4/abs/ngeo2670.html">Peter Phillips et al. 
(2016)</a> in <i>Nature.</i><br /><br /><a href="http://econ.au.dk/fileadmin/site_files/filer_oekonomi/Working_Papers/CREATES/2015/rp15_28.pdf">Proietti and Hillebrand (2017)</a>, forthcoming in <i>Journal of the Royal Statistical Society.</i><br /></span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-71895062965993129372016-09-06T04:01:00.000-07:002016-10-27T06:19:32.764-07:00Inane Journal "Impact Factors"<span style="font-size: large;">Why are journals so obsessed with "impact factors"? (The five-year impact factor is average citations/article in a five-year window.) They're often calculated to three decimal places, and publishers trumpet victory when they go from (say) 1.225 to 1.311! It's hard to think of a dumber statistic, or dumber over-interpretation. Are the numbers after the decimal point anything more than noise, and for that matter, a</span><span style="font-size: large;">re the numbers <i>before</i> the decimal much more than noise?</span><br /><span style="font-size: large;"><br />Why don't journals instead use the same citation indexes used for individuals? The leading index seems to be the <i>h</i>-index, which is the largest integer <i>h</i> such that an individual has <i>h</i> papers, each cited at least <i>h</i> times. I don't know who cooked up the <i>h</i>-index, and </span><span style="font-size: large;">surely it has issues too, but the gurus love it, and in my experience it tells the truth. <br /><br />Even better, why not stop obsessing over clearly-insufficient statistics of any kind? I propose instead looking at what I'll call a "citation signature plot" (CSP), simply plotting the number of cites for the most-cited paper, the number of cites for the second-most-cited paper, and so on. (Use whatever window(s) you want.) The CSP reveals everything, instantly and visually. How high is the CSP for the top papers? 
How quickly, and with what pattern, does it approach zero? etc., etc. It's all there. <br /><br />Google-Scholar CSP's are easy to make for individuals, and they're tremendously informative. They'd be only slightly harder to make for journals. I'd love to see some.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-11033470606973831162016-08-29T02:40:00.001-07:002016-09-21T03:55:20.151-07:00On Credible Cointegration Analyses<span style="font-size: large;">I may not know whether some \(I(1)\) variables are cointegrated, but if they are, I often have a very strong view about the likely number and nature of cointegrating combinations. Single-factor structure is common in many areas of economics and finance, so if cointegration is present in an \(N\)-variable system, for example, a natural benchmark is 1 common trend (\(N-1\) cointegrating combinations). And moreover, the natural cointegrating combinations are almost always spreads or ratios (which of course are spreads in logs). For example, log consumption and log income may or may not be cointegrated, but if they <i>are</i>, then the obvious benchmark cointegrating combination is \((ln C - ln Y)\). Similarly, the obvious benchmark for </span><span style="font-size: large;">\(N\) government bond yields \(y\)</span><span style="font-size: large;"> </span><span style="font-size: large;">is \(N-1\) cointegrating combinations, given by term spreads relative to some reference yield; e.g., \(y_2 - y_1\), \(y_3 - y_1\), ..., \(y_N - y_1\).</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">There's not much literature exploring this perspective. 
(One notable exception is <a href="http://www.princeton.edu/~mwatson/papers/Horvath_Watson_ET_1995.pdf">Horvath and Watson, "Testing for Cointegration When Some of the Cointegrating Vectors are Prespecified", <i>Econometric Theory</i></a><a href="http://www.princeton.edu/~mwatson/papers/Horvath_Watson_ET_1995.pdf">, 11, 952-984</a>.) We need more.</span><br /><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-23327326063434707052016-08-21T00:14:00.003-07:002016-08-21T00:14:45.654-07:00More on Big Data and Mixed Frequencies<span style="font-size: large;">I recently <a href="http://fxdiebold.blogspot.ch/2016/06/mixed-frequency-high-dimensional-time.html">blogged on Big Data and mixed-frequency data</a>, arguing that Big Data (wide data, in particular) leads naturally to mixed-frequency data. (See <a href="http://fxdiebold.blogspot.ch/2016/04/big-data-tall-wide-and-dense.html">here</a> for the tall data / wide data / dense data taxonomy.) The obvious just occurred to me, namely that it's also true in the other direction. That is, mixed-frequency situations also lead naturally to Big Data, and with a subtle twist: the nature of the Big Data may be dense rather than wide. The theoretically-pure way to set things up is as a state-space system laid out at the highest observed frequency, appropriately treating most of the lower-frequency data as missing, as in <a href="https://www.philadelphiafed.org/research-and-data/real-time-center/business-conditions-index">ADS</a>. 
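To make the missing-data treatment concrete, here's a deliberately minimal univariate sketch: a local-level model laid out on the high-frequency grid, with the measurement update simply skipped whenever the lower-frequency series is unobserved. (Illustration only -- this is not the actual ADS model, which involves multiple indicators; the function name and parameter defaults are mine.)

```python
def kalman_filter_missing(y, q=1.0, r=1.0, a0=0.0, p0=1e6):
    """Kalman filter for a local-level model at the highest observed frequency:
        state: a_t = a_{t-1} + w_t,  w_t ~ N(0, q)
        obs:   y_t = a_t + v_t,      v_t ~ N(0, r)
    Entries of y equal to None mark periods in which the lower-frequency
    series is unobserved; the filter then runs a pure prediction step,
    with no measurement update."""
    a, p, filtered = a0, p0, []
    for obs in y:
        p = p + q                    # prediction step: state variance grows
        if obs is not None:          # update only when actually observed
            k = p / (p + r)          # Kalman gain
            a = a + k * (obs - a)
            p = (1.0 - k) * p
        filtered.append(a)
    return filtered

# e.g., a quarterly series on a monthly grid, observed every third month:
monthly = [None, None, 1.0, None, None, 1.2, None, None, 0.9]
states = kalman_filter_missing(monthly)
```

The same skip-the-update logic carries over directly to multivariate systems mixing series of several frequencies.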
By construction, the system is dense if any of the series are dense, as the system is laid out at the highest frequency.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-83703561163285467892016-08-17T06:45:00.000-07:002016-08-21T00:22:35.181-07:00On the Evils of Hodrick-Prescott Detrending<span style="font-size: large;">[If you're reading this in email, remember to click through on the title to get the math to render.]</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Jim Hamilton has a very cool new paper, "<a href="http://econweb.ucsd.edu/~jhamilto/hp.pdf">Why You Should Never Use the Hodrick-Prescott (HP) Filter</a>".</span><br /><span style="font-size: large;"><br /></span> <span style="font-size: large;">Of course we've known of the pitfalls of HP ever since <a href="https://ideas.repec.org/a/eee/dyncon/v19y1995i1-2p253-278.html">Cogley and Nason (1995)</a> brought them into razor-sharp focus decades ago. The title of the even-earlier <a href="http://www.uta.edu/faculty/crowder/papers/Nelson_Kang.pdf">Nelson and Kang (1981)</a> classic, "Spurious Periodicity in Inappropriately Detrended Time Series", says it all. Nelson-Kang made the spurious-periodicity case against polynomial detrending of I(1) series. Hamilton makes the spurious-periodicity case against HP detrending of many types of series, including I(1). (Or, more precisely, Hamilton </span><span style="font-size: large;">adds even more weight to the Cogley-Nason spurious-periodicity case against HP.</span><span style="font-size: large;">)</span><br /><span style="font-size: large;"><br /></span> <span style="font-size: large;">But the main contribution of Hamilton's paper is constructive, not destructive. It provides a superior detrending method, based only on a simple linear projection. 
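The projection really is simple: regress \(y_t\) on a constant and \(y_{t-h}, ..., y_{t-h-p}\) (Hamilton's quarterly benchmark is \(h = 8\), \(p = 3\)); the fitted values are trend, the residuals cycle. A minimal sketch, with OLS hand-rolled via the normal equations so it's self-contained (function names are mine):

```python
def _solve(A, b):
    """Gauss-Jordan elimination with partial pivoting -- adequate for the
    tiny systems arising from the normal equations."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(n):
            if r != i and M[r][i] != 0.0:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]


def hamilton_trend_cycle(y, h=8, p=3):
    """Regress y_t on a constant and y_{t-h}, ..., y_{t-h-p} by OLS.
    Fitted values are the 'Hamilton trend', residuals the cycle; both
    returned lists are aligned with y[h+p], ..., y[-1]."""
    rows, targets = [], []
    for t in range(h + p, len(y)):
        rows.append([1.0] + [y[t - h - j] for j in range(p + 1)])
        targets.append(y[t])
    k = p + 2  # constant plus p+1 lags
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(k)]
    beta = _solve(XtX, Xty)
    trend = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    cycle = [v - f for v, f in zip(targets, trend)]
    return trend, cycle
```

With quarterly data and the benchmark settings, the cycle is just the error in forecasting \(y_t\) two years ahead from the four most recent observations available at the forecast origin.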
</span><br /><span style="font-size: large;"><br /></span><span style="font-size: large;">Here's a way to understand what "Hamilton detrending" does and why it works, based on a nice connection to <a href="http://www.uh.edu/~cmurray/courses/econ_7395/Beveridge%20Nelson.pdf">Beveridge-Nelson (1981)</a> detrending not noticed in Hamilton's paper. </span><br /><span style="font-size: large;"><br /></span> <span style="font-size: large;">First consider Beveridge-Nelson (BN) trend for I(1) series. BN trend is just a very long-run forecast based on an infinite past. [You want a very long-run forecast in the BN environment because the stationary cycle washes out from a very long-run forecast, leaving just the forecast of the underlying random-walk stochastic trend, which is also the current value of the trend since it's a random walk. So the BN trend at any time is just a very long-run forecast made at that time.] Hence BN trend is implicitly based on the projection: </span><span style="font-size: large;">\(y_t ~ \rightarrow ~ c, ~ y_{t-h}, ~...,~ y_{t-h-p} \), </span><span style="font-size: large;">for \(h \rightarrow \infty \) and \(p \rightarrow \infty\).</span><br /><span style="font-size: large;"><br /></span> <span style="font-size: large;">Now consider Hamilton trend. It is <i>explicitly </i>based on the projection: </span><span style="font-size: large;">\(y_t ~ \rightarrow ~ c, ~ y_{t-h}, ~...,~ y_{t-h-p} \), </span><span style="font-size: large;">for \(p = 3 \). (Hamilton also uses a benchmark of \(h = 8 \).)</span><br /><span style="font-size: large;"><br /></span> <span style="font-size: large;">So BN and Hamilton are both "linear projection trends", differing only in choice of \(h\) and \(p\)</span><span style="font-size: large;">! BN takes an infinite forecast horizon and projects on an infinite past. 
Hamilton takes a medium forecast horizon and projects on just the recent past.</span><br /><span style="font-size: large;"><br /></span> <span style="font-size: large;">Much of Hamilton's paper is devoted to defending the choice of \(p = 3 \), which </span><span style="font-size: large;">turns out to perform well for a wide range of data-generating processes (not just I(1)). The BN choice of \(h = p = \infty \), in contrast, although optimal for I(1) series, is less robust to other DGP's. (And of course estimation of the BN projection as written above is infeasible, which people avoid in practice by assuming low-ordered ARIMA structure.)</span><br /><script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-7419678127081367122016-08-15T04:04:00.001-07:002016-08-15T04:04:14.084-07:00More on Nonlinear Forecasting Over the Cycle<span style="font-size: large;">Related to <a href="http://fxdiebold.blogspot.com/2016/08/nearest-neighbor-forecasting-in-times.html">my last post</a>, here's a new paper that just arrived from <a href="http://www.stevanovic.uqam.ca/KS_Recession.pdf">Rachidi Kotchoni and Dalibor Stevanovic, "Forecasting U.S. Recessions and Economic Activity"</a>. It's not non-parametric, but it is non-linear. As Dalibor put it, "The method is very simple: predict turning points and recession probabilities in the first step, and then augment a direct AR model with the forecasted probability." 
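That two-step idea is easy to sketch. Below, the first-step recession probabilities are simply taken as given (they could come from any classifier -- a probit on the term spread, say), and the second step is a direct \(h\)-step-ahead AR regression augmented with the probability. Purely illustrative: the function names, lag choices, and horizon are mine, not Kotchoni-Stevanovic's.

```python
def _solve(A, b):
    """Gauss-Jordan elimination -- adequate for tiny normal-equation systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(n):
            if r != i and M[r][i] != 0.0:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]


def direct_ar_with_prob(y, prob, h=4, p=2):
    """Direct h-step-ahead AR(p) regression augmented with a first-step
    recession probability: OLS of y_{t+h} on a constant, y_t, ...,
    y_{t-p+1}, and prob_t.  Returns the coefficient vector and the
    h-step forecast made at the end of the sample."""
    rows, targets = [], []
    for t in range(p - 1, len(y) - h):
        rows.append([1.0] + [y[t - j] for j in range(p)] + [prob[t]])
        targets.append(y[t + h])
    k = p + 2  # constant, p own lags, and the probability
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(k)]
    beta = _solve(XtX, Xty)
    last = [1.0] + [y[-1 - j] for j in range(p)] + [prob[-1]]
    return beta, sum(b * x for b, x in zip(beta, last))
```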
<a href="http://www.stevanovic.uqam.ca/KS_Recession.pdf">Kotchoni-Stevanovic</a> and <a href="https://297a577f-a-62cb3a1a-s-sites.googlegroups.com/site/pabloaguerronquintana/forecast.pdf?attachauth=ANoY7cqvwmTNrc35SoW-yYJpQAMR1lWq_kpwlI7exqWlLUkMoVH7yYms_-7DpklydIm4OncIkBzGdkh1GfHfQygHy_BWtZp-kGweBegCclHckzJw-bn7c85hwJ2kczPag7Kzt2YfbhJqEjvaQJ9t2HLnYFaRosAHTt1_n6et3rLY2uAVEy6j4BwR6Z7P3kymuvNni2VO1bzK0klvKFhnrWHuXUl1TotFChGX9ADKp9NDveLU-Jc_TRQ%3D&attredirects=1">Guerron-Quintana-Zhong</a> are usefully read together.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0tag:blogger.com,1999:blog-4455972733011945441.post-76967318608426400922016-08-14T13:13:00.001-07:002016-08-14T13:16:43.771-07:00Nearest-Neighbor Forecasting in Times of Crisis<span style="font-size: large;">Nonparametric K-nearest-neighbor forecasting remains natural and obvious and potentially very useful, as it has been since its inception long ago. <br /><br />[Most crudely: Find the K-history closest to the present K-history, see what followed it, and use that as a forecast. Slightly less crudely: Find the N K-histories closest to the present K-history, see what followed each of them, and take an average. There are many obvious additional refinements.]<br /><br />Overall, nearest-neighbor forecasting remains curiously under-utilized in dynamic econometrics. Maybe that will change. 
In an interesting recent development, for example, <a href="https://297a577f-a-62cb3a1a-s-sites.googlegroups.com/site/pabloaguerronquintana/forecast.pdf?attachauth=ANoY7cqvwmTNrc35SoW-yYJpQAMR1lWq_kpwlI7exqWlLUkMoVH7yYms_-7DpklydIm4OncIkBzGdkh1GfHfQygHy_BWtZp-kGweBegCclHckzJw-bn7c85hwJ2kczPag7Kzt2YfbhJqEjvaQJ9t2HLnYFaRosAHTt1_n6et3rLY2uAVEy6j4BwR6Z7P3kymuvNni2VO1bzK0klvKFhnrWHuXUl1TotFChGX9ADKp9NDveLU-Jc_TRQ%3D&attredirects=1">new Federal Reserve System research by Pablo Guerron-Quintana and Molin Zhong</a> puts nearest-neighbor methods to good use for forecasting in times of crisis.</span>Francis Dieboldhttps://plus.google.com/104011662239494052073noreply@blogger.com0
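For concreteness, the crude recipe sketched in brackets above fits in a dozen lines. A minimal illustration (the function name and defaults are mine, and real applications would add refinements -- scaling, distance-weighted averaging, multivariate histories, and so on):

```python
def knn_forecast(series, K=4, N=3, horizon=1):
    """Nearest-neighbor forecast: find the N past K-histories closest
    (in Euclidean distance) to the most recent K-history, look up the
    value that followed each of them `horizon` steps later, and average."""
    query = series[-K:]
    scored = []
    # candidate K-histories start at s; each is followed by series[s+K+horizon-1]
    for s in range(len(series) - K - horizon + 1):
        hist = series[s:s + K]
        dist = sum((a - b) ** 2 for a, b in zip(hist, query)) ** 0.5
        scored.append((dist, series[s + K + horizon - 1]))
    scored.sort(key=lambda pair: pair[0])
    followers = [val for _, val in scored[:N]]
    return sum(followers) / len(followers)
```

On a perfectly periodic series the method is exact: with the last K-history matching earlier histories precisely, the forecast is just the value that always followed them.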