## Monday, January 16, 2017

### Impulse Responses From Smooth Local Projections

Check out Barnichon-Brownlees (2017) (BB). As proposed and developed in Jorda (2005), they estimate impulse-response functions (IRF's) directly by projecting outcomes on estimates of structural shocks at various horizons, as opposed to inverting a fitted autoregression. The BB enhancement relative to Jorda is the effective incorporation of a smoothness prior in IRF estimation. (Notice that the traditional approach of inverting a low-ordered autoregression automatically promotes IRF smoothness.) In my view, smoothness is a natural IRF shrinkage direction, and BB convincingly show that it's likely to enhance estimation efficiency relative to Jorda's original approach. I always liked the idea of attempting to go after IRF's directly, and Jorda/BB seems appealing.
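
To make the idea concrete, here is a minimal sketch of local projections with a smoothness penalty. It is not BB's estimator: it assumes the shocks are observed without error, uses no intercept or controls, and penalizes squared second differences of the IRF ordinates directly, which is a simplification I chose for illustration. The function names (`solve`, `smooth_lp`, `roughness`), the simulated data, and the penalty weight are all hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def smooth_lp(y, shock, H, lam):
    """Estimate IRF ordinates beta_0..beta_H by projecting y[t+h] on shock[t].

    lam = 0 gives horizon-by-horizon least squares (plain local projections).
    lam > 0 adds lam * sum_h (beta_h - 2 beta_{h+1} + beta_{h+2})^2.
    """
    T = len(y)
    A = [[0.0] * (H + 1) for _ in range(H + 1)]
    c = [0.0] * (H + 1)
    for h in range(H + 1):
        A[h][h] = sum(shock[t] ** 2 for t in range(T - h))
        c[h] = sum(shock[t] * y[t + h] for t in range(T - h))
    # Add lam * D'D, where D takes second differences across horizons.
    for h in range(H - 1):
        d = {h: 1.0, h + 1: -2.0, h + 2: 1.0}
        for i, di in d.items():
            for j, dj in d.items():
                A[i][j] += lam * di * dj
    return solve(A, c)

def roughness(beta):
    """Sum of squared second differences of an IRF."""
    return sum((beta[h] - 2 * beta[h + 1] + beta[h + 2]) ** 2
               for h in range(len(beta) - 2))

# Simulated example (assumed DGP): the true IRF decays geometrically.
random.seed(0)
T, H = 300, 12
shock = [random.gauss(0, 1) for _ in range(T)]
true_irf = [0.8 ** k for k in range(H + 1)]
y = [sum(true_irf[k] * shock[t - k] for k in range(min(t, H) + 1))
     + random.gauss(0, 1) for t in range(T)]

raw = smooth_lp(y, shock, H, lam=0.0)       # unpenalized local projections
smooth = smooth_lp(y, shock, H, lam=500.0)  # with the smoothness penalty
print("roughness, raw vs. smooth:", roughness(raw), roughness(smooth))
```

The penalized estimate is never rougher than the unpenalized one, by construction. In practice the penalty weight would be chosen by something like cross validation rather than fixed by hand.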

## Friday, January 13, 2017

### Math Rendering Problem Fixed

The problem with math rendering in the recent post, "All of Machine Learning in One Expression", is now fixed (I hope). That is, the math should now look like math, not LaTeX code, on all devices.

## Monday, January 9, 2017

### All of Machine Learning in One Expression

Sendhil Mullainathan gave an entertaining plenary talk on machine learning (ML) in finance, in Chicago last Saturday at the annual American Finance Association (AFA) meeting. (Many hundreds of people, standing room only -- great to see.) Not much new relative to the posts here, for example, but he wasn't trying to deliver new results. Rather he was trying to introduce mainstream AFA financial economists to the ML perspective.

[Of course ML perspective and methods have featured prominently in time-series econometrics for many decades, but many of the recent econometric converts to ML (and audience members at the AFA talk) are cross-section types, not used to thinking much about things like out-of-sample predictive accuracy, etc.]

Anyway, one cute and memorable thing -- good for teaching -- was Sendhil's suggestion that one can use the canonical penalized estimation problem as a taxonomy for much of ML. Here's my quick attempt at fleshing out that suggestion.

Consider estimating a parameter vector \( \theta \) by solving the penalized estimation problem,

\( \hat{\theta} = argmin_{\theta} \sum_{i} L (y_i - f(x_i, \theta) ) ~~s.t.~~ \gamma(\theta) \le c , \)

or equivalently in Lagrange multiplier form,

\( \hat{\theta} = argmin_{\theta} \sum_{i} L (y_i - f(x_i, \theta) ) + \lambda \gamma(\theta) . \)

(1) \( f(x_i, \theta) \) is about the modeling strategy (linear, parametric non-linear, non-parametric non-linear (series, trees, nearest-neighbor, kernel, ...)).

(2) \( \gamma(\theta) \) is about the type of regularization. (Concave penalty functions non-differentiable at the origin produce selection to zero, smooth convex penalties produce shrinkage toward 0, the LASSO penalty is both concave and convex, so it both selects and shrinks, ...)

(3) \( \lambda \) is about the strength of regularization.

(4) \( L(y_i - f(x_i, \theta) ) \) is about predictive loss (quadratic, absolute, asymmetric, ...).

Many ML schemes emerge as special cases. To take just one well-known example, linear regression with regularization by LASSO and regularization strength chosen to optimize out-of-sample predictive MSE corresponds to (1) \( f(x_i, \theta)\) linear, (2) \( \gamma(\theta) = \sum_j |\theta_j| \), (3) \( \lambda \) cross-validated, and (4) \( L(y_i - f(x_i, \theta) ) = (y_i - f(x_i, \theta) )^2 \).
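
Here is a minimal sketch of that LASSO special case, written so that the four ingredients are visible: linear \( f \), absolute-value penalty, \( \lambda \) picked by 5-fold cross validation over a small grid, and quadratic loss. The coordinate-descent solver, the simulated data, the grid, and all function names are my own choices for illustration, and the sketch assumes mean-zero data so that no intercept is needed.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t, setting it exactly to zero if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def predict(x, theta):
    """(1) Linear modeling strategy: f(x, theta) = x'theta."""
    return sum(a * b for a, b in zip(x, theta))

def lasso(X, y, lam, n_iter=200, tol=1e-10):
    """Minimize sum_i (y_i - x_i'theta)^2 + lam * sum_j |theta_j|
    by cyclic coordinate descent, starting from theta = 0."""
    n, p = len(X), len(X[0])
    theta = [0.0] * p
    resid = list(y)  # y - X theta at theta = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * theta[j]
            new = soft_threshold(rho, lam / 2.0) / col_ss[j]
            step = new - theta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= step * X[i][j]
                theta[j] = new
            max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return theta

def cv_mse(X, y, lam, k=5):
    """(3) Out-of-sample MSE for a given lam, from k-fold cross validation."""
    n = len(y)
    sse = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        theta = lasso([X[i] for i in train], [y[i] for i in train], lam)
        sse += sum((y[i] - predict(X[i], theta)) ** 2 for i in test)
    return sse / n

# Simulated example (assumed DGP): only the first two regressors matter.
random.seed(1)
n, p = 60, 6
true_theta = [3.0, -2.0, 0.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [predict(x, true_theta) + random.gauss(0, 1) for x in X]

grid = [0.0, 1.0, 5.0, 20.0, 100.0, 1000.0]
best_lam = min(grid, key=lambda lam: cv_mse(X, y, lam))
theta_hat = lasso(X, y, best_lam)
print("chosen lambda:", best_lam)
print("estimates:", [round(t, 2) for t in theta_hat])
```

Swapping any one ingredient gives a different scheme. For example, replacing the absolute-value penalty with a sum of squares gives ridge regression, which shrinks but does not select.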

## Tuesday, January 3, 2017

### Torpedoing Econometric Randomized Controlled Trials

A very Happy New Year to all!

I get no pleasure from torpedoing anything, and "torpedoing" is likely exaggerated, but nevertheless take a look at "A Torpedo Aimed Straight at HMS Randomista". It argues that many econometric randomized controlled trials (RCT's) are seriously flawed -- not even *internally* valid -- due to their failure to use double-blind randomization. At first the non-double-blind critique may sound cheap and obvious, inviting you to roll your eyes and say "get over it". But ultimately it's not.

Note the interesting situation. Everyone these days is worried about *external* validity (extensibility), under the implicit *assumption* that internal validity has been achieved (e.g., see this earlier post). But the non-double-blind critique makes clear that even internal validity may be dubious in econometric RCT's as typically implemented.

The underlying research paper, "Behavioural Responses and the Impact of New Agricultural Technologies: Evidence from a Double-Blind Field Experiment in Tanzania", by Bulte *et al*., was published in 2014 in the *American Journal of Agricultural Economics*. Quite an eye-opener.

Here's the abstract:

Randomized controlled trials in the social sciences are typically not double-blind, so participants know they are “treated” and will adjust their behavior accordingly. Such effort responses complicate the assessment of impact. To gauge the potential magnitude of effort responses we implement an open RCT and double-blind trial in rural Tanzania, and randomly allocate modern and traditional cowpea seed-varieties to a sample of farmers. Effort responses can be quantitatively important––for our case they explain the entire “treatment effect on the treated” as measured in a conventional economic RCT. Specifically, harvests are the same for people who know they received the modern seeds and for people who did not know what type of seeds they got, but people who knew they received the traditional seeds did much worse. We also find that most of the behavioral response is unobserved by the analyst, or at least not readily captured using coarse, standard controls.

## Sunday, December 18, 2016

### Holiday Haze

[Photo credit: Public domain, by Marcus Quigmire, from Florida, USA (Happy Holidays Uploaded by Princess Mérida) [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons]

## Sunday, December 11, 2016

### Varieties of RCT Extensibility

Even internally-valid RCT's have issues. They reveal the treatment effect only for the precise experiment performed and situation studied. Consider, for example, a study of the effects of fertilizer on crop yield, done for region X during a heat wave. Even if internally valid, the estimated treatment effect is that of fertilizer on crop yield in region X *during a heat wave*. The results do not necessarily generalize -- and in this example surely do not generalize -- to times of "normal" weather, even in region X. And of course, for a variety of reasons, they may not generalize to regions other than X, even in heat waves.

Note the interesting *time-series* dimension to the failure of external validity (extensibility) in the example above. (The estimate is obtained during this year's heat wave, but next year may be "normal", or "cool". And this despite the lack of any true structural change. But of course there could be true structural change, which would only make matters worse.) This contrasts with the usual *cross-sectional* focus of extensibility discussions (e.g., we get effect e in region X, but what effect would we get in region Z?).

In essence, we'd like panel data, to account both for cross-section effects and time-series effects, but most RCT's unfortunately have only a single cross section.

Mark Rosenzweig and Chris Udry have a fascinating new paper, "External Validity in a Stochastic World", that grapples with some of the time-series extensibility issues raised above.

## Monday, December 5, 2016

### Exogenous vs. Endogenous Volatility Dynamics

I always thought putting exogenous volatility dynamics in macro-model shocks was a cop-out. Somehow it seemed more satisfying for volatility to be determined endogenously, in equilibrium. Then I came around: We allow for shocks with exogenous conditional-mean dynamics (e.g., AR(1)), so why shouldn't we allow for shocks with exogenous conditional-volatility dynamics? Now I might shift back, at least in part, thanks to new work by Sydney Ludvigson, Sai Ma, and Serena Ng, "Uncertainty and Business Cycles: Exogenous Impulse or Endogenous Response?", which attempts to sort things out. The October 2016 version is here. It turns out that real (macro) volatility appears largely endogenous, whereas nominal (financial market) volatility appears largely exogenous.
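
For concreteness, here is one common way to write the two kinds of exogenous shock dynamics. The notation (\( z_t \), \( \rho \), \( \sigma_t \), \( \bar{\sigma} \), \( \phi \), \( \eta \)) is mine, chosen for illustration, and is not taken from the Ludvigson-Ma-Ng paper.

A shock with exogenous conditional-mean dynamics only (AR(1), constant volatility):

\( z_t = \rho z_{t-1} + \sigma \epsilon_t , ~~ \epsilon_t \sim iid ~ (0, 1) . \)

The same shock with exogenous conditional-volatility dynamics added (AR(1) in log volatility):

\( z_t = \rho z_{t-1} + \sigma_t \epsilon_t , ~~ \log \sigma_t = (1 - \phi) \log \bar{\sigma} + \phi \log \sigma_{t-1} + \eta u_t , ~~ u_t \sim iid ~ (0, 1) . \)

In both cases the dynamics are imposed from outside the model. Endogenous volatility would instead have \( \sigma_t \) respond to the model's own state variables.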

## Monday, November 28, 2016

### Gary Gorton, Harald Uhlig, and the Great Crisis

Gary Gorton has made clear that the financial crisis of 2007 was in essence a traditional banking panic, not unlike those of the nineteenth century. A key corollary is that the root cause of the Panic of 2007 can't be something relatively new, like "Too Big to Fail". (See this.) Lots of people blame residential mortgage-backed securities (RMBS's), but they're also too new. Interestingly, in new work Juan Ospina and Harald Uhlig examine RMBS's directly. Sure enough, and contrary to popular impression, they performed quite well through the crisis.

## Sunday, November 20, 2016

### Dense Data for Long Memory

From the last post, you might think that efficient learning about low-frequency phenomena requires tall data. Certainly efficient estimation of trend, as stressed in the last post, *does* require tall data. But it turns out that efficient estimation of other aspects of low-frequency dynamics sometimes requires only dense data. In particular, consider a pure long memory, or "fractionally integrated", process, \( (1-L)^d x_t = \epsilon_t \), \( 0 < d < 1/2 \). (See, for example, this or this.) In a general \( I(d) \) process, \(d\) governs only low-frequency behavior (the rate of decay of long-lag autocorrelations toward zero, or equivalently, the rate of explosion of low-frequency spectra toward infinity), so tall data are needed for efficient estimation of \(d\). But in a pure long-memory process, one parameter (\(d\)) governs behavior at *all* frequencies, including arbitrarily low frequencies, due to the self-similarity ("scaling law") of pure long memory. Hence for pure long memory a short but dense sample can be as informative about \(d\) as a tall sample. (And pure long memory often appears to be a highly-accurate approximation to financial asset return volatilities, as for example in ABDL.)

## Monday, November 7, 2016

### Big Data for Volatility vs. Trend

Although largely uninformative for some purposes, dense data (high-frequency sampling) are highly informative for others. The massive example of recent decades is volatility estimation. The basic insight traces at least to Robert Merton's early work. Roughly put, as we sample returns arbitrarily finely, we can infer underlying volatility (quadratic variation) arbitrarily well.

So, what is it for which dense data are "largely *un*informative"? The massive example of recent decades is long-term trend. Again roughly put and assuming linearity, long-term trend is effectively a line segment drawn between a sample's first and last observations, so for efficient estimation we need tall data (long calendar span), not dense data.

Assembling everything, for estimating yesterday's stock-market volatility you'd love to have yesterday's 1-minute intra-day returns, but for estimating the expected return on the stock market (the slope of a linear log-price trend) you'd much rather have 100 years of annual returns, despite the fact that a naive count would say that 1 day of 1-minute returns is a much "bigger" sample.

So different aspects of Big Data -- in this case dense vs. tall -- are of different value for different things. Dense data promote accurate volatility estimation, and tall data promote accurate trend estimation.
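
A back-of-the-envelope calculation makes the dense vs. tall contrast concrete. The sketch below assumes log prices follow a Gaussian random walk with constant drift and constant volatility (a strong simplification, adopted only for the arithmetic), an annual volatility of 20%, and 390 one-minute returns per trading day with 252 trading days per year. Those numbers and the function names are my assumptions. Under them, the standard error of the drift estimate depends only on calendar span, and the standard error of the variance estimate depends only on the number of observations.

```python
import math

def drift_se(sigma, span_years, n_obs):
    """Std. error of the drift estimate (last log price - first) / span.

    The n_obs returns each have variance sigma^2 * dt, so the sum has
    variance sigma^2 * span, whatever n_obs is.
    """
    dt = span_years / n_obs
    var_of_sum = n_obs * sigma ** 2 * dt
    return math.sqrt(var_of_sum) / span_years  # equals sigma / sqrt(span)

def variance_se(sigma, n_obs):
    """Std. error of the annualized sample variance of n_obs Gaussian returns."""
    return sigma ** 2 * math.sqrt(2.0 / (n_obs - 1))

SIGMA = 0.20  # assumed annual volatility

# Dense: one trading day of 1-minute returns.
dense = {"span_years": 1.0 / 252, "n_obs": 390}
# Tall: one hundred years of annual returns.
tall = {"span_years": 100.0, "n_obs": 100}

for name, s in (("dense", dense), ("tall", tall)):
    print(name,
          "drift s.e.:", drift_se(SIGMA, **s),
          "variance s.e.:", variance_se(SIGMA, s["n_obs"]))
```

With these inputs the dense sample pins down the variance more tightly than the tall sample, while its drift estimate is useless, and sampling the century more finely would not improve the drift estimate at all. With time-varying volatility the tall sample would be even less informative about yesterday's volatility than this constant-volatility calculation suggests.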
