Saturday, September 29, 2018

RCT's vs. RDD's

Art Owen and Hal Varian have an eye-opening new paper, "Optimizing the Tie-Breaker Regression Discontinuity Design".

Randomized controlled trials (RCT's) are clearly the gold standard in terms of statistical efficiency for teasing out causal effects. Assume that you really can do an RCT. Why then would you ever want to do anything else?

Answer: There may be important considerations beyond statistical efficiency. Take the famous "scholarship example". (You want to know whether receipt of an academic scholarship causes enhanced academic performance among top scholarship test performers.) In an RCT approach you're going to give lots of academic scholarships to lots of randomly-selected people, many of whom are not top performers. That's wasteful. In a regression discontinuity design (RDD) approach ("give scholarships only to top performers who score above X on the scholarship exam, and compare the performances of students who scored just above and just below X"), you don't give any scholarships to non-top performers. So it's not wasteful -- but the resulting inference is highly statistically inefficient, because only the observations near the cutoff X are informative about the treatment effect.

"Tie breakers" implement a middle ground: Definitely don't give scholarships to poor performers, definitely do give scholarships to top performers, and randomize for a middle group. So you gain some efficiency relative to pure RDD (but you're a little wasteful), and you're less wasteful than a pure RCT (but you lose some efficiency).
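The tie-breaker assignment rule is simple enough to sketch in a few lines. Here's a minimal illustration, with hypothetical score thresholds and a 50/50 coin flip in the middle band (the numbers are mine, not from the paper):

```python
import random

def assign_scholarship(score, low=60.0, high=80.0, p=0.5, rng=None):
    """Tie-breaker design: always treat above `high`, never treat below
    `low`, and randomize with probability `p` in the middle band.
    Thresholds are illustrative only."""
    rng = rng or random.Random(0)
    if score >= high:
        return 1          # definite scholarship: top performer
    if score < low:
        return 0          # definitely no scholarship: poor performer
    return 1 if rng.random() < p else 0  # randomize the middle group
```

Shrinking the band (`low` → `high`) recovers pure RDD; widening it to cover everyone recovers a pure RCT.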

Hence there's an efficiency/wastefulness trade-off, and your location on it depends on the size of your middle group. Owen and Varian characterize the trade-off and show how to optimize it. Really nice, clean, and useful.
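To make the trade-off concrete, here's a toy illustration (my own crude proxies, not the paper's formal analysis): as the randomized band widens, the share of students contributing experimental variation rises (good for efficiency), but so does the share of scholarships going to non-top performers (wasteful).

```python
import random

def band_tradeoff(scores, center=70.0, half_width=10.0, p=0.5, seed=0):
    """For a randomized band [center - half_width, center + half_width),
    return (share of students randomized, share of awards going to
    students below the top cutoff). Purely illustrative proxies."""
    rng = random.Random(seed)
    low, high = center - half_width, center + half_width
    treated = randomized = wasted = 0
    for s in scores:
        if s >= high:
            award = True                 # definite scholarship
        elif s < low:
            award = False                # definitely no scholarship
        else:
            randomized += 1
            award = rng.random() < p     # tie-breaker coin flip
        if award:
            treated += 1
            if s < high:
                wasted += 1              # award below the top cutoff
    n = len(scores)
    return randomized / n, (wasted / treated if treated else 0.0)

gen = random.Random(1)
scores = [gen.gauss(70.0, 15.0) for _ in range(1000)]
for hw in (0.0, 5.0, 15.0):
    print(hw, band_tradeoff(scores, half_width=hw))
```

At `half_width=0` the design is pure RDD: nobody is randomized and no awards are "wasted", but there is no experimental variation at all.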

[Sorry but I'm running way behind. I saw Hal present this work a few months ago at a fine ECB meeting on predictive modeling.]