Friday, December 21, 2018

Holiday Haze

Happy holidays! 

Your blogger is about to vanish, returning in the new year. Many thanks for your past, present, and future support. 

If you're at ASSA Atlanta, I hope you'll come to the Penn Economics and Finance parties.

Sunday, December 16, 2018

Causality as Robust Prediction

I like thinking about causal estimation as a type of prediction (e.g., here). Here's a very nice slide deck from Peter Bühlmann at ETH Zurich detailing his group's recent and ongoing work in that tradition.

Thursday, December 13, 2018

More on Google Dataset Search

Some months ago I blogged about Google's new dataset search tool. Evidently it's coming along. Check out the beta version here. Also, on dataset supply as opposed to demand, see here for how to maximize the visibility of your datasets to the search engine.

[With thanks to the IIF's Oracle newsletter for alerting me.]


Monday, December 10, 2018

Greater New York Area Econometrics Colloquium

Last week's 13th annual Greater New York Area Econometrics Colloquium, generously hosted by Princeton, was a great success, with strong papers throughout. The program is below. I found two papers especially interesting. I already blogged on Spady and Stouli's “Simultaneous Mean-Variance Regression”. The other was “Nonparametric Sample Splitting”, by Lee and Wang.

Think of a nonlinear classification problem. In general the decision boundary is of course a highly nonlinear surface, but it's a supervised learning situation, so it's "easy" to learn the surface using standard nonlinear regression methods. Lee and Wang, in contrast, study an unsupervised situation: effectively a threshold regression model in which the threshold is determined by an unknown nonparametric relation, so the sample split itself must be estimated. And they have very cool applications to things such as effective economic borders and gerrymandering.
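To fix ideas, here is a minimal simulation sketch in Python. It is purely illustrative and not the Lee-Wang estimator; the sieve terms, the grid, and all numbers are my own assumptions. The regime is determined by whether one coordinate lies above an unknown nonlinear boundary in another coordinate, and the boundary is recovered by profiling a small sieve approximation over a grid:

```python
# Illustrative sketch only (not the Lee-Wang estimator): simulate a two-regime
# regression in which the regime is determined by whether location s2 lies
# above an unknown nonlinear boundary g(s1), then recover the boundary by
# profiling a small sieve approximation over a coarse grid.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 2000
s1 = rng.uniform(0, 1, n)                 # e.g., longitude
s2 = rng.uniform(0, 1, n)                 # e.g., latitude
x = rng.normal(size=n)                    # regressor
g = lambda s: 0.5 + 0.2 * np.sin(2 * np.pi * s)   # true (unknown) boundary
regime = (s2 > g(s1)).astype(float)               # which side of the "border"
y = 1.0 * x + 2.0 * x * regime + rng.normal(scale=0.5, size=n)

def ssr(c):
    # Candidate boundary from a tiny sieve; split the sample accordingly,
    # fit the implied two-regime model by OLS, and return the SSR.
    boundary = c[0] + c[1] * s1 + c[2] * np.sin(2 * np.pi * s1)
    d = (s2 > boundary).astype(float)
    X = np.column_stack([x, x * d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

grid = product(np.linspace(0.3, 0.7, 9),
               np.linspace(-0.3, 0.3, 7),
               np.linspace(-0.3, 0.3, 7))
best = min(grid, key=ssr)
print("estimated boundary coefficients:", best)
```

In the Lee-Wang paper the boundary is treated nonparametrically; the grid-profiled sieve above is just a toy to convey the structure of the problem, namely that the splitting rule, not just the regime coefficients, must be learned from the data.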

The 13th Greater New York Metropolitan Area Econometrics Colloquium

Princeton University, Saturday, December 1, 2018

9.00am-10.30am: Session 1
“Simple Inference for Projections and Linear Programs” by Hiroaki Kaido (BU), Francesca Molinari (Cornell), and Jörg Stoye (Cornell)
“Clustering for multi-dimensional heterogeneity with application to production function estimation” by Xu Cheng (UPenn), Peng Shao (UPenn), and Frank Schorfheide (UPenn)
“Adaptive Bayesian Estimation of Mixed Discrete-Continuous Distributions Under Smoothness and Sparsity” by Andriy Norets (Brown) and Justinas Pelenis (Vienna IAS)

11.00am-12.30pm: Session 2
“Factor-Driven Two-Regime Regression” by Sokbae Lee (Columbia), Yuan Liao (Rutgers), Myung Hwan Seo (Cowles), and Youngki Shin (McMaster)
“Semiparametric Estimation in Continuous-Time: Asymptotics for Integrated Volatility Functionals with Small and Large Bandwidths” by Xiye Yang (Rutgers)
“Nonparametric Sample Splitting” by Yoonseok Lee (Syracuse) and Yulong Wang (Syracuse)

2.00pm-3.30pm: Session 3
“Counterfactual Sensitivity and Robustness” by Timothy Christensen (NYU) and Benjamin Connault (IEX Group)
“Dynamically Optimal Treatment Allocation Using Reinforcement Learning” by Karun Adusumilli (UPenn), Friedrich Geiecke (LSE), and Claudio Schilter (LSE)
“Simultaneous Mean-Variance Regression” by Richard Spady (Johns Hopkins) and Sami Stouli (Bristol)

4.00pm-5.30pm: Session 4
“Semi-parametric instrument-free demand estimation: relaxing optimality and equilibrium assumptions” by Sungjin Cho (Seoul National), Gong Lee (Georgetown), John Rust (Georgetown), and Mengkai Yu (Georgetown)
“Nonparametric analysis of monotone choice” by Natalia Lazzati (UCSC), John Quah (Johns Hopkins), and Koji Shirai (Kwansei Gakuin)
“Discrete Choice under Risk with Limited Consideration” by Levon Barseghyan (Cornell), Francesca Molinari (Cornell), and Matthew Thirkettle (Cornell)

Organizing Committee
Bo Honoré, Michal Kolesár, Ulrich Müller, and Mikkel Plagborg-Møller

Participants

Karun Adusumilli (UPenn)
Lukas Althoff (Princeton)
Rachel Anderson (Princeton)
Jushan Bai (Columbia)
Arie Beresteanu (Pitt)
Brantly Callaway (Temple)
John Chao (Maryland)
Xu Cheng (UPenn)
Jungjun Choi (Rutgers)
Sung Hoon Choi (Rutgers)
Gregory Cox (Columbia)
Timothy Christensen (NYU)
Frank Diebold (UPenn)
Liyu Dou (Princeton)
Wayne Gao (Yale)
Abhishek Gaurav (Princeton)
Marc Henry (Penn State)
Paul Ho (Princeton)
Bo Honoré (Princeton)
Yingyao Hu (Johns Hopkins)
Michal Kolesár (Princeton)
Natalia Lazzati (UCSC)
Simon Lee (Columbia)
Dake Li (Princeton)
Lixiong Li (Penn State)
Yuan Liao (Rutgers)
Konrad Menzel (NYU)
Francesca Molinari (Cornell)
José Luis Montiel Olea (Columbia)
Ulrich Müller (Princeton)
Andriy Norets (Brown)
Mikkel Plagborg-Møller (Princeton)
Alexandre Poirier (Georgetown)
John Quah (Johns Hopkins)
John Rust (Georgetown)
Frank Schorfheide (UPenn)
Myung Hwan Seo (SNU & Cowles)
Youngki Shin (McMaster)
Christopher Sims (Princeton)
Richard Spady (Johns Hopkins)
Jörg Stoye (Cornell)
Larry Taylor (Lehigh)
Hrishikesh Vinod (Fordham)
Yulong Wang (Syracuse)
Xiye Yang (Rutgers)
Andrei Zeleneev (Princeton)

Monday, December 3, 2018

Dual Regression and Prediction

Richard Spady and Sami Stouli have an interesting new paper, “Dual Regression”. They change the usual OLS loss function from quadratic to something related but different, as per their equation (2.2), and they obtain impressive properties for estimation under correct specification. They also have some results under misspecification.

I'd like to understand more regarding dual regression's properties for prediction under misspecification. Generally we're comfortable with quadratic loss, in which case OLS delivers the goods (the conditional mean or linear projection) in large samples under great generality (e.g., see here). The dual regression estimator, in contrast, has a different probability limit under misspecification -- it's not providing a KLIC-optimal approximation.

If the above sounds negative, note well that the issue raised may be an opportunity, not a pitfall! Certainly there is nothing sacred about quadratic loss, even if the conditional mean is usually a natural predictor. We sometimes move to absolute-error loss (conditional median predictor), check-function loss (conditional quantile predictor), or all sorts of other predictive loss functions depending on the situation. But movements away from conditional mean or median prediction generally require some justification and interpretation. Equivalently, movements away from quadratic or absolute predictive loss generally require some justification and interpretation. I look forward to seeing that for the loss function that drives dual regression.
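To make the general point concrete, here is a small numerical illustration in Python. It uses quadratic versus absolute loss rather than the dual-regression loss itself, and all numbers are made up. Under a misspecified linear model with skewed, heteroskedastic errors, the two losses lead to different probability limits, so the fitted linear predictor depends on the loss you choose to minimize:

```python
# Small illustration (not the dual-regression loss itself): under a
# misspecified linear model with skewed errors, the quadratic-loss (OLS)
# and absolute-loss (LAD / median) fits converge to different limits,
# so the choice of predictive loss matters. All numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(0, 2, n)
# True DGP: nonlinear mean plus skewed, heteroskedastic noise,
# so the conditional mean and conditional median differ and neither is linear.
y = x**2 + rng.exponential(scale=1.0 + x, size=n)

X = np.column_stack([np.ones(n), x])

# Quadratic loss: OLS, the best linear approximation to E[y|x].
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Absolute loss: LAD via iteratively reweighted least squares (simple sketch).
beta_lad = beta_ols.copy()
for _ in range(50):
    w = 1.0 / np.maximum(np.abs(y - X @ beta_lad), 1e-6)
    WX = X * w[:, None]
    beta_lad = np.linalg.solve(X.T @ WX, X.T @ (w * y))

print("OLS (quadratic loss) estimate:", beta_ols)
print("LAD (absolute loss) estimate: ", beta_lad)
# The two differ noticeably: each loss has its own pseudo-true parameter.
```

The same logic applies to any alternative loss, including the one behind dual regression: under misspecification the estimator converges to that loss's own pseudo-true parameter, and it is that limit, and the predictions it implies, that need justification and interpretation.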