Saturday, September 29, 2018

RCT's vs. RDD's

Art Owen and Hal Varian have an eye-opening new paper, "Optimizing the Tie-Breaker Regression Discontinuity Design".

Randomized controlled trials (RCT's) are clearly the gold standard in terms of statistical efficiency for teasing out causal effects. Assume that you really can do an RCT. Why then would you ever want to do anything else?

Answer: There may be important considerations beyond statistical efficiency. Take the famous "scholarship example". (You want to know whether receipt of an academic scholarship causes enhanced academic performance among top scholarship test performers.) In an RCT approach you're going to give lots of academic scholarships to lots of randomly-selected people, many of whom are not top performers. That's wasteful. In a regression discontinuity design (RDD) approach ("give scholarships only to top-performers who score above X in the scholarship exam, and compare the performances of students who scored just above and below X"), you don't give any scholarships to non-top performers. So it's not wasteful -- but the resulting inference is highly statistically inefficient. 

"Tie breakers" implement a middle ground: Definitely don't give scholarships to poor performers, definitely do give scholarships to top performers, and randomize for a middle group. So you gain some efficiency relative to pure RDD (but you're a little wasteful), and you're less wasteful than a pure RCT (but you lose some efficiency).

Hence there's an efficiency/wastefulness trade-off, and your location on it depends on the size of your middle group. Owen and Varian characterize the trade-off and show how to optimize it. Really nice, clean, and useful.
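The tie-breaker assignment rule is easy to see in a small simulation. Here's a minimal sketch (all numbers invented for illustration, not from the Owen-Varian paper): treatment is forced off below a window around the cutoff, forced on above it, and randomized inside it; a regression of the outcome on the score and the treatment indicator then recovers the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x is the scholarship-exam score, scaled to [-1, 1],
# with the cutoff at 0.
n = 10_000
x = rng.uniform(-1, 1, n)

# Tie-breaker assignment with a randomization window of half-width delta
# around the cutoff: below -delta nobody gets a scholarship, above +delta
# everybody does, and inside the window assignment is a coin flip.
# delta = 0 recovers a pure RDD; delta = 1 recovers a pure RCT.
delta = 0.3
z = np.where(x < -delta, 0, np.where(x > delta, 1, rng.integers(0, 2, n)))

# Simulated outcome with a true scholarship effect of 2.0 (an invented
# number for illustration).
y = 1.0 + 0.5 * x + 2.0 * z + rng.normal(0.0, 1.0, n)

# OLS of y on (1, x, z): the randomization inside the window is what
# identifies the coefficient on z as the treatment effect.
X = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated effect:", beta[2])
```

Shrinking `delta` toward zero makes the design less wasteful but leaves less randomization to identify the effect, which is exactly the trade-off the paper optimizes.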

[Sorry but I'm running way behind. I saw Hal present this work a few months ago at a fine ECB meeting on predictive modeling.]

Sunday, September 23, 2018

NBER WP's Hit 25,000



A few weeks ago the NBER released WP25000. What a great NBER service -- there have been 7.6 million downloads of NBER WP's in the last year alone. 


This milestone is of both current and historical interest. The history is especially interesting. As Jim Poterba notes in a recent communication:
This morning's "New this Week" email included the release of the 25000th NBER working paper, a study of the intergenerational transmission of human capital by David Card, Ciprian Domnisoru, and Lowell Taylor.  The NBER working paper series was launched in 1973, at the inspiration of Robert Michael, who sought a way for NBER-affiliated researchers to share their findings and obtain feedback prior to publication.  The first working paper was "Education, Information, and Efficiency" by Finis Welch.  The design for the working papers -- which many will recall appeared with yellow covers in the pre-digital age -- was created by H. Irving Forman, the NBER's long-serving chart-maker and graphic artist.
Initially there were only a few dozen working papers per year, but as the number of NBER-affiliated researchers grew, particularly after Martin Feldstein became NBER president in 1977, the NBER working paper series also expanded.  In recent years, there have been about 1150 papers per year.  Over the 45 year history of the working paper series, the Economic Fluctuations and Growth Program has accounted for nearly twenty percent (4916) of the papers, closely followed by Labor Studies (4891) and Public Economics (4877).

Wednesday, September 19, 2018

Wonderful Network Connectedness Piece

Very cool NYT graphics summarizing U.S. Facebook network connectedness.  Check it out:
https://www.nytimes.com/interactive/2018/09/19/upshot/facebook-county-friendships.html


They get the same result that Kamil Yilmaz and I have gotten for years in our analyses of economic and financial network connectedness:  There is a strong "gravity effect" -- that is, even in the electronic age, physical proximity is the key ingredient to network relationships. See for example:

Maybe that's not as surprising for Facebook friends as for, say, financial institutions. But still... 

Sunday, September 16, 2018

Banque de France’s Open Data Room

See below for the announcement of a useful new product from the Banque de France and its Representative Office in New York. 
Banque de France has for many years been at the forefront of disseminating statistical data to academics and other interested parties. Through Banque de France’s dedicated public portal http://webstat.banque-france.fr/en/, we offer a large set of freely downloadable series (about 40,000 mainly aggregated series).

Banque de France has expanded this service further, launching an “Open Data Room” in Paris in November 2016 that provides researchers with free access to granular data.
We are glad to announce that the “Open Data Room” service is now also available to US researchers through the Banque de France Representative Office in New York City.

Saturday, September 15, 2018

An Open Letter to Tren Griffin

[I tried quite hard to email this privately. I post it here only because Griffin has, as far as I can tell, been very successful in scrubbing his email address from the web. Please forward it to him if you can figure out how.]

Mr. Griffin:

A colleague forwarded me your post, https://25iq.com/2018/09/08/risk-uncertainty-and-ignorance-in-investing-and-business-lessons-from-richard-zeckhauser/.  I enjoyed it, and Zeckhauser definitely deserves everyone's highest praise. 

However your post misses the bigger picture.  Diebold, Doherty, and Herring conceptualized and promoted the "Known, Unknown, Unknowable" (KuU) framework for financial risk management, which runs blatantly throughout your twelve "Lessons From Richard Zeckhauser".  Indeed the key Zeckhauser article on which you draw appeared in our book, "The Known, the Unknown and the Unknowable in Financial Risk Management", https://press.princeton.edu/titles/9223.html, which we also conceptualized, and for which we solicited the papers and authors, mentored them as regards integrating their thoughts into the KuU framework, etc.  The book was published almost a decade ago by Princeton University Press. 

I say all this not only to reveal my surprise and annoyance at your apparent unawareness, but also, and more constructively, because you and your readers may be interested in our KuU book, which has many other interesting parts (great as the Zeckhauser part may be), and which, moreover, is more than the sum of its parts. A pdf of the first chapter has been available for many years at http://assets.press.princeton.edu/chapters/s9223.pdf.

Sincerely,

Friday, September 14, 2018

Machine Learning for Forecast Combination

How could I have forgotten to announce my latest paper, "Machine Learning for Regularized Survey Forecast Combination: Partially-Egalitarian Lasso and its Derivatives"? (Actually a heavily-revised version of an earlier paper, including a new title.) Came out as an NBER w.p. a week or two ago.
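The partially-egalitarian idea is roughly this: use the lasso to select a subset of survey forecasters, then shrink the surviving combining weights toward equality. Here's a minimal two-step sketch in that spirit, on simulated data -- the plain coordinate-descent lasso and the full equal weighting of survivors are illustrative simplifications, not the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: T periods, k forecasters of varying quality
# (noise standard deviations from 0.5 up to 3.0). Illustrative only.
T, k = 400, 8
truth = rng.normal(0.0, 1.0, T)
noise = rng.normal(0.0, 1.0, (T, k)) * np.linspace(0.5, 3.0, k)
F = truth[:, None] + noise           # individual forecasts
y = truth + rng.normal(0.0, 0.2, T)  # realized outcome

def lasso_cd(X, y, lam, n_iter=500):
    """Plain coordinate-descent lasso (no intercept), for illustration."""
    n, p = X.shape
    w = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]  # partial residual w.r.t. j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return w

# Step 1 (selection): lasso the outcome on the forecasts; forecasters
# with zero weight are discarded.
w = lasso_cd(F, y, lam=50.0)
keep = np.flatnonzero(np.abs(w) > 1e-8)

# Step 2 (egalitarian shrinkage, taken to its limit): equally weight the
# survivors -- a crude stand-in for shrinking weights toward 1/k.
combined = F[:, keep].mean(axis=1)
print("kept forecasters:", keep)
print("combined MSE:", np.mean((combined - y) ** 2))
```

The selection step tends to discard the noisier forecasters, and equal weighting of the survivors guards against overfit combining weights -- the broad logic behind shrinking toward egalitarian weights.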

Monday, September 10, 2018

Interesting Papers of the Moment

Missing Events in Event Studies: Identifying the Effects of Partially-Measured News Surprises
by Refet S. Guerkaynak, Burcin Kisacikoglu, Jonathan H. Wright #25016 (AP ME)
http://papers.nber.org/papers/w25016


Colacito, Ric, Bridget Hoffmann, and Toan Phan (2018), “Temperature and growth: A panel analysis of the United States,”
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2546456

Do You Know That I Know That You Know...? Higher-Order Beliefs in Survey Data
by Olivier Coibion, Yuriy Gorodnichenko, Saten Kumar, Jane Ryngaert #24987 (EFG ME)
http://papers.nber.org/papers/w24987

Wednesday, September 5, 2018

Google's New Dataset Search Tool

Check out Google's new Dataset Search.

Here's the description they issued today:

Making it easier to discover datasets
Natasha Noy
Research Scientist, Google AI
Published Sep 5, 2018

In today's world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.

Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page. To create Dataset Search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset. Our approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem.

In this new release, you can find references to most datasets in environmental and social sciences, as well as data from other disciplines including government data and data provided by news organizations, such as ProPublica. As more data repositories use the schema.org standard to describe their datasets, the variety and coverage of datasets that users will find in Dataset Search will continue to grow.

Dataset Search works in multiple languages with support for additional languages coming soon. Simply enter what you are looking for and we will help guide you to the published dataset on the repository provider’s site.

For example, if you wanted to analyze daily weather records, you might try this query in Dataset Search:

[daily weather records]

You’ll see data from NASA and NOAA, as well as from academic repositories such as Harvard's Dataverse and Inter-university Consortium for Political and Social Research (ICPSR). Ed Kearns, Chief Data Officer at NOAA, is a strong supporter of this project and helped NOAA make many of their datasets searchable in this tool. “This type of search has long been the dream for many researchers in the open data and science communities” he said. “And for NOAA, whose mission includes the sharing of our data with others, this tool is key to making our data more accessible to an even wider community of users.”

This launch is one of a series of initiatives to bring datasets more prominently into our products. We recently made it easier to discover tabular data in Search, which uses this same metadata along with the linked tabular data to provide answers to queries directly in search results. While that initiative focused more on news organizations and data journalists, Dataset Search can be useful to a much broader audience, whether you're looking for scientific data, government data, or data provided by news organizations.

A search tool like this one is only as good as the metadata that data publishers are willing to provide. We hope to see many of you use the open standards to describe your data, enabling our users to find the data that they are looking for. If you publish data and don't see it in the results, visit our instructions on our developers site which also includes a link to ask questions and provide feedback.
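For concreteness, the schema.org markup that the Google post describes looks something like the following hypothetical JSON-LD snippet. All field values here are invented for illustration; the property names (`name`, `creator`, `license`, `distribution`, and so on) come from the schema.org `Dataset` vocabulary:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Daily weather records, 1950-2018",
  "description": "Hypothetical example: daily temperature and precipitation records.",
  "creator": {
    "@type": "Organization",
    "name": "Example Weather Agency"
  },
  "datePublished": "2018-09-01",
  "license": "https://creativecommons.org/publicdomain/zero/1.0/",
  "distribution": {
    "@type": "DataDownload",
    "encodingFormat": "CSV",
    "contentUrl": "https://example.org/weather.csv"
  }
}
```

A provider embeds a block like this in the dataset's web page, and crawlers that understand schema.org can then index the dataset's metadata.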

Monday, September 3, 2018

The Coming Storm

The role of time-series statistics / econometrics in climate analyses is expanding (e.g., here).  Related -- albeit focusing on shorter-term meteorological aspects rather than longer-term climatological aspects -- it's worth listening to Michael Lewis' latest, The Coming Storm.  (You have to listen rather than read, as it's only available as an audiobook, but it runs only about two hours.)  It's a fascinating story, well researched and well told by Lewis, just as you'd expect.  There are lots of interesting insights on (1) the collection, use, and abuse of public weather data, including ongoing, ethically-dubious, and potentially life-destroying attempts to privatize public weather data for private gain, (2) the clear and massive improvements in weather forecasting in recent decades, and (3) behavioral aspects of how best to communicate forecasts so people understand them, believe them, and take appropriate action before disaster strikes.