Monday, February 5, 2018

Big Data, Machine Learning, and Economic Statistics

Greetings from a very happy Philadelphia celebrating the Eagles' victory!

The following is adapted from the "background" and "purpose" statements for a planned 2019 NBER/CRIW conference, "Big Data for 21st Century Economic Statistics". Prescient and fascinating reading. (The full call for papers is here.)

Background: The coming decades will witness significant changes in the production of the social and economic statistics on which government officials, business decision makers, and private citizens rely. The statistical information currently produced by the federal statistical agencies rests primarily on “designed data” -- that is, data collected through household and business surveys. The increasing cost of fielding these surveys, the difficulty of obtaining survey responses, and doubts about the reliability of some of the information collected have raised questions about the sustainability of that model. At the same time, the potential for using “big data” -- very large data sets built to meet governments’ and businesses’ administrative and operational needs rather than for statistical purposes -- in the production of official statistics has grown.

These naturally occurring data include not only administrative data maintained by government agencies but also scanner data, data scraped from the Web, credit card company records, data maintained by payroll providers, medical records, insurance company records, sensor data, and data from the Internet of Things. If the challenges associated with their use can be satisfactorily resolved, these emerging sorts of data could allow the statistical agencies not only to supplement or replace the survey data on which they currently depend, but also to introduce new statistics that are more granular, more up-to-date, and of higher quality than those currently being produced.

Purpose: The purpose of this conference is to provide a forum where economists, data providers, and data analysts can meet to present research on the use of big data in the production of federal social and economic statistics. Among other things, this involves discussing (1) methods for combining multiple data sources, whether they be carefully designed surveys or experiments, large government administrative datasets, or private sector big data, to produce economic and social statistics; (2) case studies illustrating how big data can be used to improve or replace existing statistical data series or create new statistical data series; and (3) best practices for characterizing the quality of big data sources and blended estimates constructed using data from multiple sources.