Saturday, April 30, 2016
SoFiE 2016 Hong Kong
Sunday, April 24, 2016
The Distribution of Global Economic Activity...
... as proxied by the global distribution of nighttime lights (from a fascinating new paper by Henderson et al.). Like many good graphics, this one repays careful study. You'll see lots of places where the lights match your prior, but you'll also see places that are perhaps "surprisingly" well-lit relative to popular perceptions (e.g., Central America), other places that are perhaps surprisingly dark (e.g., most of Russia), fascinating patterns (e.g., look at Europe stretching east into Russia), etc.
Monday, April 18, 2016
On the Real-Time GDP War
A few days ago the WSJ did an interesting piece, Fed Banks Spar Over GDP Data, highlighting that the "race to provide credible real-time data on U.S. economic growth is pitting the Federal Reserve Bank of New York against its sibling in Atlanta."
In all this, real-time data on "economic growth" is interpreted as real-time data on GDP growth.
In my opinion, all of the real-time GDP products basically reflect a misguided perspective if the goal is real-time tracking of economic growth (which is the right goal, and the one claimed). If you want to track real-time growth, you should be tracking an extraction of a broad dynamic factor, effectively averaging over many indicators, not just tracking real-time GDP. That has been the leading and invaluable perspective from Burns and Mitchell straight through to modern dynamic-factor approaches. My favorite, of course, is the FRB Philadelphia's ADS Index, but there are many others.
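To fix ideas, here's a minimal sketch of the factor-extraction idea in Python. It uses simulated data, and it substitutes a first principal component for the state-space (Kalman-filter) machinery behind real indexes like ADS, so everything in it is illustrative rather than anyone's actual methodology.

```python
# Minimal sketch of the dynamic-factor idea: extract a single common
# factor from many standardized indicators, rather than tracking any
# one series (such as GDP) alone. Illustrative only -- the real ADS
# index uses a state-space model estimated by the Kalman filter on
# mixed-frequency data; here a first principal component stands in.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: T months of K indicators sharing one common factor.
T, K = 240, 10
true_factor = np.cumsum(rng.normal(size=T)) * 0.1
loadings = rng.uniform(0.5, 1.5, size=K)
X = np.outer(true_factor, loadings) + rng.normal(scale=0.5, size=(T, K))

# Standardize each indicator, then take the first principal component
# as a crude estimate of the common "real activity" factor.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
factor_hat = Z @ Vt[0]

# The factor's sign is not identified; align with truth for comparison.
if np.corrcoef(factor_hat, true_factor)[0, 1] < 0:
    factor_hat = -factor_hat
print("corr(estimated factor, true factor):",
      round(np.corrcoef(factor_hat, true_factor)[0, 1], 3))
```

The point of the averaging is robustness: idiosyncratic noise in any single indicator (GDP included) washes out of the common-factor extraction.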
Wednesday, April 13, 2016
Big Data: Tall, Wide, and Dense
It strikes me that "tall", "wide", and "dense" might be useful words and conceptualizations of aspects of Big Data relevant in time-series econometrics.
Think of a regression situation, with a (T x K) "X matrix" for T "days" (or whatever) of data for each of K variables. Now imagine sampling intra-day, m times per day. Then X is (mT x K). Big data correspond to huge-X situations arising because one or more of T, K, and m is huge. (Of course there will always be subjectivity associated with "how huge is huge".)
T, K, and m are usefully considered separately.
-- As T gets large we have "tall data" (in reference to the tall X matrix, due to the large number of time periods, i.e., the long calendar span of data)
-- As K gets large we have "wide data" (in reference to the wide X matrix due to the large number of regressors)
-- As m gets large we have "dense data" (in reference to the high-frequency intra-day sampling, regardless of whether the data are tall)
A few examples:
-- Consider 2500 days of 1-minute returns for each of 5000 stocks. The data are tall, wide, and dense.
-- Consider 25 days of 1-minute returns for each of 50 stocks. The data are dense, but neither tall nor wide.
-- Consider 2500 days of daily returns for each of 5000 stocks. The data are tall and wide, but not dense.
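To make the taxonomy concrete, here's a toy Python sketch that labels a dataset as tall, wide, and/or dense. The size thresholds are arbitrary placeholders, since, as noted, "how huge is huge" is subjective.

```python
# Toy illustration of the tall / wide / dense taxonomy.
# The threshold values are arbitrary placeholders, not settled
# definitions -- "how huge is huge" is subjective.
def classify(T, K, m, tall_at=1000, wide_at=1000, dense_at=10):
    """Label an (m*T) x K data matrix: T days, K variables,
    m intra-day observations per day."""
    labels = []
    if T >= tall_at:
        labels.append("tall")
    if K >= wide_at:
        labels.append("wide")
    if m >= dense_at:
        labels.append("dense")
    return labels or ["none"]

# The three examples above (1-minute sampling gives m = 390
# in a 6.5-hour trading day):
print(classify(T=2500, K=5000, m=390))  # ['tall', 'wide', 'dense']
print(classify(T=25,   K=50,   m=390))  # ['dense']
print(classify(T=2500, K=5000, m=1))    # ['tall', 'wide']
```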
Sunday, April 10, 2016
On "The Human Capital Approach to Inference"
Check out the interesting new paper by Bentley MacLeod at Columbia ("The Human Capital Approach to Inference"), on using economic theory in combination with machine learning to estimate conditional average treatment effects better than can be done with randomized control trials.
Quite apart from new methods for accurate estimation of conditional average treatment effects, the paper's intro contains some interesting tidbits on causal econometric inference. Here's one sequence (quoted as "BM"), with my reactions:
BM: "There are two distinct approaches to modern empirical economics."
-- The MacLeod paper is exclusively about causal inference, so it should say "two distinct approaches to causal inference in modern empirical economics." Equating causal inference to all of empirical economics is simply wrong. Causal inference is a large and very important part of modern empirical economics, but far from its entirety. The booming field of financial econometrics, for example, is largely and intentionally reduced-form. See this.
BM: "First, there is research using structural models that begins by assuming individuals make utility maximizing decisions within a well defined environment, and then proceeds to measure the value of the unknown parameters..."
-- There is some unsettling truth here. A cynical but not-entirely-false view is that structural causal inference effectively assumes a causal mechanism, known up to a vector of parameters that can be estimated. Big assumption. And of course different structural modelers can make different assumptions and get different results.
BM: "The second approach addresses the self-selection of individuals into different observed treatments or choices by either explicitly randomizing treatments/choices in the context of an experiment...or through the use of a natural experiment that allows for an instrumental variables strategy. There is general agreement that explicit randomization provides one of the cleanest ways to obtain a measure of the effect of choice."
-- There's rarely general agreement about anything in economics. But yes, randomization is arguably the gold standard for causal effect estimation, if and when it can be done credibly.
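For readers who want to see what ML-based estimation of conditional average treatment effects looks like, here is a minimal Python sketch of one standard approach (a "T-learner" with random forests, on simulated data). To be clear, this illustrates the generic idea only; it is not MacLeod's method, and all names and numbers in it are made up.

```python
# Minimal sketch of ML-based conditional average treatment effect
# (CATE) estimation via a "T-learner": fit separate outcome models on
# treated and control units, then difference their predictions.
# Illustrative only -- not MacLeod's method; data are simulated.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated randomized experiment: covariate x, binary treatment d,
# and a treatment effect that varies with x (a nonzero CATE).
n = 5000
x = rng.uniform(-2, 2, size=(n, 1))
d = rng.integers(0, 2, size=n)
tau = 1.0 + x[:, 0]                       # true CATE = 1 + x
y = x[:, 0] + d * tau + rng.normal(size=n)

# Outcome model for treated units and for control units.
m1 = RandomForestRegressor(n_estimators=200, random_state=0)
m1.fit(x[d == 1], y[d == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0)
m0.fit(x[d == 0], y[d == 0])

# Estimated CATE = predicted treated outcome - predicted control outcome.
cate_hat = m1.predict(x) - m0.predict(x)
print("corr(estimated CATE, true CATE):",
      round(np.corrcoef(cate_hat, tau)[0, 1], 3))
```

Note that the randomization is doing real work here: it is what lets the two fitted outcome models be compared without a selection correction.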