Wednesday, September 19, 2018

Wonderful Network Connectedness Piece

Very cool NYT graphics summarizing U.S. Facebook network connectedness.  Check it out:
https://www.nytimes.com/interactive/2018/09/19/upshot/facebook-county-friendships.html

They get the same result that Kamil Yilmaz and I have gotten for years in our analyses of economic and financial network connectedness: There is a strong "gravity effect" -- that is, even in the electronic age, physical proximity is the key ingredient in network relationships. See for example:

Maybe not as surprising for Facebook friends as for, say, financial institutions. But still...
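For intuition, here is a minimal sketch of the kind of gravity regression that reveals such an effect: regress log pairwise connectedness on log pairwise distance and look for a strongly negative slope. The data below are synthetic and purely illustrative -- nothing here comes from the NYT piece or from our papers.

```python
# Gravity-effect sketch (synthetic, purely illustrative data).
# Regress log pairwise connectedness on log pairwise distance;
# a strongly negative slope is the "gravity effect."
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 500
log_dist = rng.uniform(1, 8, n_pairs)                            # log distance between location pairs
log_conn = 3.0 - 0.9 * log_dist + rng.normal(0, 0.5, n_pairs)    # synthetic connectedness

X = np.column_stack([np.ones(n_pairs), log_dist])                # intercept + log distance
beta, *_ = np.linalg.lstsq(X, log_conn, rcond=None)
print(f"estimated distance elasticity: {beta[1]:.2f}")           # close to -0.9 by construction
```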

Sunday, September 16, 2018

Banque de France’s Open Data Room

See below for the announcement of a useful new product from the Banque de France and its Representative Office in New York.
Banque de France has been for many years at the forefront of disseminating statistical data to academics and other interested parties. Through Banque de France’s dedicated public portal http://webstat.banque-france.fr/en/, we offer a large set of free downloadable series (about 40,000 mainly aggregated series).

Banque de France has further expanded this service, launching an “Open Data Room” in Paris in November 2016 that provides researchers with free access to granular data.
We are glad to announce that the “Open Data Room” service is now also available to US researchers through the Banque de France Representative Office in New York City.

Saturday, September 15, 2018

An Open Letter to Tren Griffin

[I tried quite hard to email this privately. I post it here only because Griffin has, as far as I can tell, been very successful in scrubbing his email address from the web. Please forward it to him if you can figure out how.]

Mr. Griffin:

A colleague forwarded me your post, https://25iq.com/2018/09/08/risk-uncertainty-and-ignorance-in-investing-and-business-lessons-from-richard-zeckhauser/.  I enjoyed it, and Zeckhauser definitely deserves everyone's highest praise. 

However, your post misses the bigger picture.  Diebold, Doherty, and Herring conceptualized and promoted the "Known, Unknown, Unknowable" (KuU) framework for financial risk management, which runs blatantly throughout your twelve "Lessons From Richard Zeckhauser".  Indeed, the key Zeckhauser article on which you draw appeared in our book, "The Known, the Unknown and the Unknowable in Financial Risk Management", https://press.princeton.edu/titles/9223.html, which we also conceptualized, and for which we solicited the papers and authors, mentored them as regards integrating their thoughts into the KuU framework, etc.  The book was published almost a decade ago by Princeton University Press.

I say all this not only to reveal my surprise and annoyance at your apparent unawareness, but also, and more constructively, because you and your readers may be interested in our KuU book, which has many other interesting parts (great as the Zeckhauser part may be), and which, moreover, is more than the sum of its parts. A pdf of the first chapter has been available for many years at http://assets.press.princeton.edu/chapters/s9223.pdf.

Sincerely,

Friday, September 14, 2018

Machine Learning for Forecast Combination

How could I have forgotten to announce my latest paper, "Machine Learning for Regularized Survey Forecast Combination: Partially-Egalitarian Lasso and its Derivatives"? It is actually a heavily-revised version of an earlier paper, with a new title, and it came out as an NBER working paper a week or two ago.
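For readers who want a feel for the flavor of the approach, here is a rough sketch -- very much my own simplification, not the paper's actual estimator: use the lasso to select a subset of survey forecasters, then shrink the selected combining weights toward equal weights. All data and tuning choices below are made up for illustration.

```python
# Rough sketch of lasso-based forecast combination with shrinkage toward
# equal weights (my simplification, not the paper's exact estimator).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
T, N = 200, 20                                  # periods, survey forecasters
y = rng.normal(size=T)                          # realized target (synthetic)
F = y[:, None] + rng.normal(0, 1.0, (T, N))     # individual forecasts = truth + noise

# Step 1: lasso selection of forecasters (many weights set exactly to zero).
fit = Lasso(alpha=0.05, fit_intercept=False).fit(F, y)
selected = np.flatnonzero(fit.coef_)

# Step 2: shrink the surviving weights toward equal weights 1/k.
k = len(selected)
shrink = 0.5                                    # illustrative tuning parameter
w = (1 - shrink) * fit.coef_[selected] + shrink / k

combo = F[:, selected] @ w
print(f"selected {k} of {N} forecasters; "
      f"combined RMSE = {np.sqrt(np.mean((combo - y) ** 2)):.3f}")
```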

Monday, September 10, 2018

Interesting Papers of the Moment

Missing Events in Event Studies: Identifying the Effects of Partially-Measured News Surprises
by Refet S. Guerkaynak, Burcin Kisacikoglu, Jonathan H. Wright #25016 (AP ME)
http://papers.nber.org/papers/w25016


Colacito, Ric, Bridget Hoffmann, and Toan Phan (2018) “Temperature and growth: A panel analysis of the United States,”
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2546456

Do You Know That I Know That You Know...? Higher-Order Beliefs in Survey Data
by Olivier Coibion, Yuriy Gorodnichenko, Saten Kumar, Jane Ryngaert #24987 (EFG ME)
http://papers.nber.org/papers/w24987

Wednesday, September 5, 2018

Google's New Dataset Search Tool

Check out Google's new Dataset Search.

Here's the description they issued today:

Making it easier to discover datasets
Natasha Noy
Research Scientist, Google AI
Published Sep 5, 2018

In today's world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.

Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page. To create Dataset Search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset. Our approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem.

In this new release, you can find references to most datasets in environmental and social sciences, as well as data from other disciplines including government data and data provided by news organizations, such as ProPublica. As more data repositories use the schema.org standard to describe their datasets, the variety and coverage of datasets that users will find in Dataset Search will continue to grow.

Dataset Search works in multiple languages with support for additional languages coming soon. Simply enter what you are looking for and we will help guide you to the published dataset on the repository provider’s site.

For example, if you wanted to analyze daily weather records, you might try this query in Dataset Search:

[daily weather records]

You’ll see data from NASA and NOAA, as well as from academic repositories such as Harvard's Dataverse and Inter-university Consortium for Political and Social Research (ICPSR). Ed Kearns, Chief Data Officer at NOAA, is a strong supporter of this project and helped NOAA make many of their datasets searchable in this tool. “This type of search has long been the dream for many researchers in the open data and science communities” he said. “And for NOAA, whose mission includes the sharing of our data with others, this tool is key to making our data more accessible to an even wider community of users.”

This launch is one of a series of initiatives to bring datasets more prominently into our products. We recently made it easier to discover tabular data in Search, which uses this same metadata along with the linked tabular data to provide answers to queries directly in search results. While that initiative focused more on news organizations and data journalists, Dataset Search can be useful to a much broader audience, whether you're looking for scientific data, government data, or data provided by news organizations.

A search tool like this one is only as good as the metadata that data publishers are willing to provide. We hope to see many of you use the open standards to describe your data, enabling our users to find the data that they are looking for. If you publish data and don't see it in the results, visit our instructions on our developers site which also includes a link to ask questions and provide feedback.
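As an aside of my own (not part of Google's announcement): the schema.org markup mentioned above is typically embedded in a dataset's landing page as JSON-LD. Here is a minimal, purely illustrative Python sketch of generating such markup; all field values are invented.

```python
# Minimal sketch of schema.org Dataset markup emitted as JSON-LD.
# Field values are invented; see Google's developer documentation for the full spec.
import json

dataset_markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Daily Weather Records (example)",
    "description": "Illustrative daily temperature and precipitation records.",
    "creator": {"@type": "Organization", "name": "Example Weather Agency"},
    "datePublished": "2018-09-05",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/daily-weather.csv",
    },
}

# A provider would embed this on the dataset's landing page inside
# <script type="application/ld+json"> ... </script> so search engines can index it.
print(json.dumps(dataset_markup, indent=2))
```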

Monday, September 3, 2018

The Coming Storm

The role of time-series statistics / econometrics in climate analyses is expanding (e.g., here). Relatedly -- albeit focusing on shorter-term meteorological aspects rather than longer-term climatological aspects -- it's worth listening to Michael Lewis' latest, The Coming Storm. (You have to listen rather than read, as it's only available as an audiobook, but it's only about two hours.) It's a fascinating story, well researched and well told by Lewis, just as you'd expect. There are lots of interesting insights on (1) the collection, use, and abuse of public weather data, including ongoing, ethically-dubious, and potentially life-destroying attempts to privatize public weather data for private gain, (2) the clear and massive improvements in weather forecasting in recent decades, and (3) behavioral aspects of how best to communicate forecasts so that people understand them, believe them, and take appropriate action before disaster strikes.

Monday, August 27, 2018

Long Memory / Scaling Laws in Return Volatility

The 25-year accumulation of evidence for long memory / fractional integration / self-similarity / scaling laws in financial asset return volatility continues unabated. For the latest, see this nice new paper from the Bank of Portugal, in particular its key Table 6. Of course the interval estimates of the fractional integration parameter "d" are massively far from both 0 and 1 -- that's the well-known long memory. But what's new and interesting is the systematic difference in the intervals depending on whether one uses absolute or range-based volatility. The absolute d intervals tend to be completely below 1/2 (0<d<1/2 corresponds to covariance-stationary dynamics), whereas the range-based d intervals tend to include 1/2 (1/2<d<1 corresponds to mean-reverting but not covariance-stationary dynamics, due to infinite unconditional variance).

Realized vol based on the range is less noisy than realized vol based on absolute returns. But least noisy of all, and not considered in the paper above, is realized vol calculated directly from high-frequency return data (HFD-vol), as done by numerous authors in recent decades. Interestingly, recent work on HFD-vol also reports d intervals that tend to poke above 1/2. See this earlier post.
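For readers who want to estimate d themselves, here is a minimal sketch of the classic Geweke/Porter-Hudak (GPH) log-periodogram regression, applied to a simulated long-memory series rather than to any of the data above; the bandwidth choice and simulation design are mine and purely illustrative.

```python
# Geweke/Porter-Hudak (GPH) log-periodogram sketch for the long-memory
# parameter d, applied to a simulated ARFIMA(0, d, 0) series with d = 0.4.
# (Illustrative only; not the data or methods of the paper discussed above.)
import numpy as np

rng = np.random.default_rng(2)
T, d_true = 5000, 0.4

# Simulate (1 - L)^(-d) eps_t via its truncated MA(infinity) weights.
psi = np.ones(T)
for k in range(1, T):
    psi[k] = psi[k - 1] * (k - 1 + d_true) / k
eps = rng.normal(size=2 * T)
x = np.convolve(eps, psi)[T:2 * T]          # drop the burn-in so full weights apply

# Periodogram at the first m Fourier frequencies lambda_j = 2*pi*j/T.
m = int(T ** 0.5)                           # common bandwidth choice
j = np.arange(1, m + 1)
lam = 2 * np.pi * j / T
I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * T)

# GPH regression: log I(lambda_j) = c - d * log(4 sin^2(lambda_j / 2)) + error.
X = np.column_stack([np.ones(m), np.log(4 * np.sin(lam / 2) ** 2)])
coef, *_ = np.linalg.lstsq(X, np.log(I), rcond=None)
print(f"true d = {d_true}, GPH estimate = {-coef[1]:.2f}")
# 0 < d < 1/2: covariance stationary; 1/2 < d < 1: mean reverting, infinite variance.
```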

Monday, August 20, 2018

More on the New U.S. GDP Series

BEA's new publication of NSA GDP is a massive step forward. Now it should take one more step, if it insists on continuing to publish SA GDP.

Publishing only indirect SA GDP ("adjust the components and add them up") lends it an undeserved "official" stamp of credibility, so BEA should also publish a complementary official direct SA GDP ("adjust the aggregate directly"), which is now possible. 

This is a really big deal. Real GDP is undoubtedly the most important data series in all of macroeconomics, and indirect vs. direct SA GDP growth estimates can differ greatly. Their average absolute deviation is about one percent, and average real GDP growth itself is only about two percent! And which series you use has large implications for important issues, such as the widely-discussed puzzle of weak first-quarter growth (see Rudebusch et al.), among other things.

How do we know all this about properties of indirect vs. direct SA GDP growth estimates, since BEA doesn't provide direct SA GDP? You can now take the newly-provided NSA GDP and directly adjust it yourself. See Jonathan Wright's wonderful new paper. (Ungated version here.)
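To make the direct-vs-indirect distinction concrete, here is a toy sketch with synthetic quarterly data. It is emphatically not BEA's or Wright's procedure (which uses proper X-13-style adjustment); it simply shows that adjusting the components and summing generally differs from adjusting the aggregate directly, because seasonal adjustment is a nonlinear operation.

```python
# Toy illustration of direct vs. indirect seasonal adjustment on synthetic
# quarterly "components" (not BEA's or Wright's actual X-13-style procedure).
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(3)
idx = pd.period_range("2000Q1", periods=80, freq="Q").to_timestamp()
seas = np.tile([0.97, 1.01, 1.00, 1.02], 20)          # quarterly seasonal pattern

def component(level, seasonal_strength):
    trend = level * 1.02 ** (np.arange(80) / 4)        # smooth growth path
    noise = np.exp(rng.normal(0, 0.005, 80))
    return pd.Series(trend * seas ** seasonal_strength * noise, index=idx)

c1, c2 = component(100, 3.0), component(60, 0.5)       # components with differing seasonality
nsa_gdp = c1 + c2                                      # the NSA aggregate

def sa(x):  # multiplicative adjustment: divide by estimated seasonal factors
    return x / seasonal_decompose(x, model="multiplicative", period=4).seasonal

indirect = sa(c1) + sa(c2)     # adjust the components, then add them up
direct = sa(nsa_gdp)           # adjust the aggregate directly

g_dir, g_ind = 400 * np.log(direct).diff(), 400 * np.log(indirect).diff()
print(f"mean |direct - indirect| annualized growth gap: "
      f"{(g_dir - g_ind).abs().mean():.2f} pct. points")
```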

Of course, direct SA has many issues of its own. Ultimately, significant parts of both direct and indirect SA GDP are likely spurious artifacts of the various assumptions and methods underlying direct vs. indirect SA.

So another, more radical, idea is simply to stop publishing SA GDP in any form, instead publishing only NSA GDP (and its NSA components). Sound crazy? Why, exactly? Are official government attempts to define and remove "seasonality" any less dubious than, say, official attempts to define and remove "trend"? (The latter is, mercifully, not attempted...)

Tuesday, August 7, 2018

Factor Model w Time-Varying Loadings

Markus Pelger has a nice paper on factor modeling with time-varying loadings in high dimensions. There are many possible applications. He applies it to level-slope-curvature yield-curve models. 

For me another really interesting application would be measuring connectedness in financial markets, as a way of tracking systemic risk. The Diebold-Yilmaz (DY) connectedness framework is based on a high-dimensional VAR with time-varying coefficients, but not factor structure. An obvious alternative in financial markets, which we used to discuss a lot but never pursued, is factor structure with time-varying loadings, exactly as in Pelger!

It would seem, however, that any reasonable connectedness measure in a factor environment would need to be based not only on time-varying loadings but also on time-varying idiosyncratic shock variances, or more precisely on a time-varying noise/signal ratio (e.g., in a 1-factor model, the ratio of the idiosyncratic shock variance to the factor innovation variance). That is, connectedness in factor environments is driven by BOTH the size of the loadings on the factor(s) AND the amount of variation in the data explained by the factor(s). Time-varying loadings don't really change anything if the factors are swamped by massive noise.

Typically one might fix the factor innovation variance for identification, but allow for time-varying idiosyncratic shock variance in addition to time-varying factor loadings. It seems that Pelger's framework does allow for that. Crudely, and continuing the 1-factor example, consider y_t  =  lambda_t  f_t  +  e_t. His methods deliver estimates of the time series of loadings lambda_t and factor f_t, robust to heteroskedasticity in the idiosyncratic shock e_t. Then in a second step one could back out an estimate of the time series of e_t and fit a volatility model to it. 
The entire system would then be estimated, and one could calculate connectedness measures based, for example, on variance decompositions, as in the DY framework.
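A minimal simulation sketch of the point -- my own toy version, not Pelger's estimator or the DY measure itself: in a one-factor model y_t = lambda_t f_t + e_t, the factor-explained variance share lambda_t^2 / (lambda_t^2 + sigma_{e,t}^2) is a crude connectedness gauge, and it collapses when idiosyncratic volatility blows up, no matter how the loadings move.

```python
# Toy one-factor sketch: y_t = lambda_t * f_t + e_t with a time-varying loading
# and time-varying idiosyncratic volatility.  The factor-explained variance
# share lambda_t^2 / (lambda_t^2 + sigma_e,t^2) is a crude "connectedness" gauge.
# (Illustrative only -- not Pelger's estimator or the DY measure itself.)
import numpy as np

rng = np.random.default_rng(4)
T = 1000
lam = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(T) / T)     # time-varying loading
sig_e = 0.5 + 1.5 * (np.arange(T) > T // 2)                # idiosyncratic vol jumps mid-sample
f = rng.normal(size=T)                                      # factor innovations, variance fixed at 1
e = sig_e * rng.normal(size=T)
y = lam * f + e

# Factor-explained variance share: high loadings mean little if the noise is huge.
share = lam**2 / (lam**2 + sig_e**2)
print(f"average explained share, first half:  {share[:T//2].mean():.2f}")
print(f"average explained share, second half: {share[T//2:].mean():.2f}")
```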