Saturday, October 19, 2019

Missing data in Factor Models

Serena Ng's site is back. Her new paper with Jushan Bai, "Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data", which I blogged about earlier, is now up on arXiv, here.  In closely related work, Marcus Pelger just sent me his new paper with Ruoxuan Xiong, "Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference", which I look forward to reading. One is dated Oct 15 and the other Oct 16. Science is rushing forward!

Saturday, October 12, 2019

Interval Prediction

Last time I blogged on Serena's amazing presentation from Per's Chicago meeting,
https://fxdiebold.blogspot.com/2019/10/large-dimensional-factor-analysis-with.html

But I was equally blown away by Rina's amazing "Predictive inference with the jackknife+".
Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani
https://arxiv.org/pdf/1905.02928.pdf.

Correctly calibrated prediction intervals despite arbitrary model misspecification!

Of course I'm left with lots of questions.  They have nice correct-coverage theorems. What about length?  I would like theorems (not just simulations) on shortest-length intervals with guaranteed correct coverage. Their results seem to require iid or similar exchangeability environments. What about heteroskedastic environments where prediction error variance depends on covariates? What about time series environments?
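
For concreteness, here is a minimal sketch of the jackknife+ construction as I understand it, using OLS as the (arbitrary) base predictor; the function and its details are my own illustration, not the authors' code:

import numpy as np
from sklearn.linear_model import LinearRegression

def jackknife_plus_interval(X, y, x_new, alpha=0.1):
    # Leave-one-out fits and residuals, combined via order statistics;
    # coverage is at least 1 - 2*alpha under exchangeability, for any base learner.
    n = len(y)
    lo, hi = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        m = LinearRegression().fit(X[keep], y[keep])     # leave-one-out fit
        r_i = abs(y[i] - m.predict(X[i:i + 1])[0])       # leave-one-out residual
        pred = m.predict(x_new.reshape(1, -1))[0]
        lo[i], hi[i] = pred - r_i, pred + r_i
    k = int(np.ceil((1 - alpha) * (n + 1)))
    return np.sort(lo)[max(n - k, 0)], np.sort(hi)[min(k, n) - 1]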

Then, quite amazingly, "Distributional conformal prediction" by Victor Chernozhukov et al. arrived in my mailbox.
https://arxiv.org/pdf/1909.07889.pdf
It is similarly motivated and may address some of my questions.

Anyway, great developments for interval prediction!

Monday, October 7, 2019

Carbon Offsets



At the end of a recently-received request that I submit my receipts from a conference trip last week:
... Let me know if you'd like us to purchase carbon offsets for your miles, and deduct the amount (or any amount) from your reimbursement. We’ll do it on your behalf. Thank You! For example: NYC-Chicago round trip = $5.72; Oxford-Chicago round trip = $35.77; Nantes-Chicago round trip = $37.08; Bergen-Chicago round trip = $35.44.
A first for me!  Ironically, I spoke on some of my new climate econometrics work with Glenn Rudebusch. (It was not a climate conference per se, and mine was the only climate paper.)

Sunday, October 6, 2019

Large Dimensional Factor Analysis with Missing Data

Back from the very strong Stevanovich meeting.  Program and abstracts here.  One among many highlights was:

Large Dimensional Factor Analysis with Missing Data
Presented by Serena Ng (Columbia, Dept. of Economics)

Abstract:
This paper introduces two factor-based imputation procedures that will fill
missing values with consistent estimates of the common component. The
first method is applicable when the missing data are bunched. The second
method is appropriate when the data are missing in a staggered or
disorganized manner. Under the strong factor assumption, it is shown that
the low rank component can be consistently estimated but there will be at
least four convergence rates, and for some entries, re-estimation can
accelerate convergence. We provide a complete characterization of the
sampling error without requiring regularization or imposing the missing at
random assumption as in the machine learning literature. The
methodology can be used in a wide range of applications, including
estimation of covariances and counterfactuals.

This paper just blew me away.  Re-arrange the X columns to get all the "complete cases across people" (tall block) in the leftmost columns, and re-arrange the X rows to get all the "complete cases across variables" (wide block) in the topmost rows.  The intersection is the "balanced" block in the upper left.  Then iterate on the tall and wide blocks to impute the missing data in the bottom right "missing data" block. The key figure that illustrates the procedure provided a real "eureka moment" for me.  Plus they have a full asymptotic theory as opposed to just worst-case bounds.
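
To fix ideas, here is a rough single-pass sketch of that tall/wide logic (my paraphrase of the idea, not the paper's estimator, and omitting the iteration and re-estimation refinements):

import numpy as np

def tall_wide_impute(X, r):
    # X is T x N with NaNs for missing entries; r is the number of factors.
    # Factors come from the fully observed ("tall") columns via principal components,
    # loadings for every column come from the fully observed ("wide") rows,
    # and missing cells are filled with the estimated common component.
    T, N = X.shape
    tall_cols = ~np.isnan(X).any(axis=0)          # columns observed for all t
    wide_rows = ~np.isnan(X).any(axis=1)          # rows observed for all i
    U, S, Vt = np.linalg.svd(X[:, tall_cols], full_matrices=False)
    F = np.sqrt(T) * U[:, :r]                      # T x r factor estimates
    Lam = np.linalg.lstsq(F[wide_rows], X[wide_rows, :], rcond=None)[0].T  # N x r loadings
    common = F @ Lam.T                             # estimated low-rank component
    return np.where(np.isnan(X), common, X)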

Kudos!

I'm not sure whether the paper is circulating yet, and Serena's web site vanished recently (not her fault -- evidently Google made a massive error), but you'll soon be able to get the paper one way or another.






Sunday, September 29, 2019

Krusell on Economics of Climate

In general I'm not a fan of podcasts -- it takes annoyingly longer to listen than to read -- but if you're interested in climate economics you must hear this Per Krusell gem. It's from 2017 but fresh as ever.  He also gave a mini-course at Penn that same year.  So sad I could not go. 

I also like Marty Weitzman's Climate Shock (written jointly with Gernot Wagner).  Similarly informed and serious.  Quirky and endearing writing style.  Makes a strong case for a low discount rate.


Machine Learning and Big Data

Nice looking meeting coming up this week, "Big Data and Machine Learning in Econometrics, Finance, and Statistics" at U Chicago's Stevanovich Center for Financial Mathematics.  Preliminary program here.


Sunday, September 22, 2019

Temperature Volatility

Temperature level is of course heavily studied, and trending upward alarmingly quickly. In a new paper, Glenn Rudebusch and I study temperature volatility, which has been much less heavily studied. We show that temperature volatility is pervasively trending downward, and that its "twin peaks" seasonal pattern is also evolving, both of which have implications for agriculture and much else. Our analysis is based on the daily temperature range, in precise parallel with the time-honored use of the daily log price range as a volatility (quadratic variation) estimator in financial markets.
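
For readers unfamiliar with range-based volatility estimation, here is the financial-markets analogue in miniature; the function below is the classic Parkinson (1980) range estimator, offered as illustration rather than as our paper's exact procedure:

import numpy as np

def parkinson_variance(high, low):
    # Daily variance estimate from the daily high/low log price range;
    # the temperature analogue is simply the daily range Tmax - Tmin.
    rng = np.log(np.asarray(high) / np.asarray(low))
    return rng ** 2 / (4.0 * np.log(2.0))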

Tuesday, September 17, 2019

Econometrics Meeting at Penn

Please consider submitting to this year's Greater New York Area (GNYA) econometrics meeting, hosted this year by Penn (third time, I think).  Always a fine conference.  GNYA is defined VERY broadly. Program and participant list from last year's meeting at Princeton here.  Call for papers for this year's Penn meeting immediately below.

Dear friends,

We are pleased to announce that the University of Pennsylvania will host the Greater New York Metropolitan Area Econometrics Colloquium on Saturday, December 7, 2019.  If you would like to present your work at the colloquium, please send your paper or extended abstract by Sunday, October 27, 2019.  We plan to have the program selected by Friday, November 8, 2019.

Further information about the colloquium will be posted at


Please feel free to forward this call for papers to your colleagues. As usual, we will not include presentations by graduate students in this short event. Each presentation should last about 30 minutes. We plan to include 8-12 presentations in the program.

Continental breakfast, lunch, and dinner will be provided. We cannot cover travel or accommodations. There are three hotels on campus: The Study at University City (newest, 0.3 mile), Sheraton Philadelphia University City (1 block), Hilton Inn at Penn (1 block). About two miles from campus, there are also many options in center city Philadelphia.

Please send submissions to gnyec19@gmail.com.
Best,
Karun Adusumilli
Xu Cheng
Frank Diebold
Wayne Gao
Frank Schorfheide

Monday, September 9, 2019

Environmental and Energy Policy and the Economy

The first volume from last May's NBER meeting is forthcoming (see https://www.nber.org/books/kotc-1).  Marvelously, the meetings and volumes will be ongoing annually.  See below for the 2020 CFP.  Note that submissions from non-NBER researchers are welcome.


NBER Call for Papers / Proposals
2nd Annual NBER Environmental and Energy Policy and the Economy Conference

 
Dear Researchers,
 
We are seeking papers or proposals for the second annual NBER conference/publication on Environmental and Energy Policy and the Economy. We will accept six papers for presentation at the National Press Club in Washington, D.C. on May 21, 2020. The audience will include the professional staffs of government agencies, research institutions, and NGOs focused on energy and environmental policy. The contributed papers will then be published in an annual volume by the University of Chicago Press.
 
To view last year's agenda and papers for the forthcoming volume, please click 
HERE and HERE.
 
Papers should be relevant to current policy debates and accessible to a professional audience, yet following standard NBER protocol, they should avoid making policy recommendations. While standalone projects are specifically encouraged, we also welcome spinoff projects where authors intend to later submit a more extensive or technical version to a journal, or may have already done so. While no paper should be a duplicate of another paper, alternate versions that put results into a more general, policy relevant context and summarize them in more accessible language are encouraged. This is a great opportunity to communicate research to the policy community.
 
Submissions should be either complete papers or 2-3 page abstracts outlining the intended contribution. Submissions are due by October 14, 2019, and can be uploaded at
 
http://www.nber.org/confsubmit/backend/cfp?id=EEPEs20
 
Submissions from researchers who are not affiliated with the NBER, and from researchers who are from groups that have been historically under-represented in the economics profession, are welcome. The authors of each paper will share an $8,000 honorarium.
 
Decisions about accepted papers will be made by mid-November. Complete drafts of papers will be due in early April 2020.
 
We look forward to hearing from you.
 
Matthew Kotchen
James Stock
Catherine Wolfram

"Economics of Climate Change" Conference

Between the earlier Milan Climate Econometrics meeting (here) and this upcoming November FRBSF meeting (program now available, below), there's a lot of new and stimulating work.  Really nice.


 The Economics of Climate Change
Federal Reserve Bank of San Francisco
Janet Yellen Conference Center
November 8, 2019
AGENDA
8:00 a.m. Continental Breakfast
8:45 a.m. Introduction: Mary C. Daly, President, Federal Reserve Bank of San Francisco
9:00 a.m. Session Chair: Glenn D. Rudebusch, Federal Reserve Bank of San Francisco
Labor Supply in a Warmer World: The Impact of Climate Change on the Global Workforce
Presenter: Solomon Hsiang, University of California, Berkeley
Discussant: David Card, University of California, Berkeley
Long-Term Macroeconomic Effects of Climate Change: A Cross-Country Analysis
Presenter: M. Hashem Pesaran, University of Southern California
Discussant: Francis X. Diebold, University of Pennsylvania
10:15 a.m. Break
10:45 a.m. Session Chair: Galina B. Hale, Federal Reserve Bank of San Francisco
Integrated Assessment in a Multi-region World with Multiple Energy Sources and Endogenous Technical Change
Presenter: Conny Olovsson, Central Bank of Sweden (Sveriges Riksbank)
Discussant: Larry Karp, University of California, Berkeley
On the Implications of Pollution for the Measurement of Output, Volatility, and the Natural Interest Rate
Presenter: Nicholas Z. Muller, Carnegie Mellon University
Discussant: Karen Fisher-Vanden, Penn State University
Noon Lunch – Market Street Dining Room, 4th Floor
1:00 p.m. Session Chair: Sylvain Leduc, Federal Reserve Bank of San Francisco
Climate Change Risk
Presenter: Dana Kiku, University of Illinois
Discussant: Michael Bauer, Federal Reserve Bank of San Francisco
Carbon Risk
Presenter: Ryan Riordan, Queen's University
Discussant: Harrison Hong, Columbia University
2:15 p.m. Break
2:45 p.m. Session Chair: Òscar Jordà, Federal Reserve Bank of San Francisco
A Run on Oil: Climate Policy, Stranded Assets, and Asset Prices
Presenter: Michael Barnett, Arizona State University
Discussant: Robert Ready, University of Oregon
The Environmental Bias of Trade Policy
Presenter: Joseph S. Shapiro, University of California, Berkeley
Discussant: Katheryn Russ, University of California, Davis
4:00 p.m. Break
4:30 p.m. Session Chair: Glenn D. Rudebusch, Federal Reserve Bank of San Francisco
The Macro Effects of Anticipating Climate Policy
Presenter: Stephie Fried, Arizona State University
Discussant: Tony Smith, Yale University
Climate Change: Macroeconomic Impact and Implications for Monetary Policy
Presenter: Sandra Batten, Bank of England
Discussant: Warwick McKibbin, Australian National University
5:45 p.m. Reception – Market Street Salon, 4th Floor
6:30 p.m. Dinner – Market Street Dining Room, 4th Floor
Introduction: Mary C. Daly, President, Federal Reserve Bank of San Francisco
Speaker: Frank Elderson, Member of the Governing Board, Netherlands Central Bank (De Nederlandsche Bank, DNB) and Chairman, Network for Greening the Financial System (NGFS)

Wednesday, September 4, 2019

Empirical Macro Workshop

Nice program.  Click below for slides.

16th Workshop on Methods and Applications for Dynamic Stochastic General Equilibrium Models

Authors: Please upload your paper and slides here.
Jesús Fernández-Villaverde, Frank Schorfheide, Keith Sill, and Giorgio Primiceri, Organizers
October 4-5, 2019
Federal Reserve Bank of Philadelphia
Friday, October 4
8:30 am
Continental Breakfast
9:00 am
Jeffrey R. Campbell, Federal Reserve Bank of Chicago
Filippo Ferroni, Federal Reserve Bank of Chicago
Jonas Fisher, Federal Reserve Bank of Chicago
Leonardo Melosi, Federal Reserve Bank of Chicago
The Limits of Forward Guidance
Discussant: Kristoffer Nimark, Cornell University
10:00 am
Break
10:30 am
Andrew Foerster, Federal Reserve Bank of San Francisco
Andreas Hornstein, Federal Reserve Bank of Richmond
Pierre-Daniel Sarte, Federal Reserve Bank of Richmond
Mark W. Watson, Princeton University and NBER
Aggregate Implications of Changing Sectoral Trends
Discussant: André Kurmann, Drexel University
11:30 am
Christian Matthes, Federal Reserve Bank of Richmond
Felipe Schwartzman, Federal Reserve Bank of Richmond
What Do Sectoral Dynamics Tell Us About the Origins of Business Cycles?
Discussant: Hikaru Saijo, University of California at Santa Cruz
12:30 pm
Lunch
2:00 pm
Keynote Presentation by Michael Woodford, Columbia University and NBER
3:00 pm
Break
3:30 pm
Preston Mui, University of California at Berkeley
Benjamin Schoefer, University of California at Berkeley
The Aggregate Labor Supply Curve at the Extensive Margin: A Reservation Wedge Approach
Discussant: Sergio C. Salgado Ibanez, University of Pennsylvania
4:30 pm
Alexander W. Richter, Federal Reserve Bank of Dallas
Nathaniel A. Throckmorton, William & Mary
Oliver de Groot, University of St Andrews
Valuation Risk Revalued
Discussant: Winston Wei Dou, University of Pennsylvania
5:30 pm
Adjourn and Reception
Saturday, October 5
8:30 am
Continental Breakfast
9:00 am
Gerald Carlino, Federal Reserve Bank of Philadelphia
Thorsten Drautzburg, Federal Reserve Bank of Philadelphia
Robert P. Inman, University of Pennsylvania and NBER
Nicholas Zarra, New York University
Partisan Politics in Fiscal Unions: Evidence from U.S. States
Discussant: Karel Mertens, Federal Reserve Bank of Dallas
10:00 am
Break
10:30 am
Bertille Antoine, Simon Fraser University
Lynda Khalaf, Carleton University
Maral Kichian, University of Ottawa
Zhenjiang Lin, The University of Nottingham Ningbo China
Simulation Based Matching Inference with Applications to DSGE Models
Discussant: Simon Freyaldenhoven, Federal Reserve Bank of Philadelphia
11:30 am
Sophocles Mavroeidis, University of Oxford
Testing for Multiplicity of Equilibria in a Low Interest Rate Environment
Discussant: Mikkel Plagborg-Møller, Princeton University
12:30 pm
Adjourn and Lunch

Monday, September 2, 2019

Hello Again, and More

Sorry my friends, both for being AWOL and for not responding to your kind inquiries in that regard. I took some time off to start some new things in climate econometrics, and simultaneously to introspect. Glad to say I'm back for the duration.

Check out the papers from the fourth annual climate econometrics meeting, which just ended, here. More than fifty papers in two days! Is it selfless generosity or unbridled cruelty? Perhaps a little of both. But seriously, something's happening here.





Monday, May 20, 2019

Climate Change Heterogeneity

One can only go so far in climate econometrics studying time series like the proverbial "global average temperature", just as one can only go so far in macroeconomics with the proverbial "representative agent".  Disaggregation will be key to additional progress, as different people in different places experience different climate "treatments" and different economic outcomes.  The impressive new paper below begins to confront the massive tasks of data collection, manipulation, analysis, and visualization, in the context of a disaggregated analysis of the effects of temperature change on aggregate output.

"Climatic Constraints on Aggregate Economic Output", by Marshall Burke and Vincent Tanutama, NBER Working Paper No. 25779, 2019.

Abstract:  Efficient responses to climate change require accurate estimates of both aggregate damages and where and to whom they occur. While specific case studies and simulations have suggested that climate change disproportionately affects the poor, large-scale direct evidence of the magnitude and origins of this disparity is lacking. Similarly, evidence on aggregate damages, which is a central input into the evaluation of mitigation policy, often relies on country-level data whose accuracy has been questioned. Here we assemble longitudinal data on economic output from over 11,000 districts across 37 countries, including previously nondigitized sources in multiple languages, to assess both the aggregate and distributional impacts of warming temperatures. We find that local-level growth in aggregate output responds non-linearly to temperature across all regions, with output peaking at cooler temperatures (<10°C) than estimated in earlier country analyses and declining steeply thereafter. Long difference estimates of the impact of longer-term (decadal) trends in temperature on income are larger than estimates from an annual panel model, providing additional evidence for growth effects. Impacts of a given temperature exposure do not vary meaningfully between rich and poor regions, but exposure to damaging temperatures is much more common in poor regions. These results indicate that additional warming will exacerbate inequality, particularly across countries, and that economic development alone will be unlikely to reduce damages, as commonly hypothesized. We estimate that since 2000, warming has already cost both the US and the EU at least $4 trillion in lost output, and tropical countries are >5% poorer than they would have been without this warming.

Monday, May 13, 2019

Understanding the Bad News for IV Estimation

In an earlier post I discussed Alwyn Young's bad news for IV estimation, obtained by Monte Carlo. Immediately thereafter, Narayana Kocherlakota sent his new paper, "A Near-Exact Finite Sample Theory for an Instrumental Variable Estimator", which provides complementary analytic insights. Really nice stuff.





Monday, April 15, 2019

Hedging Realized vs. Expected Volatility

Not all conferences can be above average, let alone in the extreme right tail of the distribution, so it's wonderful when it happens, as with last week's AP conference. Fine papers all -- timely, thought provoking, and empirically sophisticated.  Thanks to Jan Eberly and Konstantin Milbradt for assembling the program, here (including links to papers). 

I keep thinking about the Dew-Becker-Giglio-Kelly paper. For returns r, they produce evidence that (1) investors are willing to pay a lot to insure against movements in realized volatility, r^2_{t}, but (2) investors are not willing to pay to insure against movements in expected future realized volatility (conditional variance), E(r^2_{t+1} | I_t). On the one hand, as a realized volatility guy I'm really intrigued by (1). On the other hand, it seems hard to reconcile (1) and (2), a concern that was raised at the meeting. On the third hand, maybe it's not so hard.  Hmmm...
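
To keep the two objects straight, here is a toy sketch (mine, not theirs), with a crude AR(1) forecast standing in for whatever model investors actually use:

import numpy as np

def realized_and_expected(r):
    # Realized variance is the ex-post squared return; the conditional variance
    # is its ex-ante forecast -- here a crude AR(1) fit to past squared returns.
    rv = r ** 2
    b1, b0 = np.polyfit(rv[:-1], rv[1:], 1)        # E[rv_{t+1} | I_t] ~ b0 + b1 * rv_t
    return rv, b0 + b1 * rv[-1]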

Wednesday, April 10, 2019

Bad News for IV Estimation

Alwyn Young has an eye-opening recent paper, "Consistency without Inference: Instrumental Variables in Practical Application".  There's a lot going on worth thinking about in his Monte Carlo:  OLS vs. IV; robust/clustered s.e.'s vs. not; testing/accounting for weak instruments vs. not; jackknife/bootstrap vs. "conventional" inference; etc.  IV as typically implemented comes up looking, well, dubious.

Alwyn's related analysis of published studies is even more striking.  He shows that, in a sample of 1359 IV regressions in 31 papers published in the journals of the American Economic Association,
"... statistically significant IV results generally depend upon only one or two observations or clusters, excluded instruments often appear to be irrelevant, there is little statistical evidence that OLS is actually substantively biased, and IV confidence intervals almost always include OLS point estimates." 
Wow.

Perhaps the high leverage is Alwyn's most striking result, particularly as many empirical economists seem to have skipped class on the day when leverage assessment was taught.  Decades ago, Marjorie Flavin attempted some remedial education in her 1991 paper, "The Joint Consumption/Asset Demand Decision: A Case Study in Robust Estimation".  She concluded that
"Compared to the conventional results, the robust instrumental variables estimates are more stable across different subsamples, more consistent with the theoretical specification of the model, and indicate that some of the most striking findings in the conventional results were attributable to a single, highly unusual observation." 
Sound familiar?  The non-robustness of conventional IV seems disturbingly robust, from Flavin to Young.

Flavin's paper evidently fell on deaf ears and remains unpublished. Hopefully Young's will not meet the same fate.

Monday, April 8, 2019

Identification via the ZLB and More

Sophocles Mavroeidis at Oxford has a very nice paper on using the nominal interest rate zero lower bound (ZLB) to identify VAR's.  Effectively, hitting the ZLB is a form of (endogenous) structural change that can be exploited for identification.  He has results showing whether/when one has point identification, set identification, or no identification. Really good stuff.

An interesting question is whether there may be SETS of bounds that may be hit. Suppose so, and suppose that we don't know whether/when they'll be hit, but we do know that if/when one bound is hit, all bounds are hit. An example might be nominal short rates in two countries with tightly-integrated money markets.

Now recall the literature on testing for multivariate structural change, which reveals large power increases in such situations (Bai, Lumsdaine and Stock). In Sophocles' case, it suggests the potential for greatly sharpened set ID.  Of course it all depends on the truth/relevance of my supposition...




Friday, April 5, 2019

Inference with Social Network Dependence

I'm running behind as usual. I meant to post this right after the seminar, about two weeks ago.  Really interesting stuff -- spatial correlation due to network dependence.  A Google search will find the associated paper(s) instantly. Again, really good stuff.  BUT I would humbly suggest that the biostat people need to read more econometrics. A good start is this survey (itself four years old, and distilled for practitioners as the basic insights were known/published decades ago). The cool question moving forward is whether/when/how network structure can be used to determine/inform clustering.


Elizabeth L. Ogburn
Department of Biostatistics
Johns Hopkins University

Social Network Dependence, the Replication Crisis, and (In)valid Inference

Abstract:
In the first part of this talk, I will show that social network structure can result in a new kind of structural confounding, confounding by network structure, potentially contributing to replication crises across the health and social sciences.  Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc., and while many of the limitations of such convenience samples are well-known, the issue of statistical dependence due to social network ties has not previously been addressed. A paradigmatic example of this is the Framingham Heart Study (FHS). Using a statistic that we adapted to measure network dependence, we test for network dependence and for possible confounding by network structure in several of the thousands of influential papers published using FHS data. Results suggest that some of the many decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data may be biased (away from the null) and anticonservative due to unacknowledged network structure.

But data with network dependence abounds, and in many settings researchers are explicitly interested in learning about social network dynamics.  Therefore, there is high demand for methods for causal and statistical inference with social network data. In the second part of the talk, I will describe recent work on causal inference for observational data from a single social network, focusing on (1) new types of causal estimands that are of interest in social network settings, and (2) conditions under which central limit theorems hold and inference based on approximate normality is licensed.

Monday, March 25, 2019

Ensemble Methods for Causal Prediction

Great to see ensemble learning methods (i.e., forecast combination) moving into areas of econometrics beyond time series / macro-econometrics, where they have thrived ever since Bates and Granger (1969), generating a massive and vibrant literature.  (For a recent contribution, including historical references, see Diebold and Shin, 2019.)  In particular, the micro-econometric / panel / causal literature is coming on board.  See for example this new and interesting paper by Susan Athey et al.
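
As a reminder of how simple the core idea is, here is a minimal Bates-Granger-style sketch (my illustration): combine two forecasts using weights, estimated from historical forecast errors, that minimize the variance of the combined error.

import numpy as np

def bates_granger_weight(e1, e2):
    # Weight on forecast 1 given historical error series e1 and e2;
    # the combined forecast is then w * f1 + (1 - w) * f2.
    S = np.cov(e1, e2)
    return (S[1, 1] - S[0, 1]) / (S[0, 0] + S[1, 1] - 2.0 * S[0, 1])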

Monday, March 18, 2019

Alan Krueger RIP

Very sad to report that Alan Krueger has passed away.  He was a tremendously gifted empirical economist, with a fine feel for identifying issues that were truly important, and for designing novel and powerful empirical strategies to address them.

The Housing Risk Premium is Huge

Earlier I blogged on Jordà et al.'s fascinating paper, "The Rate of Return on Everything".  Now they're putting their rich dataset to good use.  Check out the new paper, NBER w.p. 25653.

The Total Risk Premium Puzzle
Òscar Jordà, Moritz Schularick, and Alan M. Taylor

Abstract:
The risk premium puzzle is worse than you think. Using a new database for the U.S. and 15 other advanced economies from 1870 to the present that includes housing as well as equity returns (to capture the full risky capital portfolio of the representative agent), standard calculations using returns to total wealth and consumption show that: housing returns in the long run are comparable to those of equities, and yet housing returns have lower volatility and lower covariance with consumption growth than equities. The same applies to a weighted total-wealth portfolio, and over a range of horizons. As a result, the implied risk aversion parameters for housing wealth and total wealth are even larger than those for equities, often by a factor of 2 or more. We find that more exotic models cannot resolve these even bigger puzzles, and we see little role for limited participation, idiosyncratic housing risk, transaction costs, or liquidity premiums. 

Friday, March 15, 2019

Neyman-Pearson Classification

Neyman-Pearson (NP) hypothesis testing insists on fixed asymptotic test size (5%, say) and then takes whatever power it can get. Bayesian hypothesis assessment, in contrast, treats type I and II errors symmetrically, with size approaching 0 and power approaching 1 asymptotically. 

Classification tends to parallel Bayesian hypothesis assessment, again treating type I and II errors symmetrically.  For example, I might do a logit regression and classify cases with fitted P(I=1)<1/2 as group 0 and cases with fitted P(I=1)>1/2 as group 1.  The classification threshold of 1/2 produces a "Bayes classifier".  

Bayes classifiers seem natural, and in many applications they are.  But an interesting insight is that some classification problems may have hugely different costs of type I and II errors, in which case an NP classification approach may be entirely natural, not clumsy.  (Consider, for example, deciding whether to convict someone of a crime that carries the death penalty.  Many people would view the cost of a false declaration of "guilty" as much greater than the cost of a false "innocent".) 

This leads to the idea and desirability of NP classifiers.  The issue is how to bound the type I classification error probability at some small chosen value.  Obviously it involves moving the classification threshold away from 1/2, but figuring out exactly what to do turns out to be a challenging problem.  Xin Tong and co-authors have made good progress.  Here are some of his papers (from his USC site); a toy sketch of the thresholding idea follows the list:
  1. Chen, Y., Li, J.J., and Tong, X.* (2019) Neyman-Pearson criterion (NPC): a model selection criterion for asymmetric binary classification. arXiv:1903.05262.
  2. Tong, X., Xia, L., Wang, J., and Feng, Y. (2018) Neyman-Pearson classification: parametrics and power enhancement. arXiv:1802.02557v3.
  3. Xia, L., Zhao, R., Wu, Y., and Tong, X.* (2018) Intentional control of type I error over unconscious data distortion: a Neyman-Pearson approach to text classification. arXiv:1802.02558.
  4. Tong, X.*, Feng, Y. and Li, J.J. (2018) Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristics (NP-ROC). Science Advances, 4(2):eaao1659.
  5. Zhao, A., Feng, Y., Wang, L., and Tong, X.* (2016) Neyman-Pearson classification under high-dimensional settings. Journal of Machine Learning Research, 17:1−39.
  6. Li, J.J. and Tong, X. (2016) Genomic applications of the Neyman-Pearson classification paradigm. Chapter in Big Data Analytics in Genomics. Springer (New York). DOI: 10.1007/978-3-319-41279-5; eBook ISBN: 978-3-319-41279-5.
  7. Tong, X.*, Feng, Y. and Zhao, A. (2016) A survey on Neyman-Pearson classification and suggestions for future research. Wiley Interdisciplinary Reviews: Computational Statistics, 8:64-81.
  8. Tong, X.* (2013). A plug-in approach to Neyman-Pearson classification. Journal of Machine Learning Research, 14:3011-3040.
  9. Rigollet, P. and Tong, X. (2011) Neyman-Pearson classification, convexity and stochastic constraints. Journal of Machine Learning Research, 12:2825-2849.
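
As promised, a toy sketch of the thresholding idea. This is a crude plug-in version of my own, not the NP umbrella algorithm of paper 4 above (which uses order statistics to control the probability of violating the type I error bound):

import numpy as np
from sklearn.linear_model import LogisticRegression

def np_threshold(scores_class0, alpha=0.05):
    # Threshold = empirical (1 - alpha) quantile of fitted P(class = 1) scores
    # on held-out class-0 cases, so the empirical type I error is at most alpha.
    return np.quantile(scores_class0, 1 - alpha)

# usage sketch:
# model = LogisticRegression().fit(X_train, y_train)
# thr = np_threshold(model.predict_proba(X_holdout_class0)[:, 1], alpha=0.05)
# y_hat = (model.predict_proba(X_test)[:, 1] > thr).astype(int)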

Machine Learning and Alternative Data for Predicting Economic Indicators

I discussed an interesting paper by Chen et al. today at the CRIW.  My slides are here.

Wednesday, March 6, 2019

Significance Testing as a Noise Amplifier

See this insightful post on why statistical significance testing is effectively a noise amplifier. I find it interesting along the lines of "something not usually conceptualized in terms of XX is revealed to be very much about XX".  In this case XX is noise amplification / reduction.  Like many good insights, it seems obvious ex post, but no one recognized it before the "eureka moment".

So significance testing is really a filter:  the input is data and the output is an accept/reject decision for some hypothesis.  But what a non-linear, imprecisely defined filter -- we're a long way from looking at the gain functions of simple linear filters as in classical frequency-domain filter analysis!
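
A toy simulation (mine, not from the post) makes the amplification concrete: with a small true effect and noisy estimates, the estimates that survive a 5% significance filter are far too large on average.

import numpy as np

rng = np.random.default_rng(1)
true_effect, se = 0.1, 0.5
estimates = true_effect + se * rng.standard_normal(1_000_000)
significant = np.abs(estimates / se) > 1.96
print(estimates.mean())                         # about 0.10: unconditional mean is fine
print(np.abs(estimates[significant]).mean())    # about 1.2: an order of magnitude too big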

See also this earlier post on significance testing.

Sunday, March 3, 2019

Standard Errors for Things that Matter

Many times in applied / empirical seminars I have seen something like this:

The paper estimates a parameter vector b and dutifully reports asymptotic s.e.'s.  But then the ultimate object of interest turns out not to be b, but rather some nonlinear but continuous function of the elements of b, say c = f(b). So the paper calculates and reports an estimate of c as c_hat = f(b_hat).  Fine, insofar as c_hat is consistent if b_hat is consistent (by the continuous mapping theorem).  But then the paper forgets to calculate an asymptotic s.e. for c_hat.

So c is the object of interest, and hundreds, maybe thousands, of person-hours are devoted to producing a point estimate of c, but then no one remembers (cares?) to assess its estimation uncertainty.  Geez.  Of course one could do delta method, simulation, etc.
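
For completeness, a minimal delta-method sketch (my illustration, using a numerical gradient): se(c_hat) is approximately sqrt(g' V g), where g is the gradient of f at b_hat and V is the estimated asymptotic covariance matrix of b_hat.

import numpy as np

def delta_method_se(f, b_hat, V_hat, eps=1e-6):
    # Numerical-gradient delta method for a scalar function f of the vector b.
    b_hat = np.asarray(b_hat, dtype=float)
    grad = np.empty(b_hat.size)
    for j in range(b_hat.size):
        e = np.zeros(b_hat.size)
        e[j] = eps
        grad[j] = (f(b_hat + e) - f(b_hat - e)) / (2.0 * eps)   # central difference
    return float(np.sqrt(grad @ V_hat @ grad))

# usage sketch: se_c = delta_method_se(lambda b: b[0] / b[1], b_hat, V_hat)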

Monday, February 25, 2019

Big Data for 21st Century Economic Statistics


I earlier posted here when the call for papers was announced for the NBER's CRIW meeting on Big Data for 21st Century Economic Statistics. The wheels have been turning, and the meeting will soon transpire. The program is here, with links to papers. [For general info on the CRIW's impressive contributions over the decades, see here.]

Wednesday, February 20, 2019

Modified CRLB with Differential Privacy

It turns out that with differential privacy the Cramér-Rao lower bound (CRLB) is not achievable (too bad for MLE), but you can figure out what *is* achievable, and find estimators that do the trick. (See the interesting talk here by Feng Ruan, and the associated papers on his web site.) The key point is that estimation efficiency is degraded by privacy. The new frontier seems to me to be this: Let's go beyond stark "privacy" or "no privacy" situations, because in reality there is a spectrum of "epsilon-strengths" of "epsilon-differential" privacy.  (Right?)  Then there is a tension: I like privacy, but I also like estimation efficiency, and the two trade off against each other. So there is a choice to be made, and the optimum depends on preferences.
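
A toy illustration of the tension (mine, via the textbook Laplace mechanism, not the estimators in Ruan's papers): releasing the mean of n observations in [0, 1] with epsilon-differential privacy adds Laplace noise of scale 1/(n*epsilon), so the extra variance 2/(n*epsilon)^2 grows as epsilon shrinks, i.e., as privacy strengthens.

import numpy as np

def private_mean(x, epsilon, rng=None):
    # epsilon-DP release of the mean of data clipped to [0, 1], via the
    # Laplace mechanism; the added noise variance is 2 / (n * epsilon)^2.
    rng = rng or np.random.default_rng()
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    sensitivity = 1.0 / len(x)                   # max effect of changing one record
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)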

Tuesday, February 19, 2019

Berk-Nash Equilibrium and Pseudo MLE

The Berk-White statistics/econometrics tradition is alive and well, appearing now as Berk-Nash equilibrium in cutting-edge economic theory. See for example Kevin He's Harvard job-market paper here and the references therein, and the slides from yesterday's lunch talk by my Penn colleague Yuichi Yamamoto. But the connection between Berk-Nash equilibrium of economic theory and KLIC-minimizing pseudo-MLE of econometric theory is under-developed. When the Berk-Nash people get better acquainted with the Berk-White people, good things may happen. Effectively Yuichi is pushing in that direction, working toward characterizing long-run behavior of likelihood maximizers rather than beliefs.
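
A toy illustration of the econometric side of that connection (mine, not Yuichi's): under misspecification the pseudo-MLE converges to the KLIC-minimizing pseudo-true value. Here, fitting an exponential to Gamma(2, 1) data, both land on the data mean of 2.

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon, gamma

rng = np.random.default_rng(0)
y = gamma.rvs(a=2.0, scale=1.0, size=200_000, random_state=rng)

theta_pmle = y.mean()   # the exponential pseudo-MLE of the mean is just the sample mean

# KLIC-minimizing (pseudo-true) value: argmax over theta of E_0[log f_theta(y)],
# approximated here by the sample average of the exponential log-likelihood.
theta_klic = minimize_scalar(lambda t: -np.mean(expon.logpdf(y, scale=t)),
                             bounds=(0.1, 10.0), method="bounded").x
print(theta_pmle, theta_klic)   # both are about 2.0, the mean of the Gamma(2, 1) data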