Friday, March 15, 2019

Neyman-Pearson Classification

Neyman-Pearson (NP) hypothesis testing insists on a fixed test size (5%, say), even asymptotically, and then takes whatever power it can get. Bayesian hypothesis assessment, in contrast, treats type I and II errors symmetrically, with size approaching 0 and power approaching 1 asymptotically.

Classification tends to parallel Bayesian hypothesis assessment, again treating type I and II errors symmetrically. For example, I might run a logit regression and classify cases with fitted P(I=1) < 1/2 as group 0 and cases with fitted P(I=1) > 1/2 as group 1. The classification threshold of 1/2 produces a "Bayes classifier".
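To fix ideas, here is a minimal sketch in Python; the simulated data and the scikit-learn logit fit are my own illustration, not tied to any particular application.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated data standing in for a real application.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

# Logit fit; classify as group 1 exactly when fitted P(I=1) exceeds 1/2.
fit = LogisticRegression().fit(X, y)
p_hat = fit.predict_proba(X)[:, 1]   # fitted P(I=1 | x)
group = (p_hat > 0.5).astype(int)    # the 1/2 threshold: a Bayes classifier
```

(With equal error costs, thresholding at 1/2 minimizes expected misclassification loss, which is what makes it the Bayes rule.)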

Bayes classifiers seem natural, and in many applications they are.  But an interesting insight is that some classification problems may have hugely different costs of type I and II errors, in which case an NP classification approach may be entirely natural, not clumsy.  (Consider, for example, deciding whether to convict someone of a crime that carries the death penalty.  Many people would view the cost of a false declaration of "guilty" as much greater than the cost of a false "innocent".) 

This leads to the idea and desirability of NP classifiers.  The issue is how to bound the type I classification error probability at some small chosen value.  Obviously it involves moving the classification threshold away from 1/2, but figuring out exactly where to move it turns out to be a challenging problem; a sketch of one approach appears after the reference list below.  Xin Tong and co-authors have made good progress.  Here are some of his papers (from his USC site):
  1. Chen, Y., Li, J.J., and Tong, X. (2019) Neyman-Pearson criterion (NPC): a model selection criterion for asymmetric binary classification. arXiv:1903.05262.
  2. Tong, X., Xia, L., Wang, J., and Feng, Y. (2018) Neyman-Pearson classification: parametrics and power enhancement. arXiv:1802.02557v3.
  3. Xia, L., Zhao, R., Wu, Y., and Tong, X. (2018) Intentional control of type I error over unconscious data distortion: a Neyman-Pearson approach to text classification. arXiv:1802.02558.
  4. Tong, X., Feng, Y., and Li, J.J. (2018) Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristics (NP-ROC). Science Advances, 4(2):eaao1659.
  5. Zhao, A., Feng, Y., Wang, L., and Tong, X. (2016) Neyman-Pearson classification under high-dimensional settings. Journal of Machine Learning Research, 17:1-39.
  6. Li, J.J. and Tong, X. (2016) Genomic applications of the Neyman-Pearson classification paradigm. Chapter in Big Data Analytics in Genomics. Springer (New York). DOI: 10.1007/978-3-319-41279-5; eBook ISBN: 978-3-319-41279-5.
  7. Tong, X., Feng, Y., and Zhao, A. (2016) A survey on Neyman-Pearson classification and suggestions for future research. Wiley Interdisciplinary Reviews: Computational Statistics, 8:64-81.
  8. Tong, X. (2013) A plug-in approach to Neyman-Pearson classification. Journal of Machine Learning Research, 14:3011-3040.
  9. Rigollet, P. and Tong, X. (2011) Neyman-Pearson classification, convexity and stochastic constraints. Journal of Machine Learning Research, 12:2825-2849.
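To illustrate what moving the threshold involves, here is a minimal sketch in the spirit of the NP umbrella algorithm of Tong, Feng, and Li (paper 4 above): fit any base classifier, score a held-out class-0 sample, and take as threshold an order statistic of those scores, chosen via a binomial tail bound so that the type I error stays below alpha with high probability. The function name np_threshold, the simulated data, and the logit base classifier are my assumptions for illustration, not code from the papers.

```python
import numpy as np
from scipy.stats import binom
from sklearn.linear_model import LogisticRegression

def np_threshold(scores0, alpha=0.05, delta=0.05):
    # scores0: scores on a held-out class-0 sample, not used in fitting.
    # Returns a threshold t such that, with probability at least 1 - delta
    # over the held-out draw, the rule "classify as 1 when score > t" has
    # population type I error at most alpha.
    t = np.sort(scores0)
    n = len(t)
    for k in range(1, n + 1):
        # With threshold t_(k), the chance that the true type I error
        # exceeds alpha is P(Binomial(n, 1 - alpha) >= k).
        if binom.sf(k - 1, n, 1 - alpha) <= delta:
            return t[k - 1]
    raise ValueError("too few class-0 cases for this (alpha, delta)")

# Usage sketch on simulated data.
rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, size=(400, 2))   # class 0 (e.g., "innocent")
X1 = rng.normal(1.0, 1.0, size=(400, 2))   # class 1 (e.g., "guilty")
X0_fit, X0_hold = X0[:200], X0[200:]       # reserve class-0 cases for the threshold
X = np.vstack([X0_fit, X1])
y = np.r_[np.zeros(200), np.ones(400)]
fit = LogisticRegression().fit(X, y)
t = np_threshold(fit.predict_proba(X0_hold)[:, 1], alpha=0.05, delta=0.05)
# Classify a new case as group 1 only if its fitted P(I=1) exceeds t,
# which will generally sit well above 1/2 for small alpha.
```

Note the contrast with the Bayes rule: the threshold is driven by the class-0 score distribution and the chosen alpha, not by the symmetric 1/2 cutoff.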
