On Neyman-Pearson optimality of binary neural net classifiers

In classical binary statistical pattern recognition, optimality in the Neyman-Pearson sense, achieved by a (log-)likelihood-ratio-based classifier, is often desirable. A drawback of a Neyman-Pearson-optimal classifier is that it requires full knowledge of the (quotient of the) class-conditional probability densities of the input data, which is often unrealistic. The design of neural net classifiers, by contrast, is data-driven: no explicit use is made of the class-conditional probability densities of the input data. In this paper a proof is presented that a neural net can nevertheless be trained to approximate a log-likelihood ratio and thus be used as a Neyman-Pearson-optimal, prior-independent classifier. Properties of the approximation of the log-likelihood ratio are discussed, and examples of neural nets trained on synthetic data with known log-likelihood ratios as ground truth illustrate the results.
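The abstract's central claim rests on a standard identity, sketched here in LaTeX for orientation (the notation Λ(x), π₀, π₁ is introduced here, not taken from the paper): a sigmoid-output net trained with binary cross-entropy approximates the class posterior at the population optimum, so its logit decomposes into the log-likelihood ratio plus the log prior odds.

```latex
% Population-optimal sigmoid output under binary cross-entropy:
%   f^*(x) = P(\omega_1 \mid x).
% Its logit therefore decomposes as
\[
  \log \frac{f^*(x)}{1 - f^*(x)}
  = \log \frac{P(\omega_1 \mid x)}{P(\omega_0 \mid x)}
  = \underbrace{\log \frac{p(x \mid \omega_1)}{p(x \mid \omega_0)}}_{\Lambda(x)}
  + \log \frac{\pi_1}{\pi_0}.
\]
% Subtracting the log prior odds \log(\pi_1/\pi_0) recovers \Lambda(x); by the
% Neyman--Pearson lemma, thresholding \Lambda(x) at a level chosen for the
% desired false-alarm rate is optimal, independently of the training priors.
```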
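A minimal, hypothetical PyTorch sketch of the kind of synthetic-data experiment the abstract describes (the network size, optimizer, and Gaussian parameters are illustrative assumptions, not the paper's setup): a net is trained on two 1-D Gaussians with known log-likelihood ratio Λ(x) = 2x, and its logit, minus the log prior odds, should approach the analytic LLR.

```python
# Hypothetical sketch, not the paper's code: train a small sigmoid-output net on
# a two-Gaussian problem with known log-likelihood ratio (LLR), then compare the
# net's logit (minus the log prior odds) against the analytic LLR.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two 1-D Gaussian classes with known densities: N(-1, 1) vs N(+1, 1).
n0, n1 = 4000, 4000                       # balanced training set: pi_0 = pi_1
x0 = torch.randn(n0, 1) - 1.0             # class 0 samples
x1 = torch.randn(n1, 1) + 1.0             # class 1 samples
x = torch.cat([x0, x1])
y = torch.cat([torch.zeros(n0, 1), torch.ones(n1, 1)])

def analytic_llr(x):
    # For equal-variance unit Gaussians at -1 and +1: log p(x|1) - log p(x|0) = 2x.
    return 2.0 * x

net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()              # sigmoid + cross-entropy on the logit

for _ in range(2000):
    opt.zero_grad()
    loss = bce(net(x), y)                 # optimum: logit = LLR + log(pi_1/pi_0)
    loss.backward()
    opt.step()

log_prior_odds = torch.log(torch.tensor(n1 / n0))   # 0 for balanced data
with torch.no_grad():
    grid = torch.linspace(-3, 3, 7).unsqueeze(1)
    learned_llr = net(grid) - log_prior_odds        # prior-independent statistic
    print(torch.cat([grid, learned_llr, analytic_llr(grid)], dim=1))

# Neyman-Pearson use: choose the threshold on the learned LLR from the class-0
# score distribution so the false-alarm rate matches the design value alpha.
with torch.no_grad():
    scores0 = (net(x0) - log_prior_odds).squeeze()
    t = torch.quantile(scores0, 1 - 0.05)           # ~5% false-alarm threshold
    print("threshold for alpha=0.05:", t.item())
```

With balanced training data the log prior odds term is zero; with imbalanced data, subtracting it is what makes the resulting statistic prior-independent, which is the property the abstract emphasizes.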
