How to make a neural network say "Don't know"

Abstract: Despite its many advantages, an improperly trained Multi-Layer Perceptron (MLP) may assign a test point that lies far from the training data to a completely irrelevant class. For example, if we train an MLP to distinguish four types of childhood cancer (neuroblastoma, rhabdomyosarcoma, non-Hodgkin lymphoma, and Ewing sarcoma) from gene expression profiles and then test it on a different kind of cancer (or on data from a healthy patient), the test point is forced into one of the four classes. Such unexpected behavior stems from the "open world" nature of the problem, and it afflicts many other learning systems as well. We address it by equipping the network with the ability to withhold judgment when it should. We have developed an algorithm that provides a practical solution. First, we estimate the domain of the training data (the sampling window) and prove the consistency of this estimate, along with some related results. An MLP should say "Don't know" whenever a test point falls outside the sampling window. To realize this, we generate observations from the complement region and label them with a new class, "Don't know"; training the network on the original data together with these generated points yields a classifier that supports the extra class. The difficulty of generating a large number of points from the complement region is handled by a novel scheme that exploits the input sensitivity of the network on that region via a regularizer. We study the effectiveness of the proposed method on several data sets.
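To make the pipeline concrete, here is a minimal sketch, not the paper's algorithm: it approximates the sampling window by thresholding a kernel density estimate, draws "Don't know" points from the complement by plain rejection sampling inside an enclosing box (the paper instead generates such points with a sensitivity-driven regularization scheme), and trains an MLP with one extra class. The function and parameter names (augment_with_dont_know, n_extra, level, margin) are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.neural_network import MLPClassifier

def augment_with_dont_know(X, y, n_extra=500, level=0.05, margin=1.0, seed=0):
    """X: (n, d) inputs, y: (n,) labels in {0..K-1}; returns data augmented
    with n_extra complement-region points labeled K ("Don't know")."""
    rng = np.random.default_rng(seed)
    kde = gaussian_kde(X.T)                    # density estimate of the training data
    thresh = np.quantile(kde(X.T), level)      # window = {x : estimated density >= thresh}
    lo, hi = X.min(axis=0) - margin, X.max(axis=0) + margin
    extras = []
    while len(extras) < n_extra:               # rejection-sample the complement region
        cand = rng.uniform(lo, hi, size=(4 * n_extra, X.shape[1]))
        extras.extend(cand[kde(cand.T) < thresh])
    extras = np.asarray(extras[:n_extra])
    K = int(y.max()) + 1                       # index of the new "Don't know" class
    return np.vstack([X, extras]), np.concatenate([y, np.full(n_extra, K)])

# Usage: two well-separated Gaussian classes, then a classifier with a reject class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
X_aug, y_aug = augment_with_dont_know(X, y)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X_aug, y_aug)
print(clf.predict(np.array([[6.0, -3.0]])))   # likely [2], i.e. "Don't know"
```

Rejection sampling becomes impractical as the input dimension grows, which is precisely why the paper's sensitivity-based generation scheme matters; the sketch is only meant to make the train-with-an-extra-class idea concrete.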
