Kernel Whitening for One-Class Classification

In one-class classification, one tries to describe a class of target data and to distinguish it from all other possible outlier objects. Obvious applications are areas where outliers are very diverse, or very difficult or expensive to measure, such as machine diagnostics or medical applications. To distinguish well between the target objects and the outliers, a good representation of the data is essential. The performance of many one-class classifiers depends critically on the scaling of the data and is often harmed by data distributed in (non-linear) subspaces. This paper presents a simple preprocessing method which actively tries to map the data to a spherically symmetric cluster and is almost insensitive to data distributed in subspaces. It uses techniques from Kernel PCA to rescale the data in a kernel feature space to unit variance. The transformed data can then be described very well by the Support Vector Data Description, which basically fits a hypersphere around the data. The paper presents the methods and some preliminary experimental results.
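The rescaling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the Gaussian kernel form, the width `sigma`, the relative eigenvalue cut-off `rel_tol`, and the function names `rbf_kernel`, `fit_kernel_whitening` and `transform` are all assumptions chosen for illustration. The sketch centers the kernel matrix, takes its eigendecomposition as in Kernel PCA, discards directions with near-zero variance, and scales each remaining component so that the training data has unit variance along it.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel exp(-||a - b||^2 / sigma^2); the exact form is an assumption."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma ** 2)


def fit_kernel_whitening(X, sigma=1.0, rel_tol=1e-6):
    """Fit a whitening map in kernel feature space (illustrative sketch).

    rel_tol is a hypothetical cut-off: components whose eigenvalue is below
    rel_tol times the largest eigenvalue are dropped, because rescaling
    near-zero variance would only amplify numerical noise.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)

    # Center the data in feature space (standard Kernel PCA centering).
    col_mean = K.mean(axis=0)
    total_mean = K.mean()
    Kc = K - col_mean[None, :] - col_mean[:, None] + total_mean

    # Eigendecomposition of the symmetric centered kernel matrix.
    lam, V = np.linalg.eigh(Kc)
    keep = lam > rel_tol * lam.max()
    lam, V = lam[keep], V[:, keep]

    # Projecting onto component k gives variance lam_k / n on the training
    # set, so scaling the coefficients by sqrt(n) / lam_k yields unit variance.
    W = V * (np.sqrt(n) / lam)
    return {"X": X, "sigma": sigma, "col_mean": col_mean,
            "total_mean": total_mean, "W": W}


def transform(model, X_new):
    """Map new objects into the whitened kernel feature space."""
    k = rbf_kernel(X_new, model["X"], model["sigma"])
    # Center the new kernel values with the training-set statistics.
    kc = (k - k.mean(axis=1, keepdims=True)
          - model["col_mean"][None, :] + model["total_mean"])
    return kc @ model["W"]
```

Under these assumptions, `transform(fit_kernel_whitening(X), X)` returns training data with zero mean and unit variance along every retained component, which is the roughly spherical shape a hypersphere-based description such as the Support Vector Data Description is suited to. The choice of `sigma` and of the cut-off determines how many components are retained and would need tuning for real data.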
