Classification in the presence of class noise using a probabilistic Kernel Fisher method

In machine learning, class noise (mislabeled training examples) occurs frequently and degrades any classifier derived from the noisy data set. This paper presents two promising classifiers for this problem, based on the probabilistic noise model proposed by Lawrence and Schölkopf (2001). The proposed algorithms tolerate class noise and extend the earlier work of Lawrence and Schölkopf in two ways. First, we present a novel incorporation of their probabilistic noise model into the Kernel Fisher discriminant; second, we relax the distribution assumption made in their work. The methods were evaluated on simulated noisy data sets and a real-world comparative genomic hybridization (CGH) data set. The results show that the proposed approaches substantially improve on standard classifiers for noisy data sets, with the largest performance gains on non-Gaussian and small data sets.
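To make the approach concrete, the sketch below illustrates one way a noise-tolerant kernel Fisher discriminant of this flavor can be organized: an EM-style loop alternates between fitting a sample-weighted kernel Fisher discriminant and re-estimating the posterior probability that each observed label is correct under an assumed label-flip rate. This is a minimal illustrative reconstruction, not the authors' algorithm: the function names, the fixed flip rate `flip`, and the one-dimensional Gaussian class models on the discriminant axis (a simplification of the feature-space model of Lawrence and Schölkopf) are all assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_weighted_kfd(K, y, w, reg=1e-3):
    # Weighted kernel Fisher discriminant in dual form.
    # K: (n, n) kernel matrix; y: labels in {-1, +1};
    # w: per-sample weight = current estimate that the label is correct.
    n = len(y)
    N = reg * np.eye(n)                        # regularized within-class scatter
    means = {}
    for c in (-1, 1):
        wc = w * (y == c)                      # weights restricted to class c
        sc = wc.sum()
        means[c] = K @ wc / sc                 # weighted class mean (dual coords)
        # Weighted scatter: sum_j wc[j] k_j k_j^T  -  sc * m_c m_c^T
        N += (K * wc) @ K.T - sc * np.outer(means[c], means[c])
    alpha = np.linalg.solve(N, means[1] - means[-1])
    b = -0.5 * alpha @ (means[1] + means[-1])  # threshold midway between means
    return alpha, b

def noise_tolerant_kfd(X, y, flip=0.1, gamma=1.0, n_iter=10):
    # EM-style alternation: fit a weighted KFD, then re-estimate the
    # posterior probability that each observed label is correct, using
    # 1-D Gaussian class models on the discriminant axis (an assumption).
    K = rbf_kernel(X, X, gamma)
    w = np.full(len(y), 1.0 - flip)            # initial trust in every label
    for _ in range(n_iter):
        alpha, b = fit_weighted_kfd(K, y, w)
        f = K @ alpha + b                      # 1-D projections of training points
        lik = {}
        for c in (-1, 1):
            mu, sd = f[y == c].mean(), f[y == c].std() + 1e-9
            lik[c] = np.exp(-0.5 * ((f - mu) / sd) ** 2) / sd
        # Bayes rule with equal class priors and a symmetric flip rate.
        p_obs = np.array([(1 - flip) * lik[c][i] for i, c in enumerate(y)])
        p_alt = np.array([flip * lik[-c][i] for i, c in enumerate(y)])
        w = p_obs / (p_obs + p_alt)            # posterior that y_i is correct
    return alpha, b, w

# Toy usage: two Gaussian classes with 10% of the labels flipped.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
y[rng.choice(100, 10, replace=False)] *= -1
alpha, b, w = noise_tolerant_kfd(X, y, flip=0.1)
```

A new point x would then be classified as sign(rbf_kernel(x, X, gamma) @ alpha + b). The learned weights w flag which training labels the model considers likely to be flipped; down-weighting those examples, rather than fitting them, is what gives this family of methods its noise tolerance.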

[1] Tony R. Martinez, et al. Instance Pruning Techniques, 1997, ICML.

[2] Belur V. Dasarathy, et al. Nosing Around the Neighborhood: A New System Structure and Classification Rule for Recognition in Partially Exposed Environments, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] P. Vannoorenberghe, et al. Handling uncertain labels in multiclass problems using belief decision trees, 2002.

[4] Ashwin Srinivasan, et al. Distinguishing Exceptions From Noise in Non-Monotonic Learning, 1992.

[5] Dennis L. Wilson, Asymptotic Properties of Nearest Neighbor Rules Using Edited Data, 1972, IEEE Trans. Syst. Man Cybern.

[6] Tristrom Cooke, et al. Two Variations on Fisher's Linear Discriminant for Pattern Recognition, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[8] Carla E. Brodley, et al. Identifying Mislabeled Training Data, 1999, J. Artif. Intell. Res.

[9] J. Ross Quinlan, Induction of Decision Trees, 1986, Machine Learning.

[10] R. Fisher, The Use of Multiple Measurements in Taxonomic Problems, 1936.

[11] Peter Devilee, et al. Comparative genomic hybridization profiles in human BRCA1 and BRCA2 breast tumors highlight differential sets of genomic aberrations, 2005, Cancer Research.

[12] D. Angluin, et al. Learning From Noisy Examples, 1988, Machine Learning.

[13] Robert P. W. Duin, et al. Support Vector Data Description, 2004, Machine Learning.

[14] G. Gates, The reduced nearest neighbor rule (Corresp.), 1972, IEEE Trans. Inf. Theory.

[15] Neil D. Lawrence, et al. Estimating a Kernel Fisher Discriminant in the Presence of Label Noise, 2001, ICML.

[16] B. Schölkopf, et al. Fisher discriminant analysis with kernels, 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop.

[17] Yasubumi Sakakibara, et al. Noise-Tolerant Occam Algorithms and Their Applications to Learning Decision Trees, 1993, Machine Learning.

[18] Tony R. Martinez, et al. A noise filtering method using neural networks, 2003, IEEE International Workshop on Soft Computing Techniques in Instrumentation, Measurement and Related Applications (SCIMA 2003).

[19] Nada Lavrac, et al. Experiments with Noise Filtering in a Medical Domain, 1999, ICML.

[21] Xindong Wu, et al. Eliminating Class Noise in Large Datasets, 2003, ICML.

[22] Carla E. Brodley, et al. Identifying and Eliminating Mislabeled Training Instances, 1996, AAAI/IAAI, Vol. 1.

[23] Isabelle Guyon, et al. Discovering Informative Patterns and Data Cleaning, 1996, Advances in Knowledge Discovery and Data Mining.

[24] I. Tomek, An Experiment with the Edited Nearest-Neighbor Rule, 1976.

[25] M. G. Worster, Methods of Mathematical Physics, 1947, Nature.

[26] Marcel J. T. Reinders, et al. Molecular classification of breast carcinomas by comparative genomic hybridization: a specific somatic genetic profile for BRCA1 tumors, 2002, Cancer Research.

[27] Xindong Wu, Knowledge Acquisition from Databases, 1995.

[28] R. Schapire, The Strength of Weak Learnability, 1990, Machine Learning.

[29] Vladimir Vapnik, Statistical Learning Theory, 1998.

[30] J. Ross Quinlan, Bagging, Boosting, and C4.5, 1996, AAAI/IAAI, Vol. 1.

[31] John Mingers, An Empirical Comparison of Pruning Methods for Decision Tree Induction, 1989, Machine Learning.

[32] Wei Fan, et al. Bagging, 2009, Encyclopedia of Machine Learning.