Robust supervised classification with mixture models: Learning from data with uncertain labels

In the supervised classification framework, human supervision is required to label a set of learning data, which are then used to build the classifier. However, in many applications, human supervision is imprecise, difficult, or expensive. In this paper, the problem of learning a supervised multi-class classifier from data with uncertain labels is considered, and a model-based classification method is proposed to solve it. The idea of the proposed method is to compare an unsupervised model of the data with the supervised information carried by the labels of the learning data in order to detect inconsistencies. The method can then build a robust classifier that takes the detected label inconsistencies into account. Experiments on artificial and real data highlight the main features of the proposed method, and an application to object recognition under weak supervision is presented.
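The detection idea can be illustrated with a minimal sketch. It is not the paper's method: a plain 1-D k-means stands in for the mixture model as a simplifying assumption, and the function names (`kmeans_1d`, `flag_inconsistent`) and the toy data are hypothetical. The sketch groups the data without looking at the labels, then flags the points whose label disagrees with the dominant label of their group. It covers only the detection step, not the construction of the robust classifier.

```python
from collections import Counter

def kmeans_1d(xs, k=2, iters=20):
    """Plain 1-D k-means (k >= 2) with deterministic initial centers
    spread evenly over the data range. Labels are never used here."""
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    assign = [0] * len(xs)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        # Move each center to the mean of its members (if it has any).
        for j in range(k):
            members = [x for x, a in zip(xs, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign

def flag_inconsistent(xs, labels, k=2):
    """Return indices of points whose label differs from the most
    common label inside their (unsupervised) cluster."""
    assign = kmeans_1d(xs, k)
    majority = {}
    for j in range(k):
        counts = Counter(l for l, a in zip(labels, assign) if a == j)
        if counts:
            majority[j] = counts.most_common(1)[0][0]
    return [i for i, (l, a) in enumerate(zip(labels, assign)) if l != majority[a]]

# Toy data: two well-separated groups, with one suspicious label in each.
xs = [0.1, 0.2, 0.3, 0.25, 5.0, 5.1, 4.9, 5.2]
labels = ["a", "a", "b", "a", "b", "b", "b", "a"]
print(flag_inconsistent(xs, labels))  # [2, 7]
```

A hard majority vote is the crudest possible way to confront the two sources of information; a probabilistic model could instead weight each label by how plausible it is under the unsupervised structure.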
