INFFC: An iterative class noise filter based on the fusion of classifiers with noise sensitivity control

In classification, noise can deteriorate system performance and increase the complexity of the models built. Several approaches have been proposed in the literature to mitigate its consequences; among them, noise filtering, which removes noisy examples from the training data, is one of the most widely used. This paper proposes a new noise filtering method that combines several filtering strategies in order to increase the accuracy of the classification algorithms applied after the filtering process. The filtering is based on the fusion of the predictions of several classifiers used to detect the presence of noise: we translate the idea behind multiple classifier systems, where the information gathered from different models is combined, to noise filtering, considering a combination of classifiers rather than a single one to detect noise. Additionally, the proposed method follows an iterative noise filtering scheme that avoids using the examples detected as noisy in each new iteration of the filtering process. Finally, we introduce a noise score to control the filtering sensitivity, so that the number of noisy examples removed in each iteration can be adapted to the needs of the practitioner. The first two strategies (the use of multiple classifiers and iterative filtering) improve the filtering accuracy, whereas the last one (the noise score) controls how conservative the filter is when removing potentially noisy examples. The validity of the proposed method is assessed in an exhaustive experimental study: we compare the new filtering method against several state-of-the-art methods for dealing with class noise and study their efficacy with three classifiers of differing sensitivity to noise.
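
To make the three strategies concrete, the following is a minimal sketch of an iterative, fusion-based class noise filter, assuming scikit-learn-style estimators. The function name `iterative_fusion_filter`, the specific classifier pool, the majority-vote noise score, and the stopping rule are illustrative assumptions for this sketch, not the paper's exact INFFC definitions.

```python
# Sketch of an iterative class noise filter based on classifier fusion.
# Assumptions (not taken from the paper): scikit-learn estimators,
# k-fold cross-validated predictions for noise detection, and a simple
# disagreement fraction in place of INFFC's exact noise score.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression


def iterative_fusion_filter(X, y, threshold=0.5, max_iter=10, cv=5):
    """Return the indices of examples kept after iterative filtering.

    threshold controls the filtering sensitivity: an example is removed
    when the fraction of classifiers misclassifying it exceeds it.
    """
    classifiers = [
        DecisionTreeClassifier(random_state=0),
        KNeighborsClassifier(n_neighbors=3),
        LogisticRegression(max_iter=1000),
    ]
    kept = np.arange(len(y))  # indices into the original training data
    for _ in range(max_iter):
        Xk, yk = X[kept], y[kept]
        # Fuse the predictions of several classifiers: each one votes,
        # via cross-validation, on whether an example looks mislabeled.
        votes = np.zeros(len(kept))
        for clf in classifiers:
            preds = cross_val_predict(clf, Xk, yk, cv=cv)
            votes += (preds != yk)
        noise_score = votes / len(classifiers)  # in [0, 1]
        noisy = noise_score > threshold
        if not noisy.any():
            break  # no more noise detected: stop iterating
        # Detected noisy examples are discarded, so later iterations
        # never train on them (the iterative scheme described above).
        kept = kept[~noisy]
    return kept
```

With a threshold close to 1 the filter removes only the examples that nearly all classifiers misclassify (a conservative setting), while lowering it makes the removal more aggressive; this mirrors the role of the noise score in controlling filtering sensitivity described in the abstract.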
