An Extensive Evaluation of Filtering Misclassified Instances in Supervised Classification Tasks

Not all instances in a data set are equally beneficial for inferring a model of the data, and some instances (such as outliers) can be detrimental. Several machine learning techniques treat the instances in a data set differently during training, such as curriculum learning, filtering, and boosting. However, it is difficult to determine how beneficial an instance is for inferring a model of the data. In this article, we present an automated method for supervised classification problems that orders the instances in a data set by complexity, based on their likelihood of being misclassified (instance hardness), producing a hardness ordering. The underlying assumption of this method is that instances with a high likelihood of being misclassified represent more complex concepts in the data set. Using a hardness ordering allows a learning algorithm to focus on the most beneficial instances. We integrate a hardness ordering into the learning process using curriculum learning, filtering, and boosting. We find that focusing on the simpler instances during training significantly increases generalization accuracy, and that the effects of curriculum learning depend on the learning algorithm that is used. In general, filtering and boosting outperform curriculum learning, and filtering has the most significant effect on accuracy. © 2014 Wiley Periodicals, Inc.
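
The sketch below illustrates the general idea of a hardness ordering with filtering; it is a minimal illustration, not the authors' implementation. It assumes scikit-learn, estimates instance hardness as the out-of-fold probability of misclassification from a single classifier (rather than an ensemble of learning algorithms), and the hardness cutoff of 0.5 is purely illustrative.

```python
# Minimal sketch (not the paper's implementation): estimate instance hardness
# as the cross-validated probability of misclassification, order instances by
# hardness, and filter the hardest ones before training a final model.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Instance hardness ~ 1 - p(true class), estimated out-of-fold so that each
# instance is scored by a model that did not train on it.
proba = cross_val_predict(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X_train, y_train, cv=5, method="predict_proba",
)
hardness = 1.0 - proba[np.arange(len(y_train)), y_train]

# Hardness ordering: easiest instances first. Curriculum learning or boosting
# variants would consume this ordering directly.
order = np.argsort(hardness)

# Filtering: drop the hardest instances before training the final model.
hardness_threshold = 0.5  # illustrative cutoff, not a value from the paper
keep = hardness <= hardness_threshold
clf = DecisionTreeClassifier(random_state=0).fit(X_train[keep], y_train[keep])
print("test accuracy after filtering:", clf.score(X_test, y_test))
```

As a usage note, the same `hardness` scores could instead weight instances for boosting or schedule them from easy to hard for curriculum learning, which is how the article compares the three approaches.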
