Constructing ensembles of classifiers using supervised projection methods based on misclassified instances

In this paper, we propose an approach to ensemble construction based on supervised projections, both linear and non-linear, that pursues both accuracy and diversity in the individual classifiers. The approach follows the philosophy of boosting, focusing effort on difficult instances, but instead of training each classifier on a reweighted distribution of the training set, it uses the misclassified instances to compute a supervised projection that favors their correct classification. We show that existing supervised projection algorithms are suitable for this task, and we evaluate several well-known linear and non-linear projections within the proposed framework. Additionally, the method is improved by introducing concepts from oversampling for imbalanced datasets, which counteracts the negative effect of having only a few misclassified instances available for constructing the supervised projections. Compared with AdaBoost on a large set of 45 problems from the UCI Machine Learning Repository, the method achieves better performance, and it also shows greater robustness in the presence of noise.
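
To make the loop described above concrete, here is a minimal sketch of a supervised-projection ensemble, assuming scikit-learn. It uses LDA as a stand-in for the paper's linear/non-linear projections and a shallow decision tree as the base learner; all names and parameter choices (build_ensemble, n_members, max_depth) are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: each round fits a supervised projection on the
# instances the previous member misclassified, then trains the next
# member in that projected space. Majority vote combines the members.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

def build_ensemble(X, y, n_members=10):
    members = []  # list of (projection, classifier) pairs
    proj = None   # first member is trained in the original space
    for _ in range(n_members):
        Xt = X if proj is None else proj.transform(X)
        # Shallow tree as a weak base learner (illustrative choice,
        # so that some training instances remain misclassified).
        clf = DecisionTreeClassifier(max_depth=3).fit(Xt, y)
        members.append((proj, clf))
        # Misclassified instances drive the next supervised projection.
        wrong = clf.predict(Xt) != y
        if wrong.sum() < 2 or len(np.unique(y[wrong])) < 2:
            break  # too few errors (or one class) to fit a projection
        # LDA stands in for the supervised projections used in the paper.
        proj = LinearDiscriminantAnalysis().fit(X[wrong], y[wrong])
    return members

def predict(members, X):
    # Majority vote; assumes non-negative integer class labels.
    votes = np.array([clf.predict(X if p is None else p.transform(X))
                      for p, clf in members])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```

Note that the oversampling refinement mentioned in the abstract would slot in just before the projection is fitted, synthesizing extra instances (SMOTE-style) when the misclassified set is too small to estimate the projection reliably.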
