Boosting random subspace method

In this paper we propose a boosting approach to the random subspace method (RSM) that improves its performance and avoids some of its major drawbacks. RSM is a successful classification method, but the random selection of inputs, the source of its success, can also be a major problem: for some problems, several of the selected subspaces may lack the discriminant ability to separate the different classes, and these subspaces produce poor classifiers that harm the performance of the ensemble. Boosting RSM would therefore be an interesting approach for improving its performance. Nevertheless, the naive application of the two methods together, boosting and RSM, achieves poor results, worse than the results of each method separately. In this work, we propose a new approach for combining RSM and boosting: instead of drawing random subspaces, we search for subspaces that optimize the weighted classification error given by the boosting algorithm, and the new classifier added to the ensemble is trained on the subspace obtained. An additional advantage of the proposed methodology is that it can be used with any classifier, including those, such as k-nearest neighbor classifiers, that cannot easily use boosting methods. The proposed approach is compared with standard AdaBoost and RSM, showing improved performance on a large set of 45 problems from the UCI Machine Learning Repository. An additional study of the effect of noise on the labels of the training instances shows that the less aggressive versions of the proposed methodology are more robust than AdaBoost in the presence of noise.
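
As a rough illustration of the idea described above, the following sketch replaces RSM's random subspace draw with a search over candidate subspaces guided by the current boosting weights, and then trains the base classifier (here a k-nearest neighbor classifier) on the winning subspace. Everything beyond the abstract is an assumption of this sketch: the best-of-n random candidate search, the training-set estimate of the weighted error, and the binary AdaBoost-style weight update merely stand in for whatever optimizer and error estimate the paper actually uses.

import numpy as np
from sklearn.base import clone
from sklearn.neighbors import KNeighborsClassifier

def weighted_error(clf, X, y, w, feats):
    # Weighted 0/1 training error of clf restricted to the feature subset
    # (optimistic for k-NN; a held-out estimate would be preferable).
    pred = clf.fit(X[:, feats], y).predict(X[:, feats])
    return float(np.sum(w * (pred != y)))

def boosted_subspace_ensemble(X, y, n_rounds=10, subspace_size=None,
                              n_candidates=20, base=None, seed=0):
    # Boosting where each round's subspace is *searched* (here: best of
    # n_candidates random subsets under the current weighted error)
    # instead of drawn blindly at random as in plain RSM.
    rng = np.random.default_rng(seed)
    base = KNeighborsClassifier() if base is None else base
    n, d = X.shape
    subspace_size = subspace_size or max(1, d // 2)
    w = np.full(n, 1.0 / n)                  # boosting weights over instances
    ensemble = []                            # (alpha, feature subset, classifier)
    for _ in range(n_rounds):
        best_feats, best_err = None, np.inf
        for _ in range(n_candidates):        # simplified search step
            feats = rng.choice(d, size=subspace_size, replace=False)
            err = weighted_error(clone(base), X, y, w, feats)
            if err < best_err:
                best_feats, best_err = feats, err
        if best_err >= 0.5:                  # stop if no useful subspace was found
            break
        clf = clone(base).fit(X[:, best_feats], y)
        pred = clf.predict(X[:, best_feats])
        alpha = 0.5 * np.log((1.0 - best_err) / max(best_err, 1e-10))
        # AdaBoost-style reweighting: misclassified instances gain weight.
        w *= np.exp(np.where(pred != y, alpha, -alpha))
        w /= w.sum()
        ensemble.append((alpha, best_feats, clf))
    return ensemble

def predict(ensemble, X, classes):
    # Weighted majority vote of the ensemble members.
    votes = np.zeros((X.shape[0], len(classes)))
    for alpha, feats, clf in ensemble:
        pred = clf.predict(X[:, feats])
        for i, c in enumerate(classes):
            votes[pred == c, i] += alpha
    return classes[votes.argmax(axis=1)]

Usage follows the standard pattern: with training data X, y from a UCI data set, ens = boosted_subspace_ensemble(X, y) builds the ensemble and predict(ens, X_test, np.unique(y)) returns its class labels.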
