Double committee adaboost

Abstract: In this paper we make an extensive study of different combinations of ensemble techniques for improving the performance of AdaBoost, considering the following strategies: reducing the correlation among the features, reducing the effect of outliers on AdaBoost training, and proposing an efficient way of selecting and weighting the weak learners. First, we show that the random subspace method works well when coupled with several AdaBoost techniques. Second, we show that an ensemble based on training-set perturbation using editing methods (to reduce the importance of outliers) further improves performance. We examine the robustness of the new approach by applying it to a number of benchmark datasets representing a range of different problems, and find that, compared with other state-of-the-art classifiers, our proposed method performs consistently well across all the tested datasets. One useful finding is that this approach obtains performance similar to that of a support vector machine (SVM), using the well-known LibSVM implementation, even when both the kernel and the other SVM parameters are carefully tuned for each dataset. The main drawback of the proposed approach is its computation time, which is high because several ensemble techniques are combined. We have also tested the fusion of our selected committee of AdaBoost classifiers with SVM (again using the widely tested LibSVM tool), with the SVM parameters tuned for each dataset. We find that this fusion of SVM with a committee of AdaBoost classifiers (i.e., a heterogeneous ensemble) statistically outperforms the most widely used SVM tool with parameters tuned for each dataset. The MATLAB code of our best approach is available at bias.csr.unibo.it/nanni/ADA.rar.
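As a rough illustration of the double-committee idea and its fusion with SVM, the following Python sketch builds a random-subspace committee of AdaBoost classifiers and averages its class probabilities with those of a probability-enabled SVM. It uses scikit-learn rather than the authors' MATLAB code, and the committee size, subspace ratio, boosting rounds, and fusion weights are illustrative assumptions, not values taken from the paper:

```python
# Minimal sketch, NOT the authors' MATLAB implementation: a random-subspace
# committee of AdaBoost classifiers fused with an SVM by the sum rule.
# All parameter values below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC


class RandomSubspaceAdaBoost:
    """Committee of AdaBoost classifiers, each trained on a random
    subset of the features (the random subspace method)."""

    def __init__(self, n_members=10, subspace_ratio=0.5, seed=0):
        self.n_members = n_members
        self.subspace_ratio = subspace_ratio
        self.rng = np.random.default_rng(seed)
        self.members_ = []  # list of (feature indices, fitted AdaBoost)

    def fit(self, X, y):
        n_features = X.shape[1]
        k = max(1, int(self.subspace_ratio * n_features))
        for _ in range(self.n_members):
            idx = self.rng.choice(n_features, size=k, replace=False)
            member = AdaBoostClassifier(n_estimators=50)
            member.fit(X[:, idx], y)
            self.members_.append((idx, member))
        return self

    def predict_proba(self, X):
        # Sum rule: average the probability outputs of all committee members.
        return np.mean(
            [m.predict_proba(X[:, idx]) for idx, m in self.members_], axis=0
        )


def fuse_with_svm(committee, svm, X):
    """Heterogeneous ensemble: average the committee's probabilities with
    those of an SVM trained with probability outputs enabled."""
    return 0.5 * committee.predict_proba(X) + 0.5 * svm.predict_proba(X)


# Usage (X_train, y_train, X_test assumed to be NumPy arrays):
# committee = RandomSubspaceAdaBoost().fit(X_train, y_train)
# svm = SVC(probability=True).fit(X_train, y_train)  # kernel/C tuned per dataset
# fused_scores = fuse_with_svm(committee, svm, X_test)
```

The equal 0.5/0.5 fusion weights mirror a plain sum rule; in practice the weighting between the committee and the SVM could itself be tuned per dataset.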
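The editing-based training perturbation can be sketched in the same spirit. The specific editing schemes evaluated in the paper are not reproduced here; classical Wilson editing is shown only as a stand-in, removing every training sample whose label disagrees with the majority of its nearest neighbors so that likely outliers never receive the large weights AdaBoost would otherwise assign them:

```python
# Sketch of one editing method (Wilson editing), used here only as a
# stand-in for the editing schemes the paper actually evaluates.
# X and y are assumed to be NumPy arrays.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def wilson_editing(X, y, k=3):
    """Return (X, y) with every sample removed whose label disagrees
    with the majority vote of its k nearest neighbors."""
    # k + 1 neighbors because, when querying the training set itself,
    # each point is (barring duplicates) its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neigh = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop self
    agree = (y[neigh] == y[:, None]).sum(axis=1)  # neighbors sharing the label
    keep = agree > k / 2
    return X[keep], y[keep]


# Each committee member can then be trained on an independently edited
# (perturbed) copy of the training set, e.g.:
# X_ed, y_ed = wilson_editing(X_train, y_train)
# member = AdaBoostClassifier(n_estimators=50).fit(X_ed, y_ed)
```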
