Predictive Ensemble Modelling: Experimental Comparison of Boosting Implementation Methods

This paper presents an empirical comparison of two methods for implementing boosting: reweighting and resampling. The goal is to determine which of the two methods performs better. In the study, we used four algorithms, namely Decision Stump, Neural Network, Random Forest, and Support Vector Machine, as base classifiers and AdaBoost as the technique for building the ensemble models. We applied 10-fold cross-validation to measure and evaluate the performance metrics of the models. The results show that the average percentages of correctly and incorrectly classified instances are practically the same for both methods, and that the average RMSE values of the two methods do not differ significantly. The results further show that the relative performance of the two methods is independent of the dataset and the base classifier used. Additionally, we found that greater complexity of the chosen ensemble technique and boosting method does not necessarily lead to better performance.
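
To make the experimental setup concrete, the sketch below illustrates the two implementation strategies with scikit-learn: boosting by reweighting (AdaBoost passing the updated example weights directly to a decision-stump base learner) and boosting by resampling (each round trained on a bootstrap sample drawn according to the current example weights), both evaluated with 10-fold cross-validation. This is a minimal illustration under assumed choices (scikit-learn, the bundled breast-cancer data as a stand-in dataset, 50 boosting rounds), not the authors' code.

```python
# Hedged sketch: reweighting vs. resampling AdaBoost with a decision stump,
# compared via 10-fold cross-validation. Dataset and parameters are
# illustrative assumptions only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.base import BaseEstimator, ClassifierMixin, clone

X, y = load_breast_cancer(return_X_y=True)
stump = DecisionTreeClassifier(max_depth=1)  # decision stump base classifier

# Boosting by reweighting: scikit-learn's AdaBoost passes the updated example
# weights to each base learner via sample_weight.
# (Parameter is `estimator` in scikit-learn >= 1.2, `base_estimator` before.)
reweight = AdaBoostClassifier(estimator=stump, n_estimators=50)

class ResampleAdaBoost(BaseEstimator, ClassifierMixin):
    """Minimal AdaBoost-style booster that trains each round on a bootstrap
    sample drawn according to the current example weights (resampling)."""

    def __init__(self, estimator=None, n_estimators=50, random_state=0):
        self.estimator = estimator
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        n = len(y)
        w = np.full(n, 1.0 / n)          # uniform initial weights
        self.classes_ = np.unique(y)
        self.estimators_, self.alphas_ = [], []
        for _ in range(self.n_estimators):
            idx = rng.choice(n, size=n, replace=True, p=w)  # weighted bootstrap
            clf = clone(self.estimator).fit(X[idx], y[idx])
            pred = clf.predict(X)
            err = np.sum(w[pred != y])   # weighted training error
            if err == 0:                 # perfect round: keep it and stop
                self.estimators_.append(clf)
                self.alphas_.append(1.0)
                break
            if err >= 0.5:               # no better than chance: stop boosting
                break
            alpha = 0.5 * np.log((1 - err) / err)
            # Increase weights of misclassified examples, decrease the rest.
            w *= np.exp(np.where(pred == y, -alpha, alpha))
            w /= w.sum()
            self.estimators_.append(clf)
            self.alphas_.append(alpha)
        return self

    def predict(self, X):
        votes = np.zeros((len(X), len(self.classes_)))
        for alpha, clf in zip(self.alphas_, self.estimators_):
            pred = clf.predict(X)
            for k, c in enumerate(self.classes_):
                votes[:, k] += alpha * (pred == c)  # weighted vote per class
        return self.classes_[np.argmax(votes, axis=1)]

resample = ResampleAdaBoost(estimator=stump, n_estimators=50)

# 10-fold cross-validated accuracy for both implementations.
for name, model in [("reweighting", reweight), ("resampling", resample)]:
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {acc.mean():.3f}")
```

Swapping the decision stump for another base classifier (e.g. an SVM or a small neural network) only requires changing the `estimator` argument, which mirrors how the study varies the base learner while holding the boosting procedure fixed.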
