Adapting the MultiBoost ensemble for class-imbalanced learning

Learning from class-imbalanced data sets is challenging for common learning algorithms. These algorithms favor the majority class because of skewed class representation, noise, and their inability to expand the minority-class boundaries in concept space. To improve minority-class identification, ensembles combined with data-resampling techniques have gained much popularity; however, these ensembles attain higher minority-class performance at the cost of majority-class performance. In this paper, we adapt the MultiBoost ensemble to the minority-class identification problem. Our technique inherits the strengths of its constituents: it improves minority-class prediction by expanding the concept space, and overall classification performance by reducing both the bias and variance components of error. We compared our technique with seven existing single-classifier and ensemble techniques on thirteen data sets. The experimental results show that the proposed technique achieves significant improvement on all tested metrics. Furthermore, it inherits an advantage over other boosting ensembles through its amenability to parallel computation.
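To make the idea concrete, the following is a minimal sketch, not the paper's actual algorithm, of a MultiBoost-style learner adapted for imbalance: AdaBoost sub-committees whose instance weights are reset by wagging (continuous Poisson weights) at sub-committee boundaries, preceded by random oversampling of the minority class. The base learner (a decision stump), the oversampling choice, and all parameter names here are assumptions for illustration only.

```python
import numpy as np

def wagging_weights(n, rng):
    # Wagging reset: continuous Poisson (exponential) instance weights, normalized
    w = -np.log(rng.uniform(1e-12, 1.0, n))
    return w / w.sum()

class Stump:
    """Weighted decision stump used as the base learner (an illustrative choice)."""
    def fit(self, X, y, w):
        best = (0, 0.0, 1, np.inf)
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for s in (1, -1):
                    pred = np.where(X[:, f] <= t, s, -s)
                    err = w[pred != y].sum()
                    if err < best[3]:
                        best = (f, t, s, err)
        self.f, self.t, self.s, _ = best
        return self
    def predict(self, X):
        return np.where(X[:, self.f] <= self.t, self.s, -self.s)

def multiboost_imbalanced(X, y, T=10, n_sub=3, seed=0):
    """Sketch: oversample the minority class, then run AdaBoost with
    wagging weight resets between sub-committees (MultiBoost-style).
    Assumes binary labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == -1)[0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    X, y = X[idx], y[idx]                       # balanced training set
    n = len(y)
    w = np.full(n, 1.0 / n)
    models, alphas = [], []
    for t in range(T):
        if t > 0 and t % max(T // n_sub, 1) == 0:
            w = wagging_weights(n, rng)         # reset at sub-committee boundary
        stump = Stump().fit(X, y, w)
        pred = stump.predict(X)
        err = max(w[pred != y].sum(), 1e-10)
        if err >= 0.5:                          # too weak: reset and retry
            w = wagging_weights(n, rng)
            continue
        alpha = 0.5 * np.log((1 - err) / err)   # standard AdaBoost weight update
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    def predict(Xq):
        agg = sum(a * m.predict(Xq) for a, m in zip(alphas, models))
        return np.where(agg >= 0, 1, -1)
    return predict
```

Because each sub-committee starts from fresh wagged weights, sub-committees are independent of one another and can be trained in parallel, which is the source of the parallelism noted above.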
