Hybrid Methods for Class Imbalance Learning Employing Bagging with Sampling Techniques

Class imbalance classification has become a prominent problem in supervised learning. In imbalanced datasets, majority class instances greatly outnumber minority class instances, which biases classifiers toward the majority class and produces suboptimal results on the minority class. In the last decade, several methods, including sampling techniques, cost-sensitive learning, and ensemble methods, have been introduced for dealing with class imbalance classification. Among these, ensemble methods generally perform better than sampling or cost-sensitive learning alone. Ensemble learning combines a sampling technique (either under-sampling or over-sampling) with a bagging or boosting algorithm. However, which sampling technique works best with ensemble learning depends strongly on the problem domain. In this paper, we propose two bagging-based methods for imbalanced classification: (a) ADASYNBagging and (b) RSYNBagging. ADASYNBagging combines ADASYN-based over-sampling with the bagging algorithm. In contrast, RSYNBagging combines both random under-sampling and ADASYN-based over-sampling with bagging: it applies under-sampling and over-sampling in alternating iterations, and thus incorporates the advantages of both techniques without introducing any extra parameter to tune or increasing time complexity. We have tested our proposed ADASYNBagging and RSYNBagging methods against the existing best-performing methods UnderBagging and SMOTEBagging on 11 benchmark imbalanced datasets, and the initial results are strongly encouraging.
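
To make the alternation scheme concrete, below is a minimal sketch of the RSYNBagging idea in Python, assuming the imbalanced-learn library (ADASYN, RandomUnderSampler) and scikit-learn for the base learner. The class name RSYNBaggingSketch, the even/odd alternation rule, and the decision-tree base learner are illustrative assumptions, not the authors' reference implementation; running ADASYN on every iteration would correspond to the ADASYNBagging variant.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import ADASYN
from imblearn.under_sampling import RandomUnderSampler


class RSYNBaggingSketch:
    """Bagging ensemble that rebalances each bootstrap replicate,
    alternating ADASYN over-sampling with random under-sampling."""

    def __init__(self, n_estimators=10, base_estimator=None, random_state=None):
        self.n_estimators = n_estimators
        self.base_estimator = base_estimator or DecisionTreeClassifier()
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.RandomState(self.random_state)
        n = len(X)
        self.estimators_ = []
        for i in range(self.n_estimators):
            # Standard bagging step: draw a bootstrap replicate of the data.
            idx = rng.randint(0, n, size=n)
            X_boot, y_boot = X[idx], y[idx]
            # Alternate the rebalancing step between over- and under-sampling
            # (one plausible reading of "alternating iterations" above).
            if i % 2 == 0:
                sampler = ADASYN(random_state=rng.randint(2**31 - 1))
            else:
                sampler = RandomUnderSampler(random_state=rng.randint(2**31 - 1))
            X_bal, y_bal = sampler.fit_resample(X_boot, y_boot)
            self.estimators_.append(clone(self.base_estimator).fit(X_bal, y_bal))
        return self

    def predict(self, X):
        # Majority vote over ensemble members (assumes integer class labels).
        votes = np.asarray([est.predict(X) for est in self.estimators_])
        return np.apply_along_axis(
            lambda col: np.bincount(col).argmax(), axis=0, arr=votes)
```

Because each bootstrap replicate is rebalanced before training, the scheme adds no hyperparameter beyond those of plain bagging, consistent with the claim in the abstract. Note that ADASYN can raise an error on a replicate that happens to contain too few minority instances, which a full implementation would need to guard against.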
