Ensemble MultiBoost Based on RIPPER Classifier for Prediction of Imbalanced Software Defect Data

Identifying defective software entities is essential to ensuring software quality during development. However, the high dimensionality and class imbalance of software defect data seriously degrade defect prediction performance. To address this problem, this paper proposes an Ensemble MultiBoost based on the RIPPER classifier for prediction of imbalanced Software Defect data, called EMR_SD. First, the algorithm applies principal component analysis (PCA) to extract the most informative features from the original feature set, reducing dimensionality and removing redundancy. Second, it combines adaptive synthetic sampling (ADASYN) with random sampling without replacement to correct the class imbalance. Finally, the RIPPER classifier establishes association rules between attributes and classes, and MultiBoost is used to reduce bias and variance and thereby lower the classification error. The proposed prediction model is evaluated experimentally on the NASA MDP public datasets and compared with existing similar algorithms. The results show that EMR_SD outperforms DNC, CEL, and other defect prediction techniques on most evaluation metrics, which demonstrates the effectiveness of the algorithm.
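The combined sampling step can be illustrated with a minimal sketch. The abstract does not specify how ADASYN and random sampling without replacement are combined, so the code below assumes one plausible arrangement: the minority (defective) class is oversampled in ADASYN style, with more synthetic points generated near minority samples whose neighbourhoods are dominated by the majority class, and the majority class is undersampled without replacement to the same target size. The function names, the parameter `target_per_class`, and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def adasyn_like_oversample(minority, majority, n_new, k=3, rng=None):
    """Return roughly n_new synthetic minority points (simplified ADASYN)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = rng or random.Random(0)
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # Hardness of a minority point: share of majority points among its
    # k nearest neighbours (the point itself is excluded by index).
    hardness = []
    for i, p in enumerate(minority):
        others = [q for j, q in enumerate(labelled) if j != i]
        nearest = sorted(others, key=lambda q: math.dist(p, q[0]))[:k]
        hardness.append(sum(1 for _, label in nearest if label == 0) / k)

    total = sum(hardness)
    if total == 0:
        # No minority point borders the majority class: spread evenly.
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [h / total for h in hardness]

    # Harder points receive more synthetic samples, each interpolated
    # between the point and one of its nearest minority neighbours.
    synthetic = []
    for i, (p, w) in enumerate(zip(minority, weights)):
        mates = [q for j, q in enumerate(minority) if j != i]
        mates = sorted(mates, key=lambda q: math.dist(p, q))[:k]
        for _ in range(round(w * n_new)):
            q = rng.choice(mates)
            gap = rng.random()
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


def undersample_without_replacement(majority, n_keep, rng=None):
    """Keep n_keep distinct majority points, chosen uniformly at random."""
    rng = rng or random.Random(0)
    return rng.sample(majority, min(n_keep, len(majority)))


def combined_resample(minority, majority, target_per_class, seed=0):
    """Assumed combination: oversample minority, undersample majority."""
    rng = random.Random(seed)
    n_new = max(0, target_per_class - len(minority))
    new_minority = list(minority) + adasyn_like_oversample(
        minority, majority, n_new, rng=rng
    )
    new_majority = undersample_without_replacement(
        majority, target_per_class, rng=rng
    )
    return new_minority, new_majority


# Toy data: 3 defective modules versus 10 non-defective ones.
defective = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
clean = [(2.0 + 0.5 * i, 2.0) for i in range(10)]
balanced_defective, balanced_clean = combined_resample(defective, clean, 6)
```

Because the per-point sample counts are rounded, the number of synthetic points is only approximately `n_new`. In practice a maintained implementation such as the ADASYN sampler in the imbalanced-learn package would normally be preferred over this hand-written version.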
