Improving Ranking-Oriented Defect Prediction Using a Cost-Sensitive Ranking SVM

Context: Ranking-oriented defect prediction (RODP) ranks software modules to allocate limited testing resources to each module according to the predicted number of defects. Most RODP methods overlook that ranking a module with more defects incorrectly makes it difficult to successfully find all of the defects in the module due to fewer testing resources being allocated to the module, which results in much higher costs than incorrectly ranking the modules with fewer defects, and the numbers of defects in software modules are highly imbalanced in defective software datasets. Cost-sensitive learning is an effective technique in handling the cost issue and data imbalance problem for software defect prediction. However, the effectiveness of cost-sensitive learning has not been investigated in RODP models. Aims: In this article, we propose a cost-sensitive ranking support vector machine (SVM) (CSRankSVM) algorithm to improve the performance of RODP models. Method: CSRankSVM modifies the loss function of the ranking SVM algorithm by adding two penalty parameters to address both the cost issue and the data imbalance problem. Additionally, the loss function of the CSRankSVM is optimized using a genetic algorithm. Results: The experimental results for 11 project datasets with 41 releases show that CSRankSVM achieves 1.12%–15.68% higher average fault percentile average (FPA) values than the five existing RODP methods (i.e., decision tree regression, linear regression, Bayesian ridge regression, ranking SVM, and learning-to-rank (LTR)) and 1.08%–15.74% higher average FPA values than the four data imbalance learning methods (i.e., random undersampling and a synthetic minority oversampling technique; two data resampling methods; RankBoost, an ensemble learning method; IRSVM, a CSRankSVM method for information retrieval). Conclusion: CSRankSVM is capable of handling the cost issue and data imbalance problem in RODP methods and achieves better performance. Therefore, CSRankSVM is recommended as an effective method for RODP.

[1]  Elaine J. Weyuker,et al.  Comparing the effectiveness of several modeling methods for fault prediction , 2010, Empirical Software Engineering.

[2]  Sousuke Amasaki,et al.  A Bayesian belief network for assessing the likelihood of fault content , 2003, 14th International Symposium on Software Reliability Engineering, 2003. ISSRE 2003..

[3]  Luís Torgo,et al.  Resampling strategies for regression , 2015, Expert Syst. J. Knowl. Eng..

[4]  Taghi M. Khoshgoftaar,et al.  Count Models for Software Quality Estimation , 2007, IEEE Transactions on Reliability.

[5]  J. A. Ferreira,et al.  On the Benjamini-Hochberg method , 2006, math/0611265.

[6]  Qinbao Song,et al.  A General Software Defect-Proneness Prediction Framework , 2011, IEEE Transactions on Software Engineering.

[7]  Xin Yao,et al.  Using Class Imbalance Learning for Software Defect Prediction , 2013, IEEE Transactions on Reliability.

[8]  Michael B. Miller Linear Regression Analysis , 2013 .

[9]  Li-Wei Chen,et al.  Integration of the grey relational analysis with genetic algorithm for software effort estimation , 2008, Eur. J. Oper. Res..

[10]  Tu Minh Phuong,et al.  Similarity-based and rank-based defect prediction , 2014, 2014 International Conference on Advanced Technologies for Communications (ATC 2014).

[11]  Lionel C. Briand,et al.  A systematic and comprehensive investigation of methods to build and evaluate fault prediction models , 2010, J. Syst. Softw..

[12]  Sandeep Kumar,et al.  Predicting Number of Faults in Software System using Genetic Programming , 2015, SCSE.

[13]  Rainer Koschke,et al.  Revisiting the evaluation of defect prediction models , 2009, PROMISE '09.

[14]  Baowen Xu,et al.  An Improved SDA Based Defect Prediction Framework for Both Within-Project and Cross-Project Class-Imbalance Problems , 2017, IEEE Transactions on Software Engineering.

[15]  David E. Goldberg,et al.  Genetic algorithms and Machine Learning , 1988, Machine Learning.

[16]  Taghi M. Khoshgoftaar,et al.  How Many Software Metrics Should be Selected for Defect Prediction? , 2011, FLAIRS.

[17]  Tai-hoon Kim,et al.  Application of Genetic Algorithm in Software Testing , 2009 .

[18]  Michael M. Richter,et al.  Defect Prediction using Case-Based Reasoning: an Attribute Weighting Technique Based upon sensitivity Analysis in Neural Networks , 2012, Int. J. Softw. Eng. Knowl. Eng..

[19]  Akito Monden,et al.  MAHAKIL: Diversity Based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction , 2018, IEEE Transactions on Software Engineering.

[20]  Taghi M. Khoshgoftaar,et al.  The use of decision trees for cost‐sensitive classification: an empirical study in software quality prediction , 2011, Wiley Interdiscip. Rev. Data Min. Knowl. Discov..

[21]  Akito Monden,et al.  Revisiting common bug prediction findings using effort-aware models , 2010, 2010 IEEE International Conference on Software Maintenance.

[22]  Divya Tomar,et al.  Prediction of Defective Software Modules Using Class Imbalance Learning , 2016, Appl. Comput. Intell. Soft Comput..

[23]  Tim Menzies,et al.  On the value of user preferences in search-based software engineering: A case study in software product lines , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[24]  W. Afzal,et al.  prediction of fault count data using genetic programming , 2008, 2008 IEEE International Multitopic Conference.

[25]  Gerardo Canfora,et al.  Defect prediction as a multiobjective optimization problem , 2015, Softw. Test. Verification Reliab..

[26]  Jun Wang,et al.  Compressed C4.5 Models for Software Defect Prediction , 2012, 2012 12th International Conference on Quality Software.

[27]  Ruchika Malhotra,et al.  A systematic review of machine learning techniques for software fault prediction , 2015, Appl. Soft Comput..

[28]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[29]  Taghi M. Khoshgoftaar,et al.  A Comparative Study of Ensemble Feature Selection Techniques for Software Defect Prediction , 2010, 2010 Ninth International Conference on Machine Learning and Applications.

[30]  Song Wang,et al.  Automatically Learning Semantic Features for Defect Prediction , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[31]  Taghi M. Khoshgoftaar,et al.  Choosing software metrics for defect prediction: an investigation on feature selection techniques , 2011, Softw. Pract. Exp..

[32]  Xiao-Yuan Jing,et al.  Cross-Project and Within-Project Semisupervised Software Defect Prediction: A Unified Approach , 2018, IEEE Transactions on Reliability.

[33]  Tao Wang,et al.  Naive Bayes Software Defect Prediction Model , 2010, 2010 International Conference on Computational Intelligence and Software Engineering.

[34]  Tong-Seng Quah,et al.  Application of neural networks for software quality prediction using object-oriented metrics , 2003, International Conference on Software Maintenance, 2003. ICSM 2003. Proceedings..

[35]  Ayse Basar Bener,et al.  Software Defect Prediction: Heuristics for Weighted Naïve Bayes , 2007, ICSOFT.

[36]  Yue Jiang,et al.  Techniques for evaluating fault prediction models , 2008, Empirical Software Engineering.

[37]  H. Vinod A Survey of Ridge Regression and Related Techniques for Improvements over Ordinary Least Squares , 1978 .

[38]  Wushao Wen,et al.  Ridge and Lasso Regression Models for Cross-Version Defect Prediction , 2018, IEEE Transactions on Reliability.

[39]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[40]  Xiang Chen,et al.  Empirical Studies of a Two-Stage Data Preprocessing Approach for Software Fault Prediction , 2014, IEEE Transactions on Reliability.

[41]  Witold Pedrycz,et al.  Identification of defect-prone classes in telecommunication software systems using design metrics , 2006, Inf. Sci..

[42]  Taghi M. Khoshgoftaar,et al.  An Empirical Study of Feature Ranking Techniques for Software Quality Prediction , 2012, Int. J. Softw. Eng. Knowl. Eng..

[43]  Karim O. Elish,et al.  Predicting defect-prone software modules using support vector machines , 2008, J. Syst. Softw..

[44]  Chuanxiang Ma,et al.  Cross-company defect prediction via semi-supervised clustering-based data filtering and MSTrA-based transfer learning , 2018, Soft Comput..

[45]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[46]  Xin Yao,et al.  A Learning-to-Rank Approach to Software Defect Prediction , 2015, IEEE Transactions on Reliability.

[47]  Taeho Jo,et al.  A Multiple Resampling Method for Learning from Imbalanced Data Sets , 2004, Comput. Intell..

[48]  Akito Monden,et al.  The Significant Effects of Data Sampling Approaches on Software Defect Prioritization and Classification , 2017, 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

[49]  Tracy Hall,et al.  Researcher Bias: The Use of Machine Learning in Software Defect Prediction , 2014, IEEE Transactions on Software Engineering.

[50]  Yutao Ma,et al.  An empirical study on predicting defect numbers , 2015, SEKE.

[51]  Md Zahidul Islam,et al.  Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem , 2015, Inf. Syst..

[52]  Tim Menzies,et al.  Learning from Open-Source Projects: An Empirical Study on Defect Prediction , 2013, 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement.

[53]  Olcay Taner Yildiz,et al.  Software defect prediction using Bayesian networks , 2012, Empirical Software Engineering.

[54]  Michele Lanza,et al.  Evaluating defect prediction approaches: a benchmark and an extensive comparison , 2011, Empirical Software Engineering.

[55]  Taghi M. Khoshgoftaar,et al.  Attribute Selection and Imbalanced Data: Problems in Software Defect Prediction , 2010, 2010 22nd IEEE International Conference on Tools with Artificial Intelligence.

[56]  Jin Liu,et al.  Dictionary learning based software defect prediction , 2014, ICSE.

[57]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[58]  Taghi M. Khoshgoftaar,et al.  A Comprehensive Empirical Study of Count Models for Software Fault Prediction , 2007, IEEE Transactions on Reliability.

[59]  Norman E. Fenton,et al.  A Critique of Software Defect Prediction Models , 1999, IEEE Trans. Software Eng..

[60]  Jun Zheng,et al.  Cost-sensitive boosting neural networks for software defect prediction , 2010, Expert Syst. Appl..

[61]  David Lo,et al.  HYDRA: Massively Compositional Model for Cross-Project Defect Prediction , 2016, IEEE Transactions on Software Engineering.

[62]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machine Classifiers , 1999, Neural Processing Letters.

[63]  Santosh Singh Rathore,et al.  Comparative analysis of neural network and genetic programming for number of software faults prediction , 2015, 2015 National Conference on Recent Advances in Electronics & Computer Engineering (RAECE).

[64]  Premkumar T. Devanbu,et al.  Recalling the "imprecision" of cross-project defect prediction , 2012, SIGSOFT FSE.

[65]  Sabrina Ahmad,et al.  Neural Network Parameter Optimization Based on Genetic Algorithm for Software Defect Prediction , 2014 .

[66]  Bojan Cukic,et al.  Defect Prediction between Software Versions with Active Learning and Dimensionality Reduction , 2014, 2014 IEEE 25th International Symposium on Software Reliability Engineering.

[67]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[68]  Sandeep Kumar,et al.  A Decision Tree Regression based Approach for the Number of Software Faults Prediction , 2016, ACM SIGSOFT Softw. Eng. Notes.

[69]  Vipul Vashisht,et al.  A Framework for Software Defect Prediction Using Neural Networks , 2015 .

[70]  Ayse Basar Bener,et al.  Defect prediction from static code features: current results, limitations, new approaches , 2010, Automated Software Engineering.

[71]  Zhaowei Shang,et al.  Tackling class overlap and imbalance problems in software defect prediction , 2018, Software Quality Journal.

[72]  Sandeep Kumar,et al.  An empirical study of some software fault prediction techniques for the number of faults prediction , 2017, Soft Comput..

[73]  R B D'Agostino,et al.  A comparison of logistic regression to decision-tree induction in a medical domain. , 1993, Computers and biomedical research, an international journal.

[74]  Ahmet Zengin,et al.  HSDD: A hybrid sampling strategy for class imbalance in defect prediction data sets , 2016, 2016 Eleventh International Conference on Digital Information Management (ICDIM).

[75]  Bruce Christianson,et al.  Using the Support Vector Machine as a Classification Method for Software Defect Prediction with Static Code Metrics , 2009, EANN.

[76]  Ping Guo,et al.  Software Defect Prediction Using Fuzzy Support Vector Regression , 2010, ISNN.

[77]  William Marsh,et al.  On the effectiveness of early life cycle defect prediction with Bayesian Nets , 2008, Empirical Software Engineering.

[78]  Filomena Ferrucci,et al.  A Genetic Algorithm to Configure Support Vector Machines for Predicting Fault-Prone Components , 2011, PROFES.

[79]  Elaine J. Weyuker,et al.  Predicting the location and number of faults in large software systems , 2005, IEEE Transactions on Software Engineering.

[80]  Shane McIntosh,et al.  Automated Parameter Optimization of Classification Techniques for Defect Prediction Models , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[81]  Hang Li,et al.  Cost-Sensitive Learning of SVM for Ranking , 2006, ECML.

[82]  Yinglin Wang,et al.  A genetic algorithm for optimized feature selection with resource constraints in software product lines , 2011, J. Syst. Softw..

[83]  Daoqiang Zhang,et al.  Two-Stage Cost-Sensitive Learning for Software Defect Prediction , 2014, IEEE Transactions on Reliability.

[84]  David Lo,et al.  ELBlocker: Predicting blocking bugs with ensemble imbalance learning , 2015, Inf. Softw. Technol..

[85]  Mohammad Alshayeb,et al.  Software defect prediction using ensemble learning on selected features , 2015, Inf. Softw. Technol..

[86]  Qinbao Song,et al.  Using Coding-Based Ensemble Learning to Improve Software Defect Prediction , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[87]  Tore Dybå,et al.  A systematic review of effect size in software engineering experiments , 2007, Inf. Softw. Technol..