Holistic Parameter Optimization for Software Defect Prediction

A software defect prediction (SDP) model identifies defect-prone modules. Setting appropriate parameters in an SDP model is critical because they affect model performance. Recent studies have explored parameters automatically using optimization algorithms. However, these studies optimized parameters only in individual steps of the modeling process, such as feature selection or model building, and did not cover all the parameters that can be tuned across the SDP process, from preprocessing to model building. Our goal is to improve model performance by optimizing parameters across the entire SDP process. To this end, we propose a cost-sensitive decision tree based on harmony search (HS-CSDT). HS-CSDT uses a harmony search algorithm to simultaneously identify the optimal feature set, regularization technique, class weight, and decision tree hyperparameters. We compared HS-CSDT against the methods in related studies on 28 open-source projects in terms of probability of detection, probability of false alarm, G-measure, and file inspection reduction. Effect-size analysis using Cohen’s d indicates that HS-CSDT performs better than the methods in related work. The experimental results show that optimizing the identified parameters throughout the entire SDP modeling process with an optimization algorithm helps improve model performance. In summary, HS-CSDT achieves strong defect prediction performance by automatically assigning an appropriate parameter set to each software project. The model can therefore help allocate limited quality assurance resources effectively.
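The idea of searching several kinds of parameters at once with harmony search can be illustrated with a minimal sketch. It is not the HS-CSDT implementation: the function names (`harmony_search`, `decode`, `toy_objective`), the encoding of a candidate as a real-valued vector, the parameter ranges, and the toy objective are all assumptions made for illustration. In practice the objective would be a validation score of a cost-sensitive decision tree (for example, G-measure) rather than the stand-in used here.

```python
import random


def harmony_search(objective, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.1, iterations=300, seed=0):
    """Maximize `objective` over a box; `bounds` is a list of (low, high) pairs."""
    rng = random.Random(seed)
    # Harmony memory: randomly initialized candidate solutions and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse this variable from a stored harmony.
                value = memory[rng.randrange(memory_size)][j]
                if rng.random() < par:
                    # Pitch adjustment: nudge the reused value slightly.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random consideration: draw a fresh value from the range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))
        new_score = objective(new)
        # Replace the worst stored harmony if the new one scores higher.
        worst = min(range(memory_size), key=scores.__getitem__)
        if new_score > scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = max(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def decode(harmony, n_features):
    """Map a real-valued harmony to hypothetical SDP pipeline settings."""
    mask = [v >= 0.5 for v in harmony[:n_features]]   # selected features
    class_weight = harmony[n_features]                # weight of defective class
    max_depth = int(round(harmony[n_features + 1]))   # tree depth limit
    return mask, class_weight, max_depth


def toy_objective(harmony):
    # Stand-in for a cross-validated model score; the "ideal" settings
    # below are invented so that the search has something to find.
    mask, class_weight, max_depth = decode(harmony, n_features=4)
    target_mask = [True, False, True, False]
    agreement = sum(m == t for m, t in zip(mask, target_mask)) / 4
    return agreement - abs(class_weight - 5.0) / 10 - abs(max_depth - 6) / 20


# Four feature-selection variables, one class weight, one tree depth.
bounds = [(0.0, 1.0)] * 4 + [(1.0, 10.0), (1.0, 20.0)]
best, score = harmony_search(toy_objective, bounds)
print(decode(best, n_features=4), round(score, 3))
```

Encoding every setting in one vector is what lets a single search trade off feature selection, class weight, and tree hyperparameters jointly instead of tuning each step in isolation.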
