Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach

Software Defect Prediction (SDP) models are built using software metrics derived from software systems. The quality of an SDP model depends largely on the quality of the software metrics (the dataset) used to build it. High dimensionality is one of the data quality problems that affect the performance of SDP models. Feature selection (FS) is a proven method for addressing the dimensionality problem. However, the choice of FS method for SDP remains an open problem, as most empirical studies on FS methods for SDP report contradictory and inconsistent outcomes. FS methods behave differently because of their different underlying computational characteristics. This could be due to the choice of search method used in FS, because the impact of FS depends on that choice. It is hence imperative to comparatively analyze the performance of FS methods based on different search methods in SDP. In this paper, four filter feature ranking (FFR) and fourteen filter feature subset selection (FSS) methods were evaluated using four different classifiers over five software defect datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental analysis showed that applying FS improves the predictive performance of classifiers and that the performance of FS methods can vary across datasets and classifiers. Among the FFR methods, Information Gain demonstrated the greatest improvements in the performance of the prediction models. Among the FSS methods, Consistency Feature Subset Selection based on Best First Search had the best influence on the prediction models. However, prediction models based on FFR proved to be more stable than those based on FSS methods. Hence, we conclude that FS methods improve the performance of SDP models, and that there is no single best FS method, as their performance varied according to the dataset and the choice of prediction model. However, we recommend the use of FFR methods, as the prediction models based on FFR are more stable in terms of predictive performance.
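
To make the idea of filter feature ranking concrete, the following is a minimal sketch of ranking features by Information Gain, written in plain Python. It is not the experimental setup of this study: the function names, the toy metric names (`loc_bin`, `comment_bin`), and the toy labels are all hypothetical, and the sketch assumes the metrics have already been discretized into categories.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction in `labels` after splitting on a discrete `feature`."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by descending information gain."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two discretized metrics for six modules.
toy_metrics = {
    "loc_bin": ["high", "high", "low", "low", "high", "low"],
    "comment_bin": ["low", "high", "low", "high", "low", "high"],
}
toy_defective = [1, 1, 0, 0, 1, 0]  # 1 = defective module (assumed labels)

ranking = rank_features(toy_metrics, toy_defective)
print(ranking)
```

In this toy data `loc_bin` separates the two classes perfectly, so it is ranked first. A ranking method of this kind scores each feature independently of the others, whereas a subset selection method searches over combinations of features, which is where the choice of search method comes in.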
