Effect of Feature Selection in Software Fault Detection

The quality of software is strongly affected by the faults it contains. Detecting faults at the proper stage of software development is a challenging task and plays a vital role in software quality. Machine learning is nowadays a commonly used technique for fault detection and prediction. However, the effectiveness of the fault detection mechanism is affected by the number of attributes in the publicly available datasets. Feature selection, the process of selecting the subset of features that is most influential for classification, is itself a challenging task. This paper thoroughly investigates the effect of various feature selection techniques on software fault classification using some of NASA's publicly available benchmark datasets. Various metrics are used to analyze the performance of the feature selection techniques. The experiments show that the adopted feature selection techniques can select the most important and relevant features without sacrificing fault detection performance.
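
To make the idea of feature selection concrete, the following is a minimal sketch of a filter-style selector that ranks features by the absolute value of their Pearson correlation with the fault label and keeps the top k. It is an illustration only and not necessarily one of the techniques evaluated in the paper: the scoring criterion, the function names (`pearson`, `select_top_k`), and the synthetic data are all assumptions, and no NASA dataset is used.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_top_k(rows, labels, k):
    """Return the (sorted) indices of the k features whose absolute
    correlation with the label is highest."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Synthetic stand-in for a defect dataset (hypothetical): features 0 and 2
# determine the "faulty" label, features 1 and 3 are pure noise.
rng = random.Random(0)
rows, labels = [], []
for _ in range(200):
    features = [rng.gauss(0, 1) for _ in range(4)]
    labels.append(1 if features[0] + features[2] > 0 else 0)
    rows.append(features)

selected = select_top_k(rows, labels, k=2)
print("selected feature indices:", selected)
```

A classifier trained only on the selected columns can then be compared against one trained on all columns, which is the kind of comparison the abstract describes.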
