A Framework for Software Defect Prediction Using Feature Selection and Ensemble Learning Techniques

Testing is one of the crucial activities of the software development life cycle and ensures the delivery of a high-quality product. Because software testing consumes a significant amount of resources, high-quality software can be delivered at lower cost if only the modules that are likely to be defective are tested thoroughly, rather than all modules. Software defect prediction, which has become an essential part of software testing, can achieve this goal. This research presents a framework for software defect prediction that uses feature selection and ensemble learning techniques. The framework consists of four stages: 1) Dataset Selection, 2) Pre-Processing, 3) Classification, and 4) Reflection of Results. The framework is implemented on six publicly available Cleaned NASA MDP datasets, and performance is reported using several measures: F-measure, Accuracy, MCC, and ROC. First, the performance of all search methods within the framework is compared on each dataset, and the method with the highest score in each performance measure is identified. Second, the results of the proposed framework with all search methods are compared with the results of 10 well-known supervised classification techniques. The results show that the proposed framework outperformed all of the other classification techniques.
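
The threshold-based measures named in the abstract (Accuracy, F-measure, and MCC) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' implementation; the function names and the toy labels (1 = defective, 0 = non-defective) are assumptions made for illustration. ROC is omitted because it requires ranked prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = defective, 0 = clean)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of modules classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp, tn, fp, fn):
    """Harmonic mean of precision and recall for the defective class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 3 defective and 7 clean modules.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print("TP, TN, FP, FN:", counts)
print("Accuracy :", round(accuracy(*counts), 4))
print("F-measure:", round(f_measure(*counts), 4))
print("MCC      :", round(mcc(*counts), 4))
```

On imbalanced defect data, accuracy alone can look strong even when few defective modules are found, which is one reason to report F-measure and MCC alongside it.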
