A Framework for Software Defect Prediction and Metric Selection

Automated software defect prediction is a fundamental activity in software development. However, modern software systems are inherently large and complex, with numerous correlated metrics that capture different aspects of the software components. This large number of correlated metrics makes building a software defect prediction model very complex. Thus, identifying and selecting a subset of metrics that enhances the performance of a software defect prediction method is an important but challenging problem that has received little attention in the literature. The main objective of this paper is to identify significant software metrics and to build and evaluate an automated software defect prediction model. We propose two novel hybrid software defect prediction models that identify the significant attributes (metrics) using a combination of wrapper and filter techniques. The novelty of our approach is that it embeds metric selection and model training into a single process while significantly reducing the measurement overhead. Different wrapper approaches, including SVM and ANN, were combined with a maximum relevance filter approach to find the significant metrics. In the proposed approaches, a filter score is injected into the wrapper selection process to direct the search efficiently toward significant metrics. Experimental results with real defect-prone software data sets show that the proposed hybrid approaches select a significantly more compact set of metrics (i.e., only the most significant ones) while achieving high prediction accuracy compared with conventional wrapper or filter approaches. The performance of the proposed framework has also been verified with a statistical multivariate quality control process based on the multivariate exponentially weighted moving average (MEWMA).
The proposed framework demonstrates that the hybrid heuristic can guide the metric selection process in a computationally efficient way by integrating the intrinsic characteristics from the filters into the wrapper and using the advantages of both the filter and wrapper approaches.
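The idea of injecting a filter score into a wrapper search can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not give the combination rule, so the additive form `accuracy + alpha * relevance`, the weight `alpha`, and all function names here are assumptions made for illustration. A nearest-centroid classifier stands in for the SVM and ANN wrappers, and absolute Pearson correlation stands in for the maximum relevance filter, so that the example runs with the Python standard library alone.

```python
import random
from statistics import mean


def relevance(column, labels):
    """Filter score (stand-in): absolute Pearson correlation of a metric with the label."""
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    vx = sum((x - mx) ** 2 for x in column)
    vy = sum((y - my) ** 2 for y in labels)
    return abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0


def holdout_accuracy(X, y, cols):
    """Wrapper score (stand-in): nearest-centroid accuracy, trained on even rows,
    tested on odd rows. Assumes every label occurs among the even rows."""
    train, test = range(0, len(X), 2), range(1, len(X), 2)
    centroids = {}
    for label in sorted(set(y)):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [mean(r[c] for r in rows) for c in cols]

    def predict(i):
        dist = {lab: sum((X[i][c] - m) ** 2 for c, m in zip(cols, cen))
                for lab, cen in centroids.items()}
        return min(dist, key=dist.get)

    return sum(1 for i in test if predict(i) == y[i]) / len(test)


def hybrid_select(X, y, alpha=0.1):
    """Greedy forward selection. Candidates are ranked by wrapper accuracy plus
    alpha times the filter score (hypothetical combination rule); a candidate is
    accepted only if it improves the hold-out accuracy."""
    n_metrics = len(X[0])
    filt = [relevance([row[j] for row in X], y) for j in range(n_metrics)]
    selected, best_acc = [], 0.0
    remaining = list(range(n_metrics))
    while remaining:
        candidates = []
        for j in remaining:
            acc = holdout_accuracy(X, y, selected + [j])
            candidates.append((acc + alpha * filt[j], acc, j))
        _, acc, j = max(candidates)
        if acc <= best_acc:  # no accuracy gain: stop and keep the compact subset
            break
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc


def make_demo_data(n=40, seed=0):
    """Synthetic modules: metric 0 is informative, metric 1 is a correlated copy,
    metrics 2 to 4 are noise. Purely illustrative, not real defect data."""
    rng = random.Random(seed)
    y = [(i // 2) % 2 for i in range(n)]
    X = []
    for label in y:
        m0 = 10 * label + rng.uniform(-1, 1)
        m1 = 2 * m0 + rng.uniform(-0.5, 0.5)
        X.append([m0, m1] + [rng.uniform(0, 1) for _ in range(3)])
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    print(hybrid_select(X, y))
```

On the synthetic data, the search stops after picking one of the two correlated informative metrics, because adding the other cannot raise the hold-out accuracy. That is the compactness effect the abstract describes, reproduced here only on toy data.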
