A parallel framework for software defect detection and metric selection on cloud computing

With the continued growth of the Internet of Things (IoT) and its convergence with the cloud, a large number of interoperable software systems are being developed for the cloud. Consequently, there is a growing demand to maintain high software quality in the cloud for improved service. This is all the more crucial as the cloud environment is moving rapidly towards a hybrid model, a combination of the public and private cloud models. Considering the high volume of software available as a service (SaaS) in the cloud, identifying non-standard software and measuring its quality is an urgent issue. Manual testing and quality assessment of software is very expensive and, to some extent, impossible to accomplish. An automated software defect detection model that can measure the relative quality of software and identify its faulty components can significantly reduce software development effort and improve the cloud service. In this paper, we propose a software defect detection model that can be used to identify faulty components in big software metric data. The novelty of our proposed approach is that it identifies significant metrics using a combination of different filter and wrapper techniques. One of the important contributions of the proposed approach is the design and evaluation of a parallel framework for a hybrid software defect predictor, which deals with big software metric data in a computationally efficient way in a cloud environment. Two different hybrids have been developed using Fisher and Maximum Relevance (MR) filters with an Artificial Neural Network (ANN) based wrapper in the parallel framework. All parallel versions are evaluated on real defect-prone software datasets. Experimental results show that the proposed parallel hybrid framework achieves a significant computational speedup on a computer cluster, with higher defect prediction accuracy and a smaller number of software metrics than the independent filter or wrapper approaches.
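To make the filter stage concrete, the following is a minimal sketch of Fisher-score metric ranking for a two-class (defective versus clean) dataset. It assumes the common two-class form of the Fisher score, the squared difference of class means divided by the sum of class variances; the exact formulation used in the paper is not given in this abstract. The function names, metric columns, and toy values are hypothetical and serve only as illustration.

```python
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per metric (column) for binary labels 0/1.

    Assumed definition: (mean_faulty - mean_clean)^2 / (var_faulty + var_clean).
    """
    n_metrics = len(rows[0])
    scores = []
    for j in range(n_metrics):
        faulty = [r[j] for r, y in zip(rows, labels) if y == 1]
        clean = [r[j] for r, y in zip(rows, labels) if y == 0]
        between = (mean(faulty) - mean(clean)) ** 2
        within = pvariance(faulty) + pvariance(clean)
        # A metric with zero within-class variance is scored 0.0 here
        # to avoid division by zero (a simplifying assumption).
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_metrics(rows, labels, k):
    """Return the indices of the k metrics with the highest Fisher score."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: each row is a module; the columns are hypothetical metrics
# (lines of code, cyclomatic complexity, comment ratio).
rows = [
    [120, 14, 0.30],
    [150, 16, 0.10],
    [130, 15, 0.25],
    [125, 3, 0.20],
    [145, 4, 0.15],
    [135, 2, 0.28],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = defective, 0 = clean

scores = fisher_scores(rows, labels)
selected = top_k_metrics(rows, labels, 1)
print(selected)
```

In a hybrid of the kind described above, the metrics ranked by such a filter would then be passed to the wrapper, which evaluates candidate subsets with the classifier. Because each metric's score is computed independently of the others, this stage is also a natural candidate for distribution across workers, although how the paper partitions the work is not stated here.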
