A survey on software fault detection based on different prediction approaches

Quality assurance activities such as testing, verification and validation, fault tolerance and fault prediction are a central concern of software engineering. When a company lacks the budget and time to test an entire application, a project manager can use fault prediction algorithms to identify the parts of the system that are more defect prone. Software engineering offers many kinds of prediction, including test effort, security and cost prediction, but most of them do not have a stable model. This paper therefore studies software fault prediction using different machine learning techniques: decision trees, decision tables, random forest, neural networks, Naïve Bayes, and classifiers from artificial immune systems (AISs) such as the artificial immune recognition system (AIRS), CLONALG and Immunos. The experiments use four public NASA datasets that differ in size and in the number of defective modules. To find the best-performing classifier, we vary the experimental setup by using method-level metrics and two feature selection approaches, principal component analysis and correlation-based feature selection. According to this study, random forest provides the best prediction performance for large datasets, and Naïve Bayes is a reliable algorithm for small datasets, even when one of the feature selection techniques is applied. Among the AIS classifiers, Immunos99 performs well when a feature selection technique is applied, and AIRSParallel performs better without feature selection. Performance is evaluated with three metrics: area under the receiver operating characteristic curve, probability of detection and probability of false alarm. Taken together, these three metrics can provide reliable prediction criteria.
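The three evaluation metrics can be illustrated with a minimal Python sketch. It is not the paper's experimental code: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for illustration. Probability of detection (PD) is computed as TP / (TP + FN), probability of false alarm (PF) as FP / (FP + TN), and the area under the ROC curve (AUC) as the fraction of (defective, non-defective) pairs that the classifier ranks correctly, with ties counted as half.

```python
def pd_pf(y_true, y_pred):
    """Return (PD, PF) for binary labels, where 1 marks a defective module."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one defective and one non-defective module")
    return tp / (tp + fn), fp / (fp + tn)


def auc(y_true, scores):
    """AUC as the share of (defective, non-defective) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one non-defective module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 defective and 5 non-defective modules.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

pd_value, pf_value = pd_pf(y_true, y_pred)
auc_value = auc(y_true, scores)
print(f"PD={pd_value:.3f}  PF={pf_value:.3f}  AUC={auc_value:.3f}")
```

A good predictor has high PD and low PF at the chosen threshold, while AUC summarises the ranking quality across all thresholds. The metrics are read together because a classifier can score well on one of them and poorly on another.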
