An empirical study of some software fault prediction techniques for the number of faults prediction

During software development, predicting the number of faults in each software module can be more helpful than merely predicting whether a module is faulty or non-faulty. Such an approach can help focus the software testing process and may enhance the reliability of the software system. Most earlier work on software fault prediction has used classification techniques to classify software modules as faulty or non-faulty. Techniques such as Poisson regression, negative binomial regression, genetic programming, decision tree regression, and multilayer perceptron can be used to predict the number of faults. In this paper, we present an experimental study that evaluates and compares the capability of six fault prediction techniques, namely genetic programming, multilayer perceptron, linear regression, decision tree regression, zero-inflated Poisson regression, and negative binomial regression, for predicting the number of faults. The experimental investigation is carried out on eighteen software project datasets collected from the PROMISE data repository. The results are evaluated using the average absolute error, average relative error, measure of completeness, and prediction at level l measures. We also perform the Kruskal–Wallis test and Dunn's multiple comparison test to compare the relative performance of the considered fault prediction techniques.
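The four evaluation measures named in the abstract can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation. The exact formulas are assumptions:

- The relative error of a module is taken as |predicted − actual| / (actual + 1), so that fault-free modules do not cause a division by zero.
- Completeness is taken as the total number of predicted faults divided by the total number of actual faults.
- The default level l = 0.3 is chosen only for illustration.

All function names are hypothetical.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Both sequences must describe the same modules, in the same order.
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")


def relative_errors(actual: Sequence[float], predicted: Sequence[float]) -> list[float]:
    # Assumption: divide by (actual + 1) so modules with zero faults are handled.
    _check(actual, predicted)
    return [abs(p - a) / (a + 1) for a, p in zip(actual, predicted)]


def average_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # AAE: mean of |predicted - actual| over all modules.
    _check(actual, predicted)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def average_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # ARE: mean of the per-module relative errors.
    errors = relative_errors(actual, predicted)
    return sum(errors) / len(errors)


def completeness(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Assumption: total predicted faults divided by total actual faults.
    _check(actual, predicted)
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("completeness is undefined when there are no actual faults")
    return sum(predicted) / total_actual


def pred_at_level(actual: Sequence[float], predicted: Sequence[float], level: float = 0.3) -> float:
    # Pred(l): fraction of modules whose relative error is at most `level`.
    errors = relative_errors(actual, predicted)
    return sum(1 for e in errors if e <= level) / len(errors)


if __name__ == "__main__":
    actual = [0, 2, 5, 1]              # toy fault counts per module (illustrative only)
    predicted = [0.5, 2.0, 4.0, 3.0]   # toy model output (illustrative only)
    print(average_absolute_error(actual, predicted))            # 0.875
    print(round(average_relative_error(actual, predicted), 4))  # 0.4167
    print(completeness(actual, predicted))                      # 1.1875
    print(pred_at_level(actual, predicted))                     # 0.5
```

Lower AAE and ARE are better. A completeness value near 1 means the model neither under- nor over-predicts the total fault count. A higher Pred(l) means more modules are predicted within the tolerance l.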