Software fault prediction: A literature review and current trends

The software engineering discipline contains several prediction approaches, such as test effort prediction, correction cost prediction, fault prediction, reusability prediction, security prediction, effort prediction, and quality prediction. However, most of these approaches are still in a preliminary phase, and more research is needed to reach robust models. Software fault prediction is the most popular research area among these approaches, and several research centers have recently started new projects in this area. In this study, we investigated 90 software fault prediction papers published between 1990 and 2009 and categorized them according to publication year. This paper surveys the software engineering literature on software fault prediction, covering both machine learning based and statistical approaches. The papers discussed here outline what has been published so far, but this is naturally not a complete review of all published work. This paper will help researchers examine previous studies from the perspectives of metrics, methods, datasets, performance evaluation metrics, and experimental results in an easy and effective manner. Furthermore, current trends are introduced and discussed.
