Software Fault Prediction: A Systematic Mapping Study

Context: Software fault prediction has been an important research topic in software engineering for more than 30 years. Software defect prediction models are commonly used to detect faulty software modules based on software metrics collected during the development process. Objective: This study maps and characterizes data mining and machine learning studies in the context of software fault prediction. We investigate the metrics and techniques used and their performance according to the performance measures reported, and we analyze and synthesize these studies. Method: A systematic mapping study was conducted to identify and aggregate evidence about software fault prediction. Results: About 70 studies published from January 2002 to December 2014 were identified, and the top 40 were selected for analysis based on the results of the quality criteria. The main metrics used were Halstead, McCabe and LOC (67.14%); Halstead, McCabe and LOC + Object-Oriented (15.71%); and others (17.14%). The main models were Machine Learning (ML) (47.14%), ML + Statistical Analysis (31.42%) and others (21.41%). The datasets used were of private access (35%) and public access (65%). The most frequent combination of metrics, models and techniques was Halstead, McCabe and LOC + Random Forest, Naive Bayes, Logistic Regression and Decision Tree, representing 60% of the analyzed studies. Conclusions: This article identifies and classifies the performance of the metrics, techniques and their combinations. This will help researchers select datasets, metrics and models based on experimental results, with the objective of generating learning schemes that allow better prediction of software failures.
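To make the idea of a "learning scheme" more concrete, the sketch below combines the most frequent metric family reported above (McCabe, Halstead and LOC) with one of the most frequent techniques (Naive Bayes) and evaluates it with precision and recall. It is a minimal illustration under stated assumptions, not the method of any surveyed study: the module data are invented, the three features (cyclomatic complexity, Halstead volume, LOC) are an assumed selection, and the helper names fit_gaussian_nb, predict and precision_recall are hypothetical. A Gaussian Naive Bayes is written from scratch so the example needs only the Python standard library.

```python
import math
from statistics import mean, pvariance

# Hypothetical module data (invented for illustration only):
# features = (McCabe cyclomatic complexity, Halstead volume, LOC)
# label    = 1 if the module is faulty, 0 otherwise
TRAIN = [
    ((2, 120.0, 25), 0),
    ((3, 150.0, 30), 0),
    ((1, 80.0, 15), 0),
    ((4, 200.0, 40), 0),
    ((12, 900.0, 180), 1),
    ((15, 1100.0, 220), 1),
    ((9, 700.0, 150), 1),
]

TEST = [
    ((3, 140.0, 28), 0),
    ((11, 850.0, 170), 1),
    ((2, 100.0, 20), 0),
    ((14, 1000.0, 200), 1),
]


def fit_gaussian_nb(rows):
    """Estimate each class prior and a per-feature (mean, variance) pair."""
    model = {}
    n = len(rows)
    for label in sorted({y for _, y in rows}):
        feats = [x for x, y in rows if y == label]
        columns = list(zip(*feats))  # one tuple per feature
        model[label] = {
            "prior": len(feats) / n,
            # A tiny epsilon keeps every variance strictly positive.
            "stats": [(mean(c), pvariance(c) + 1e-9) for c in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log of the normal density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, x):
    """Return the class with the highest log-posterior (features assumed independent)."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = math.log(params["prior"])
        score += sum(
            log_gaussian(xi, mu, var) for xi, (mu, var) in zip(x, params["stats"])
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(model, rows):
    """Precision and recall for the 'faulty' class (label 1)."""
    tp = fp = fn = 0
    for x, y in rows:
        p = predict(model, x)
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    model = fit_gaussian_nb(TRAIN)
    print(precision_recall(model, TEST))  # → (1.0, 1.0) on this toy data
```

Swapping predict for a Random Forest, Logistic Regression or Decision Tree, or swapping the feature tuple for object-oriented metrics, yields the other combinations the mapping counts. A real experiment would use a much larger dataset, cross-validation and further performance measures rather than this toy split.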
