Performance Evaluation Metrics for Software Fault Prediction Studies

Experimental studies have confirmed that only a small portion of software modules cause the faults in a software system. Consequently, most modules in a fault dataset carry the non-faulty label and only a few carry the faulty label during the modeling phase. Such datasets are called imbalanced, and several performance metrics exist for evaluating fault prediction techniques on them. In this study, we examine the performance evaluation metrics used in 85 fault prediction papers and categorize these metrics into two main groups. Evaluation methods such as cross-validation and stratified sampling are outside the scope of this paper; only evaluation metrics are examined. The study shows that researchers have so far used a variety of evaluation parameters for software fault prediction, and that more studies on performance evaluation metrics for imbalanced datasets should be conducted.
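
To illustrate why the choice of metric matters on an imbalanced dataset, the following minimal sketch compares two hypothetical predictors on 100 modules, 10 of which are faulty. The data, the function names, and the selection of metrics (accuracy, recall, false positive rate, and the geometric mean of recall and specificity) are assumptions made for illustration; they are not taken from the surveyed papers and do not represent the two metric groups identified in this study.

```python
from math import sqrt


def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN where label 1 = faulty and 0 = non-faulty."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(actual, predicted):
    """Return a few common classification metrics as a dict."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # share of faulty modules found
    fpr = fp / (fp + tn) if (fp + tn) else 0.0     # share of clean modules flagged
    g_mean = sqrt(recall * (1.0 - fpr))            # balances both classes
    return {"accuracy": accuracy, "recall": recall, "fpr": fpr, "g_mean": g_mean}


# Hypothetical imbalanced dataset: 10 faulty modules followed by 90 non-faulty ones.
actual = [1] * 10 + [0] * 90

# Predictor A (hypothetical): labels every module as non-faulty.
always_clean = [0] * 100

# Predictor B (hypothetical): finds 7 of the 10 faulty modules, raises 18 false alarms.
useful = [1] * 7 + [0] * 3 + [1] * 18 + [0] * 72

result_a = evaluate(actual, always_clean)
result_b = evaluate(actual, useful)

for name, result in (("always_clean", result_a), ("useful", result_b)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

Under these assumed numbers, the predictor that never reports a fault scores higher on accuracy (0.90 against 0.79) even though it detects no faulty module at all, while recall and the geometric mean rank the second predictor higher.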
