Techniques for evaluating fault prediction models

Many statistical techniques have been proposed to predict the fault-proneness of program modules in software engineering. Choosing the “best” candidate among the many available models requires performance assessment and detailed comparison, but such comparisons are not straightforward because many different performance measures can be applied. Classifying a software module as fault-prone implies that some verification activities will be applied to it, adding to the development cost. Misclassifying a module as fault-free carries the risk of system failure, which also has cost implications. Methodologies for the precise evaluation of fault prediction models should be at the core of empirical software engineering research, yet they have attracted only sporadic attention. In this paper, we give an overview of model evaluation techniques. In addition to the many techniques previously used in software engineering studies, we introduce and discuss the merits of cost curves. Using data from a public repository, our study demonstrates the strengths and weaknesses of the performance evaluation techniques and leads to the conclusion that the selection of the “best” model cannot be made without considering project cost characteristics, which are specific to each development environment.
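To illustrate why project cost characteristics drive model selection, the minimal Python sketch below encodes the cost-curve view of classifier performance: a model's false-positive and false-negative rates define a straight line of normalized expected cost over the range of operating conditions, and the model with the lowest line at the project's cost ratio is preferred there. The confusion-matrix counts, fault-prone prior, and cost values are hypothetical, chosen only to show that the ranking of two models can flip as the cost of a missed fault grows; this is not the evaluation code used in the study.

```python
# Minimal sketch of the cost-curve idea (hypothetical numbers, not the study's code).
# Each classifier with false-positive rate FPR and false-negative rate FNR traces the
# line NEC(pc) = FNR * pc + FPR * (1 - pc), where pc is the probability-cost value
# that combines the fault-prone class prior with the two misclassification costs.

def rates(tp, fp, tn, fn):
    """Return (FPR, FNR) from a confusion matrix."""
    return fp / (fp + tn), fn / (fn + tp)

def probability_cost(p_faulty, cost_fn, cost_fp):
    """Map the class prior and misclassification costs onto the cost-curve x-axis."""
    return (p_faulty * cost_fn) / (p_faulty * cost_fn + (1.0 - p_faulty) * cost_fp)

def normalized_expected_cost(fpr, fnr, pc):
    """Normalized expected cost of a classifier at probability-cost value pc in [0, 1]."""
    return fnr * pc + fpr * (1.0 - pc)

# Hypothetical confusion matrices for two competing fault predictors.
model_a = rates(tp=40, fp=30, tn=200, fn=10)   # liberal: few missed faults, more false alarms
model_b = rates(tp=25, fp=5, tn=225, fn=25)    # conservative: few false alarms, more misses

for cost_fn in (1, 5, 20):   # assumed cost of missing a fault-prone module (false alarm cost = 1)
    pc = probability_cost(p_faulty=0.15, cost_fn=cost_fn, cost_fp=1)
    nec_a = normalized_expected_cost(*model_a, pc)
    nec_b = normalized_expected_cost(*model_b, pc)
    best = "A" if nec_a < nec_b else "B"
    print(f"cost_fn={cost_fn:>2}: pc={pc:.2f}  NEC_A={nec_a:.3f}  NEC_B={nec_b:.3f}  -> model {best}")
```

With these made-up numbers the conservative model wins when both error types cost the same, while the liberal model wins once missed faults become substantially more expensive than false alarms, which is the point the cost-curve comparison makes visually.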
