Predicting software defects: A cost-sensitive approach

Finding software defects is a complex and time-consuming task that consumes a large share of development budgets. To reduce the cost of testing activities, many studies have applied machine learning to predict whether a module is defect-prone. Defect detection is a cost-sensitive task in which a misclassification is more costly than a correct classification, yet most prediction models do not take classification costs into account. This paper introduces an empirical method based on COCOMO (COnstructive COst MOdel) that assesses the cost of each classifier decision. The method builds a cost matrix that is used together with a threshold-moving approach on a ROC (Receiver Operating Characteristic) curve to select the operating point with the lowest cost. Public data sets from the NASA (National Aeronautics and Space Administration) IV&V (Independent Verification & Validation) Facility Metrics Data Program (MDP) are used to train the classifiers and to provide development effort information. The experiments are carried out through a methodology that satisfies validation and reproducibility requirements. The experimental results show that the proposed method is effective and allows the classifier's performance to be interpreted in terms of tangible cost values.
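As a minimal illustration of the threshold-moving step described above, the Python sketch below (assuming NumPy and scikit-learn) scans the operating points of a ROC curve and picks the decision threshold that minimizes the expected misclassification cost. The per-error costs c_fp and c_fn stand in for values that a COCOMO-style effort estimate would supply; the function name and the example cost values are illustrative assumptions, not the paper's implementation.

    import numpy as np
    from sklearn.metrics import roc_curve

    def select_threshold_by_cost(y_true, y_score, c_fp, c_fn):
        # c_fp: cost of falsely flagging a non-defective module (wasted test effort)
        # c_fn: cost of missing a defective module (field failure / rework)
        y_true = np.asarray(y_true)
        fpr, tpr, thresholds = roc_curve(y_true, y_score)
        n_neg = np.sum(y_true == 0)  # non-defective modules
        n_pos = np.sum(y_true == 1)  # defective modules
        # Expected total cost at each ROC operating point:
        #   false positives = fpr * n_neg, false negatives = (1 - tpr) * n_pos
        costs = c_fp * fpr * n_neg + c_fn * (1.0 - tpr) * n_pos
        best = np.argmin(costs)
        return thresholds[best], costs[best]

    # Example with hypothetical costs: missing a defect is taken to be
    # ten times as expensive as a false alarm.
    # threshold, cost = select_threshold_by_cost(y_true, y_score, c_fp=1.0, c_fn=10.0)

In this formulation a correct classification incurs zero cost, so the cost matrix reduces to the two off-diagonal entries; richer matrices only change the expression for `costs`.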
