Defect Prediction Using Case-Based Reasoning: An Attribute Weighting Technique Based upon Sensitivity Analysis in Neural Networks

Software defect prediction is an established approach to improving product quality and to making better use of the resources that quality assurance requires. One known method for predicting the number of defects is case-based reasoning (CBR). In this paper, different attribute weighting techniques for CBR-based defect prediction are analyzed. One of the weighting techniques used in this work, Sensitivity Analysis based on Neural Networks (SANN), derives attribute weights from a sensitivity analysis of each attribute's impact within a neural network. Neural networks are applicable when there are non-linear and complicated relationships among the attributes. Because weighting plays a key role in the CBR model, the choice of weight calculation method can change the prediction results. The results of SANN are compared with those obtained from uniform weights and from weights derived by Multiple Linear Regression (MLR). The accuracy of the overall method under the three weighting techniques is evaluated on five data sets comprising about 5000 modules from NASA. Two quality measures are applied: Average Absolute Error (AAE) and Average Relative Error (ARE). In addition to the variation of weighting techniques, the impact of varying the number of nearest neighbors is studied. The three main results of the empirical analysis are: (i) in the majority of cases, SANN achieves the most accurate results; (ii) uniform weighting performs better than the MLR-based weighting heuristic; and (iii) there is no significant preference pattern for defining the number of similar objects used for prediction in CBR.
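To make the ingredients named in the abstract concrete, the following is a minimal sketch of CBR-style defect prediction with attribute weights, together with the two quality measures. It is an illustration under stated assumptions, not the paper's implementation: the weighted Euclidean distance, the unweighted mean over the k nearest cases, the `+ 1` in the ARE denominator (a common guard against division by zero for defect-free modules), the function names, and the toy case base are all assumed here. In practice the weights would come from a technique such as SANN or MLR, and the attributes would usually be normalized first.

```python
from math import sqrt


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two attribute vectors (assumed similarity measure)."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def predict_defects(query, cases, weights, k=3):
    """Predict a defect count as the mean over the k most similar stored cases.

    `cases` is a list of (attribute_vector, defect_count) pairs.
    """
    ranked = sorted(cases, key=lambda c: weighted_distance(query, c[0], weights))
    nearest = ranked[:k]
    return sum(defects for _, defects in nearest) / len(nearest)


def average_absolute_error(actual, predicted):
    """AAE: mean of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_relative_error(actual, predicted):
    """ARE: mean of |actual - predicted| / (actual + 1).

    The +1 is an assumption that keeps the measure defined when a module has zero defects.
    """
    return sum(abs(a - p) / (a + 1) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical case base: (lines-of-code-like metric, complexity-like metric) -> defects.
case_base = [
    ((10.0, 2.0), 0),
    ((12.0, 3.0), 1),
    ((50.0, 9.0), 4),
    ((55.0, 8.0), 6),
]

uniform_weights = (1.0, 1.0)  # the uniform-weighting baseline
prediction = predict_defects((52.0, 9.0), case_base, uniform_weights, k=2)
print(prediction)  # mean defect count of the two most similar cases
```

Swapping `uniform_weights` for another weight vector, or changing `k`, corresponds to the two dimensions the study varies: the weighting technique and the number of nearest neighbors.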
