Effective Estimation of Modules' Metrics in Software Defect Prediction

Software defect prediction has recently attracted the attention of software quality researchers. Many predictive classification systems have been proposed that aim at the early discovery of software modules that are fault-prone and those that are not. The proposed methods are usually assessed using datasets available from the NASA Metrics Data repository. These datasets include a combination of design-level and code-level metrics for different modules. To apply a defect predictor, all metrics have to be measured for every module, since they serve as the classifier inputs. Some of these metrics are easy and straightforward to measure. Others, however, are more difficult or time-consuming to quantify. Moreover, many of them do not have an exact value, so they may take different values depending on the formula or tool used. In this paper, we first discuss the hypothesis that strong dependencies exist among various features of these datasets. Based on this hypothesis, we search for short combinations of features from the first category (easy-to-measure features) that can describe each feature from the second category (hard-to-measure features) with high accuracy. We then introduce a set of fuzzy modeling systems, each of which estimates the value of one second-category feature from its identified determinants. The estimation systems are evaluated by computing the mean squared error (MSE) for every estimated feature. The experimental results are promising. The presented estimation system improves the usability of the defect prediction system rather than its accuracy: the user no longer has to measure all the required metrics for every module, because all second-category features are estimated automatically with high accuracy.
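The pipeline outlined in the abstract (search for a short set of easy-to-measure determinants, estimate a hard-to-measure feature with a fuzzy model, score by MSE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the feature names (`loc`, `branches`, `comments`, `effort`), the toy values, the exhaustive subset search, and the estimator, which is a generic Gaussian-rule fuzzy system with centre-average defuzzification that turns each training module into one rule.

```python
import math
from itertools import combinations


def fuzzy_estimate(x, rules, sigma=0.3):
    """Centre-average output of a Gaussian-rule fuzzy system (illustrative).

    Each rule is (centre, output). A rule's firing strength is the product of
    Gaussian memberships of the inputs, written here as one exponential.
    """
    num = den = 0.0
    for centre, y in rules:
        dist2 = sum((a - c) ** 2 for a, c in zip(x, centre))
        w = math.exp(-dist2 / (2 * sigma ** 2))
        num += w * y
        den += w
    # If no rule fires at all (numerical underflow), fall back to 0.0.
    return num / den if den > 0 else 0.0


def subset_mse(subset, target, train, test, sigma=0.3):
    """MSE on `test` when `target` is estimated from the features in `subset`."""
    rules = [([row[f] for f in subset], row[target]) for row in train]
    errors = [
        (fuzzy_estimate([row[f] for f in subset], rules, sigma) - row[target]) ** 2
        for row in test
    ]
    return sum(errors) / len(errors)


def best_determinants(easy, target, train, test, max_size=2, sigma=0.3):
    """Exhaustively try every easy-feature subset up to max_size; keep the lowest MSE."""
    best = None
    for k in range(1, max_size + 1):
        for subset in combinations(easy, k):
            score = subset_mse(subset, target, train, test, sigma)
            if best is None or score < best[1]:
                best = (subset, score)
    return best


# Hypothetical, normalised toy metrics; "effort" stands in for a hard-to-measure feature.
train = [
    {"loc": 0.1, "branches": 0.2, "comments": 0.9, "effort": 0.15},
    {"loc": 0.3, "branches": 0.1, "comments": 0.2, "effort": 0.20},
    {"loc": 0.5, "branches": 0.6, "comments": 0.7, "effort": 0.55},
    {"loc": 0.7, "branches": 0.4, "comments": 0.1, "effort": 0.55},
    {"loc": 0.9, "branches": 0.8, "comments": 0.5, "effort": 0.85},
]
test = [
    {"loc": 0.2, "branches": 0.3, "comments": 0.4, "effort": 0.25},
    {"loc": 0.8, "branches": 0.7, "comments": 0.6, "effort": 0.75},
]

subset, score = best_determinants(["loc", "branches", "comments"], "effort", train, test)
print(subset, score)
```

Capping the subset size with `max_size` mirrors the abstract's preference for short feature combinations; a real study would replace the exhaustive loop with a dependency-discovery algorithm and use far more data than this toy set.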
