Assessing the Applicability of Fault-Proneness Models Across Object-Oriented Software Projects

A number of papers have investigated the relationships between design metrics and the detection of faults in object-oriented software. Several of these studies have shown that fault-proneness models built from such metrics can accurately predict faulty classes within one particular software product. In practice, however, prediction models are built on certain products and then applied to subsequent software development projects. Because organizations typically learn and change, how accurate can these models be, given the inevitable differences across projects and systems? From a more general standpoint, is there any evidence that such models are economically viable tools for focusing verification and validation effort? This paper attempts to answer these questions by devising a general but tailorable cost-benefit model and by using fault and design data collected on two mid-size Java systems developed in the same environment. Another contribution of the paper is the use of a novel exploratory analysis technique, MARS (multivariate adaptive regression splines), to build such fault-proneness models, whose functional form is a priori unknown. The results indicate that a model built on one system can be used to accurately rank classes within another system according to their fault-proneness. The downside is that, because of system differences, the predicted fault probabilities are not representative of the system being predicted. Nevertheless, our cost-benefit model demonstrates that the MARS fault-proneness model is potentially viable from an economic standpoint. The linear model is not nearly as good, which suggests that a more complex model is required.
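
As a rough illustration of how a MARS-style model can score and rank classes, the sketch below combines hinge basis functions of the form max(0, x - knot) into a log-odds score and sorts classes by the resulting probability. The metric names, knots, coefficients, and the logistic link are assumptions made purely for illustration; they are not taken from the paper or its data.

```python
import math


def hinge(x, knot):
    """MARS-style hinge basis function: max(0, x - knot)."""
    return max(0.0, x - knot)


def fault_probability(coupling, size):
    """Map two hypothetical design metrics to a fault probability.

    The knots (5, 100) and coefficients are invented for this sketch.
    """
    log_odds = -3.0 + 0.4 * hinge(coupling, 5) + 0.02 * hinge(size, 100)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical classes: name -> (coupling, size)
classes = {"A": (2, 80), "B": (9, 150), "C": (6, 400)}

# Rank classes from most to least fault-prone.
ranked = sorted(
    classes,
    key=lambda name: fault_probability(*classes[name]),
    reverse=True,
)
print(ranked)
```

Because the ranking depends only on the order of the scores, it can remain useful on a second system even when the probabilities themselves are poorly calibrated for that system.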
