Empirical validation of object-oriented metrics on open source software for fault prediction

Open source software systems are becoming increasingly important these days. Many companies are investing in open source projects and lots of them are also using such software in their own work. But, because open source software is often developed with a different management style than the industrial ones, the quality and reliability of the code needs to be studied. Hence, the characteristics of the source code of these projects need to be measured to obtain more information about it. This paper describes how we calculated the object-oriented metrics given by Chidamber and Kemerer to illustrate how fault-proneness detection of the source code of the open source Web and e-mail suite called Mozilla can be carried out. We checked the values obtained against the number of bugs found in its bug database - called Bugzilla - using regression and machine learning methods to validate the usefulness of these metrics for fault-proneness prediction. We also compared the metrics of several versions of Mozilla to see how the predicted fault-proneness of the software system changed during its development cycle.

[1]  Hausi A. Müller,et al.  The Software Bookshelf , 1997, IBM Syst. J..

[2]  Rudolf Ferenc,et al.  Data exchange with the columbus schema for c++ , 2002, Proceedings of the Sixth European Conference on Software Maintenance and Reengineering.

[3]  Chris F. Kemerer,et al.  A Metrics Suite for Object Oriented Design , 2015, IEEE Trans. Software Eng..

[4]  V. Barnett,et al.  Applied Linear Statistical Models , 1975 .

[5]  Emden R. Gansner,et al.  A C++ data model supporting reachability analysis and dead code detection , 1997, ESEC '97/FSE-5.

[6]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[7]  Christian Robottom Reis,et al.  An Overview of the Software Engineering Process and Tools in the Mozilla Project , 2002 .

[8]  Tibor Gyimóthy,et al.  Columbus - reverse engineering tool and schema for C++ , 2002, International Conference on Software Maintenance, 2002. Proceedings..

[9]  Michael W. Godfrey,et al.  Secrets from the Monster: Extracting Mozilla’s Software Architecture , 2000 .

[10]  Victor R. Basili,et al.  A Validation of Object-Oriented Design Metrics as Quality Indicators , 1996, IEEE Trans. Software Eng..

[11]  David W. Hosmer,et al.  Applied Logistic Regression , 1991 .

[12]  Hausi A. Müller,et al.  Predicting fault-proneness using OO metrics. An industrial case study , 2002, Proceedings of the Sixth European Conference on Software Maintenance and Reengineering.

[13]  Beszé Columbus - Reverse Engineering Tool and Schema for C++ , 2002 .

[14]  Lionel C. Briand,et al.  Assessing the Applicability of Fault-Proneness Models Across Object-Oriented Software Projects , 2002, IEEE Trans. Software Eng..

[15]  Tibor Gyimóthy,et al.  Extracting facts from open source software , 2004, 20th IEEE International Conference on Software Maintenance, 2004. Proceedings..

[16]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[17]  Lionel C. Briand,et al.  Exploring the relationships between design measures and software quality in object-oriented systems , 2000, J. Syst. Softw..

[18]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[19]  Ramanath Subramanyam,et al.  Empirical Analysis of CK Metrics for Object-Oriented Design Complexity: Implications for Software Defects , 2003, IEEE Trans. Software Eng..

[20]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[21]  Michael H. Kutner Applied Linear Statistical Models , 1974 .

[22]  Lionel C. Briand,et al.  Empirical Studies of Quality Models in Object-Oriented Systems , 2002, Adv. Comput..