Fault-Proneness of Open Source Systems: An Empirical Analysis

Developing quality software is difficult given the size and complexity of today's systems. Predicting software quality early helps optimize the allocation of testing resources. Many fault prediction models have been developed using a variety of internal attributes and machine learning techniques. However, the open-source community still lacks concise knowledge of which types of internal attributes affect software quality the most. In this work, an empirical investigation is conducted to explore the relationships between the internal attributes of open-source systems and their fault-proneness. The results show that restricting the fault prediction models to only nine internal attributes did not significantly decrease their accuracy. This indicates that only a subset of these internal attributes is worth collecting and investigating. By focusing on a small set of internal attributes, a quality assurance team can save time and resources while still achieving high-accuracy fault-proneness predictions.
