Mining Software Metrics from the Jazz Repository

This paper describes the extraction of source code metrics from the Jazz repository and the systematic application of data mining techniques to identify which of those metrics are most useful for predicting whether an attempt to build a working instance of the software product succeeds or fails. Results are presented from a study that uses the J48 classification method in conjunction with a number of attribute selection strategies applied to a set of source code metrics. These strategies involve investigating different slices of code from the version control system and cross-dataset classification of the significant metrics, in an attempt to work around the multicollinearity implicit in the available data. The results indicate that only a relatively small number of the software metrics considered have any significance for predicting the outcome of a build. These significant metrics are outlined and the implications of the results are discussed, particularly the relative difficulty of predicting failed build attempts.
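
As a rough illustration of what ranking metrics by their usefulness for predicting a build outcome can look like, the following minimal sketch scores each metric by the best information gain achievable with a single threshold split. It is not the study's method: J48 (WEKA's C4.5 implementation) builds a full decision tree using gain ratio, and the attribute selection strategies used in the study are not reproduced here. The metric names, the toy values, and every function name below are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute, threshold):
    """Gain from splitting the builds on `attribute <= threshold`."""
    left = [lab for row, lab in zip(rows, labels) if row[attribute] <= threshold]
    right = [lab for row, lab in zip(rows, labels) if row[attribute] > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_gain(rows, labels, attribute):
    """Best gain over candidate thresholds (midpoints between sorted values)."""
    values = sorted({row[attribute] for row in rows})
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(
        (information_gain(rows, labels, attribute, t) for t in thresholds),
        default=0.0,
    )


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    scores = {a: best_gain(rows, labels, a) for a in rows[0].keys()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: one row of metrics per build attempt (not from the study).
BUILDS = [
    {"cyclomatic_complexity": 12, "lines_of_code": 400},
    {"cyclomatic_complexity": 30, "lines_of_code": 410},
    {"cyclomatic_complexity": 11, "lines_of_code": 900},
    {"cyclomatic_complexity": 35, "lines_of_code": 880},
]
OUTCOMES = ["success", "failure", "success", "failure"]

if __name__ == "__main__":
    for name, gain in rank_attributes(BUILDS, OUTCOMES):
        print(f"{name}: {gain:.3f}")
```

In this toy data one metric separates successful from failed builds perfectly while the other carries little information, mirroring in miniature the observation that only a few metrics may matter for the prediction.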
