Merits of Organizational Metrics in Defect Prediction: An Industrial Replication

Defect prediction models presented in the literature lack generalization unless the original study can be replicated using new datasets and in different organizational settings. Practitioners can also benefit from replicating studies in their own environment by gaining insights and comparing their findings with those reported. In this work, we replicated an earlier study in order to investigate the merits of organizational metrics in building defect prediction models for large-scale enterprise software. We mined the organizational, code complexity, code churn and pre-release bug metrics of that large scale software and built defect prediction models for each metric set. In the original study, organizational metrics were found to achieve the highest performance. In our case, models based on organizational metrics performed better than models based on churn metrics but were outperformed by pre-release metric models. Further, we verified four individual organizational metrics as indicators for defects. We conclude that the performance of different metric sets in building defect prediction models depends on the project's characteristics and the targeted prediction level. Our replication of earlier research enabled assessing the validity and limitations of organizational metrics in a different context.

[1]  M. Host,et al.  Experimental context classification: incentives and experience of subjects , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[2]  M Wood,et al.  Replication of Experimental Results in Software Engineering , 2022 .

[3]  Jeffrey C. Carver Towards Reporting Guidelines for Experimental Replications: A Proposal , 2010 .

[4]  B SeamanCarolyn Qualitative Methods in Empirical Studies of Software Engineering , 1999 .

[5]  Audris Mockus,et al.  An Empirical Study of Speed and Communication in Globally Distributed Software Development , 2003, IEEE Trans. Software Eng..

[6]  Mario Piattini,et al.  Challenges and Improvements in Distributed Software Development: A Systematic Review , 2009, Adv. Softw. Eng..

[7]  Harald C. Gall,et al.  Don't touch my code!: examining the effects of ownership on software quality , 2011, ESEC/FSE '11.

[8]  Tracy Hall,et al.  A Systematic Literature Review on Fault Prediction Performance in Software Engineering , 2012, IEEE Transactions on Software Engineering.

[9]  Thomas J. Ostrand,et al.  \{PROMISE\} Repository of empirical software engineering data , 2007 .

[10]  Natalia Juristo Juzgado,et al.  Reviewing 25 Years of Testing Technique Experiments , 2004, Empirical Software Engineering.

[11]  Carolyn B. Seaman,et al.  Qualitative Methods in Empirical Studies of Software Engineering , 1999, IEEE Trans. Software Eng..

[12]  Claes Wohlin,et al.  Experimentation in Software Engineering , 2012, Springer Berlin Heidelberg.

[13]  Marcelo Cataldo,et al.  On the relationship between process maturity and geographic distribution: an empirical analysis of their impact on software quality , 2009, ESEC/FSE '09.

[14]  Tore Dybå,et al.  Evidence-based software engineering , 2016, Perspectives on Data Science for Software Engineering.

[15]  Jeffrey C. Carver,et al.  Replicating software engineering experiments: addressing the tacit knowledge problem , 2002, Proceedings International Symposium on Empirical Software Engineering.

[16]  Forrest Shull,et al.  Building Knowledge through Families of Experiments , 1999, IEEE Trans. Software Eng..

[17]  Do More People Make the Code More Defect Prone?: Social Network Analysis in OSS Projects , 2010, SEKE.

[18]  Bora Caglayan,et al.  Emergence of developer teams in the collaboration network , 2013, 2013 6th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE).

[19]  Victor R. Basili,et al.  The influence of organizational structure on software quality , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[20]  Laurie A. Williams,et al.  Predicting failures with developer networks and social network analysis , 2008, SIGSOFT '08/FSE-16.

[21]  M. E. Conway HOW DO COMMITTEES INVENT , 1967 .

[22]  Tore Dybå,et al.  The Future of Empirical Methods in Software Engineering Research , 2007, Future of Software Engineering (FOSE '07).

[23]  Christian Bird,et al.  The effect of branching strategies on software quality , 2012, Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement.

[24]  Ahmed E. Hassan,et al.  Studying the impact of social interactions on software quality , 2012, Empirical Software Engineering.

[25]  Forrest Shull Research 2.0? , 2012, IEEE Software.