Interactive churn metrics

A central part of ensuring software quality is finding bugs. One method of finding bugs is to measure important aspects of the software product and the development process. Researchers have found evidence of a "code churn" effect, whereby the degree to which a given source code file has changed over time is correlated with faults and vulnerabilities. The code churn metric is computed by counting source code differences in version control repositories. However, code churn does not account for a critical factor in any software development team: the human factor, specifically who is making the changes. In this paper, we introduce a new class of human-centered metrics, "interactive churn metrics," as variants of code churn. Using the git blame tool, we identify the most recent developer who changed a given line of code in a file prior to a given revision. Then, for each line changed in a given revision, we determine whether the revision author was changing his or her own code ("self churn") or code last modified by somebody else ("interactive churn"). We derive and present several metrics from this concept. Finally, we conducted an empirical analysis of these metrics on the PHP programming language and its post-release vulnerabilities. We found that our interactive churn metrics are statistically correlated with post-release vulnerabilities and only weakly correlated with code churn metrics and source lines of code. The results indicate that interactive churn metrics are associated with software quality and are distinct from code churn and source lines of code.
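As a rough illustration of the idea (not the authors' tooling), the following Python sketch classifies the lines touched by a single commit as self churn or interactive churn by diffing against the parent revision and running git blame on the pre-change lines. The function names (`interactive_churn_for_commit`, `_git`) and the choice to compare author email addresses are assumptions made for this sketch.

```python
"""Sketch: self churn vs. interactive churn for one commit in a local git repo.

Assumption-laden illustration only; names and heuristics are not from the paper.
"""
import subprocess


def _git(repo, *args):
    """Run a git command in the given repository and return its stdout."""
    return subprocess.run(
        ["git", "-C", repo, *args],
        capture_output=True, text=True, check=True,
    ).stdout


def interactive_churn_for_commit(repo, commit):
    """Count self-churned vs. interactively churned lines for `commit`.

    A changed line counts as interactive churn when `git blame` on the parent
    revision attributes it to a different author than the commit's author.
    Assumes `commit` has a parent (i.e., is not the initial commit).
    """
    author = _git(repo, "log", "-1", "--format=%ae", commit).strip()
    self_churn = interactive_churn = 0

    # Zero-context diff: each hunk header tells us which parent-side lines changed.
    diff = _git(repo, "diff", "-U0", f"{commit}^", commit)
    current_file = None
    for line in diff.splitlines():
        if line.startswith("--- "):
            path = line[4:]
            # "--- /dev/null" means a newly added file: nothing to blame.
            current_file = path[2:] if path.startswith("a/") else None
        elif line.startswith("@@") and current_file:
            # Hunk header: @@ -start,count +start,count @@
            old_range = line.split()[1].lstrip("-")
            start, _, count = old_range.partition(",")
            start, count = int(start), int(count or 1)
            if count == 0:  # pure insertion: no pre-existing lines were modified
                continue
            blame = _git(
                repo, "blame", "--line-porcelain",
                "-L", f"{start},{start + count - 1}",
                f"{commit}^", "--", current_file,
            )
            for bline in blame.splitlines():
                if bline.startswith("author-mail "):
                    prev_author = bline.split(" ", 1)[1].strip("<>")
                    if prev_author == author:
                        self_churn += 1
                    else:
                        interactive_churn += 1
    return self_churn, interactive_churn
```

Aggregating these per-commit counts over a file's history would give file-level interactive churn figures of the kind the paper correlates with post-release vulnerabilities.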
