Predicting defects using change genealogies

When analyzing version histories, researchers traditionally focused on single events: e.g. the change that causes a bug, the fix that resolves an issue. Sometimes however, there are indirect effects that count: Changing a module may lead to plenty of follow-up modifications in other places, making the initial change having an impact on those later changes. To this end, we group changes into change genealogies, graphs of changes reflecting their mutual dependencies and influences and develop new metrics to capture the spatial and temporal influence of changes. In this paper, we show that change genealogies offer good classification models when identifying defective source files: With a median precision of 73% and a median recall of 76%, change genealogy defect prediction models not only show better classification accuracies as models based on code complexity, but can also outperform classification models based on code dependency network metrics.

[1]  Nachiappan Nagappan,et al.  Predicting defects using network analysis on dependency graphs , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[2]  Ayse Basar Bener,et al.  Validation of network measures as indicators of defective modules in software systems , 2009, PROMISE '09.

[3]  Ahmed E. Hassan,et al.  Predicting faults using the complexity of code changes , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[4]  V. Malheiros,et al.  A Visual Text Mining approach for Systematic Reviews , 2007, ESEM 2007.

[5]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[6]  Daniel M. Germán,et al.  Change Impact Graphs: Determining the Impact of Prior Code Changes , 2008, 2008 Eighth IEEE International Working Conference on Source Code Analysis and Manipulation.

[7]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[8]  Andreas Zeller,et al.  Predicting component failures at design time , 2006, ISESE '06.

[9]  P. Bonacich Power and Centrality: A Family of Measures , 1987, American Journal of Sociology.

[10]  Elaine J. Weyuker,et al.  Where the bugs are , 2004, ISSTA '04.

[11]  Harald C. Gall,et al.  Putting It All Together: Using Socio-technical Networks to Predict Failures , 2009, 2009 20th International Symposium on Software Reliability Engineering.

[12]  Gábor Csárdi,et al.  The igraph software package for complex network research , 2006 .

[13]  Rahul Premraj,et al.  Network Versus Code Metrics to Predict Defects: A Replication Study , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[14]  Chris F. Kemerer,et al.  A Metrics Suite for Object Oriented Design , 2015, IEEE Trans. Software Eng..

[15]  Nachiappan Nagappan,et al.  Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study , 2007, First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007).

[16]  Andreas Zeller,et al.  What is the long-term impact of changes? , 2008, RSSE '08.

[17]  Daniel M. Germán,et al.  Change impact graphs: Determining the impact of prior codechanges , 2009, Inf. Softw. Technol..

[18]  Victor R. Basili,et al.  A validation of object oriented metrics as quality indicators , 1996 .

[19]  Andreas Zeller,et al.  Mining Cause-Effect-Chains from Version Histories , 2011, 2011 IEEE 22nd International Symposium on Software Reliability Engineering.

[20]  Max Kuhn,et al.  caret: Classification and Regression Training , 2015 .

[21]  Brendan Murphy,et al.  Can developer-module networks predict failures? , 2008, SIGSOFT '08/FSE-16.

[22]  Andreas Zeller,et al.  Change Bursts as Defect Predictors , 2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering.

[23]  Kim Herzig Capturing the long-term impact of changes , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[24]  A. Zeller,et al.  Predicting Defects for Eclipse , 2007, Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007).

[25]  Ahmed E. Hassan,et al.  A Study of the Time Dependence of Code Changes , 2009, 2009 16th Working Conference on Reverse Engineering.

[26]  Andreas Zeller,et al.  It's not a bug, it's a feature: How misclassification impacts bug prediction , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[27]  Kim Herzig,et al.  Mining and untangling change genealogies , 2012 .

[28]  Victor R. Basili,et al.  The influence of organizational structure on software quality , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[29]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[30]  Witold Pedrycz,et al.  A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[31]  Elaine J. Weyuker,et al.  Does calling structure information improve the accuracy of fault prediction? , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[32]  Steven B. Andrews,et al.  Structural Holes: The Social Structure of Competition , 1995, The SAGE Encyclopedia of Research Design.