Classifying code changes and predicting defects using change genealogies

Identifying bug fixes and using them to estimate or even predict software quality is a frequent task when mining version archives. The number of applied bug fixes serves as a code quality metric that separates defect-prone from non-defect-prone code artifacts. But when is a set of applied code changes (which we call a change set) considered a bug fix, and which metrics should be used to build high-quality defect prediction models? Most commonly, bug fixes are identified by analyzing commit messages: short, mostly unstructured pieces of plain text. Commit messages containing keywords such as “fix” or “issue”, followed by a bug report identifier, are considered to fix the corresponding bug report. Similarly, most defect prediction models use metrics describing the structure, complexity, or dependencies of source code artifacts; complex or central code is considered to be more defect-prone.

But commit messages and code metrics describe the state of software artifacts and code changes at a particular point in time, disregarding their genealogies, which describe how the current state came to be. There are approaches that measure historic properties of code artifacts [1]–[5] and approaches that use code dependency graphs [6], [7], but none of them tracks the structural dependency paths of code changes to measure the centrality and impact of change sets, although change sets are the development events that make the source code look as it does. Herzig et al. [8] used so-called change genealogy graphs to model structural dependencies between change sets, and applied these graphs to measure and analyze the impact of change sets on other, later applied change sets.

In this paper, we make use of change genealogy graphs to define a set of change genealogy network metrics describing the structural dependencies of change sets. We further investigate whether change genealogy metrics can be used to identify bug-fixing change sets (without using commit messages and bug databases) and whether they are expressive enough to build effective defect prediction models that classify source files as defect-prone or not.

Regarding the identification of bug-fixing change sets, our assumption is that change sets applying bug fixes show significant dependency differences when compared to change sets implementing new features. We suspect that adding a new feature implies adding new method definitions, which impact a large set of later applied code changes that introduce calls to these newly defined methods. In contrast, we suspect bug fixes to be relatively small, rarely defining new methods but modifying existing features, and thus to have a small impact on later applied code changes: a bug fix modifies the runtime behavior of the software system rather than causing future change sets to use different functionality. Similarly, we suspect more central change sets, those depending on a large set of earlier change sets and causing many later applied change sets to depend on them, to be crucial to the software development process. Consequently, we suspect code artifacts that received many crucial and central code changes to be more defect-prone than others.
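To make the commit-message heuristic from above concrete, the following minimal Python sketch classifies messages by a fix-keyword pattern; the keyword list and the identifier format are illustrative assumptions, not the exact heuristic evaluated in this paper:

    import re

    # Sketch of the keyword heuristic: treat a commit message as a bug fix
    # if it contains a fix-related keyword followed by a report identifier.
    # The keyword list and identifier pattern are assumptions chosen for
    # illustration; mining tools tune both per project and bug tracker.
    BUG_FIX_PATTERN = re.compile(
        r"\b(fix(e[sd])?|issue|bug)\b.*?#?\d+",
        re.IGNORECASE,
    )

    def looks_like_bug_fix(commit_message: str) -> bool:
        """Return True if the message matches the fix-keyword heuristic."""
        return BUG_FIX_PATTERN.search(commit_message) is not None

    assert looks_like_bug_fix("Fix issue #4711: NPE in parser")
    assert not looks_like_bug_fix("Add streaming API for large files")

Note that such heuristics only work when developers mention bug reports in their messages; this is exactly the dependence on commit messages and bug databases that our genealogy metrics avoid.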
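The dependency intuition behind our genealogy metrics can be sketched in the same spirit. The snippet below builds a toy change genealogy graph over four hypothetical change sets, where an edge from an earlier to a later change set means that the later one structurally depends on the earlier one; plain in- and out-degree stand in for the richer genealogy network metrics used in the paper:

    from collections import defaultdict

    # Toy change genealogy graph over hypothetical change sets c1..c4.
    # An edge (earlier, later) means the later change set structurally
    # depends on the earlier one, e.g. it adds a call to a method that
    # the earlier change set defined.
    edges = [
        ("c1", "c2"),  # c2 calls a method introduced by feature change c1
        ("c1", "c3"),  # so does c3 ...
        ("c1", "c4"),  # ... and c4: c1 impacts many later change sets
        ("c2", "c4"),  # the small bug fix c2 is rarely depended upon
    ]

    outgoing = defaultdict(set)  # impact on later change sets
    incoming = defaultdict(set)  # dependence on earlier change sets
    for earlier, later in edges:
        outgoing[earlier].add(later)
        incoming[later].add(earlier)

    for cs in ("c1", "c2", "c3", "c4"):
        print(cs, "impacts:", len(outgoing[cs]),
              "depends on:", len(incoming[cs]))

Under our hypothesis, feature-introducing change sets such as c1 show high out-degree, while bug fixes such as c2 show low out-degree; a change set scoring high on both in- and out-degree would count as central.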
More specifically, we seek to answer the following research questions in our study:

RQ1: How do bug fix classification models based on change genealogy metrics compare to classification models based on code complexity metrics (Section V)?

RQ2: How do defect prediction models based on change genealogy metrics compare to defect prediction models based on code complexity or code dependency network metrics (Section VI)?

We tested the classification and prediction abilities of our approaches on four open source projects. The results show that change genealogy metrics can separate bug-fixing from feature-implementing change sets with an average precision of 72% and an average recall of 89%. Our results also show that defect prediction models based on change genealogy metrics can predict defect-prone source files with precision and recall values of up to 80%. On average, the precision of change genealogy models lies at 69% and the recall at 81%. Compared to prediction models based on code dependency network metrics, change genealogy based prediction models achieve better precision and comparable recall values.
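For readers who want to reproduce the flavor of this evaluation, the sketch below trains a classifier on two synthetic genealogy metrics and reports precision and recall. It is illustrative only: the learner, the feature set, and the generated data are assumptions (note also that the study cites the R package caret [36] rather than scikit-learn):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import train_test_split

    # Synthetic change sets: label 1 = bug fix, 0 = feature implementation.
    rng = np.random.default_rng(0)
    n = 400
    is_fix = rng.integers(0, 2, size=n)

    # Hypothetical genealogy metrics per change set, generated to mirror
    # the assumptions above: bug fixes impact few later change sets (low
    # out-degree) but modify existing code (higher in-degree).
    out_degree = rng.poisson(lam=np.where(is_fix == 1, 1, 6))
    in_degree = rng.poisson(lam=np.where(is_fix == 1, 4, 2))
    X = np.column_stack([out_degree, in_degree])

    X_train, X_test, y_train, y_test = train_test_split(
        X, is_fix, test_size=0.3, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    predicted = model.predict(X_test)
    print("precision:", precision_score(y_test, predicted))
    print("recall:", recall_score(y_test, predicted))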

[1] Ahmed E. Hassan et al., A Study of the Time Dependence of Code Changes, 16th Working Conference on Reverse Engineering (WCRE), 2009.

[2] Nachiappan Nagappan et al., Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study, First International Symposium on Empirical Software Engineering and Measurement (ESEM), 2007.

[3] Gail C. Murphy et al., Hipikat: Recommending pertinent software development artifacts, 25th International Conference on Software Engineering (ICSE), 2003.

[4] Audris Mockus et al., Identifying reasons for software changes using historic databases, International Conference on Software Maintenance (ICSM), 2000.

[5] Andreas Zeller et al., What is the long-term impact of changes?, RSSE '08, 2008.

[6] Richard C. Holt et al., The top ten list: Dynamic fault prediction, 21st IEEE International Conference on Software Maintenance (ICSM), 2005.

[7] Andreas Zeller et al., Predicting faults from cached history, ISEC '08, 2008.

[8] Daniel M. Germán et al., Change impact graphs: Determining the impact of prior code changes, Information and Software Technology, 2009.

[9] A. Jefferson Offutt et al., Algorithmic analysis of the impacts of changes to object-oriented software, 34th International Conference on Technology of Object-Oriented Languages and Systems (TOOLS 34), 2000.

[10] Harald C. Gall et al., Classifying Change Types for Qualifying Change Couplings, 14th IEEE International Conference on Program Comprehension (ICPC), 2006.

[11] Harvey P. Siy et al., Predicting Fault Incidence Using Software Change History, IEEE Transactions on Software Engineering, 2000.

[12] Witold Pedrycz et al., A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction, 30th ACM/IEEE International Conference on Software Engineering (ICSE), 2008.

[13] Krishnendu Chatterjee et al., Analyzing the Impact of Change in Multi-threaded Programs, FASE, 2010.

[14] Elaine J. Weyuker et al., Does calling structure information improve the accuracy of fault prediction?, 6th IEEE International Working Conference on Mining Software Repositories (MSR), 2009.

[15] Claes Wohlin, Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering, 2006.

[16] Jeffrey C. Carver et al., Characterizing software architecture changes: A systematic review, Information and Software Technology, 2010.

[17] Ahmed E. Hassan et al., Automated classification of change messages in open source projects, SAC '08, 2008.

[18] Nachiappan Nagappan et al., Predicting defects using network analysis on dependency graphs, 30th ACM/IEEE International Conference on Software Engineering (ICSE), 2008.

[19] Ahmed E. Hassan et al., Predicting faults using the complexity of code changes, 31st IEEE International Conference on Software Engineering (ICSE), 2009.

[20] Steven B. Andrews et al., Structural Holes: The Social Structure of Competition, The SAGE Encyclopedia of Research Design, 1995.

[21] Ann E. Nicholson et al., Using Bayesian belief networks for change impact analysis in architecture design, Journal of Systems and Software, 2007.

[22] Andreas Zeller et al., Predicting Defects for Eclipse, Third International Workshop on Predictor Models in Software Engineering (PROMISE '07), 2007.

[23] Andreas Zeller et al., Predicting component failures at design time, ISESE '06, 2006.

[24] Andreas Zeller et al., It's not a bug, it's a feature: How misclassification impacts bug prediction, 35th International Conference on Software Engineering (ICSE), 2013.

[25] Anas N. Al-Rabadi et al., A comparison of modified reconstructability analysis and Ashenhurst-Curtis decomposition of Boolean functions, 2004.

[26] Martin P. Robillard et al., Non-essential changes in version histories, 33rd International Conference on Software Engineering (ICSE), 2011.

[27] Jennifer Pérez et al., Change Impact Analysis in Product-Line Architectures, ECSA, 2011.

[28] Victor R. Basili et al., A validation of object oriented metrics as quality indicators, 1996.

[29] P. Bonacich, Power and Centrality: A Family of Measures, American Journal of Sociology, 1987.

[30] Harald C. Gall et al., Putting It All Together: Using Socio-technical Networks to Predict Failures, 20th International Symposium on Software Reliability Engineering (ISSRE), 2009.

[31] Harald C. Gall et al., Populating a Release History Database from version control and bug tracking systems, International Conference on Software Maintenance (ICSM), 2003.

[32] Elaine J. Weyuker et al., Where the bugs are, ISSTA '04, 2004.

[33] Gábor Csárdi et al., The igraph software package for complex network research, 2006.

[34] Daniel M. Germán et al., Change Impact Graphs: Determining the Impact of Prior Code Changes, Eighth IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), 2008.

[35] Andreas Zeller et al., Mining Cause-Effect-Chains from Version Histories, 22nd IEEE International Symposium on Software Reliability Engineering (ISSRE), 2011.

[36] Max Kuhn, caret: Classification and Regression Training, 2015.

[37] Brendan Murphy et al., Can developer-module networks predict failures?, SIGSOFT '08/FSE-16, 2008.

[38] Rahul Premraj et al., Network Versus Code Metrics to Predict Defects: A Replication Study, International Symposium on Empirical Software Engineering and Measurement (ESEM), 2011.

[39] Chris F. Kemerer et al., A Metrics Suite for Object Oriented Design, IEEE Transactions on Software Engineering, 1994.

[40] Michele Marchesi et al., A machine learning approach for text categorization of fixing-issue commits on CVS, ESEM '10, 2010.

[41] Ayse Basar Bener et al., Validation of network measures as indicators of defective modules in software systems, PROMISE '09, 2009.

[42] Yi Zhang et al., Classifying Software Changes: Clean or Buggy?, IEEE Transactions on Software Engineering, 2008.

[43] Andreas Zeller et al., Change Bursts as Defect Predictors, 21st IEEE International Symposium on Software Reliability Engineering (ISSRE), 2010.

[44] Kim Herzig, Capturing the long-term impact of changes, 32nd ACM/IEEE International Conference on Software Engineering (ICSE), 2010.

[45] Victor R. Basili et al., The influence of organizational structure on software quality, 30th ACM/IEEE International Conference on Software Engineering (ICSE), 2008.