Is It Dangerous to Use Version Control Histories to Study Source Code Evolution?

Researchers use file-based Version Control Systems (VCSs) as the primary source of code evolution data. Because developers use VCSs widely, researchers get easy access to the historical data of many projects. Although convenient, research based on VCS data is incomplete and imprecise. Moreover, VCS data alone cannot answer questions that correlate code changes with other activities (e.g., test runs, refactoring). Our tool, CodingTracker, non-intrusively records fine-grained and diverse data during code development. CodingTracker collected data from 24 developers: 1,652 hours of development, 23,002 committed files, and 314,085 test case runs. This data allows us to answer the following questions: How much code evolution data is not stored in VCS? How much do developers intersperse refactorings and edits in the same commit? How frequently do developers fix failing tests by changing the test itself? How many changes are committed to VCS without being tested? What is the temporal and spatial locality of changes?
