Operation-Based, Fine-Grained Version Control Model for Tree-Based Representation

Existing version control systems are often based on line-oriented text models for change representation, which do little to help software developers understand code evolution. More advanced change representation models that capture more program semantics and structure remain impractical because of their high computational complexity. This paper presents OperV, a novel operation-based version control model that supports both coarse and fine levels of granularity in program source code. In OperV, a software system is represented by a project tree whose nodes represent all program entities, such as packages, classes, and methods. Changes to the system are represented as edit operations on the tree. OperV also provides algorithms to diff, store, and retrieve the versions of such entities. These algorithms are based on a mapping of the nodes between versions of the project tree. The mapping technique uses 1) a divide-and-conquer strategy to map coarse- and fine-grained entities separately, 2) unchanged text regions to map unchanged leaf nodes, and 3) structure-based similarity of the sub-trees to map their root nodes bottom-up and then top-down. An empirical evaluation shows that OperV is scalable and efficient, and that it could be useful in understanding program evolution.
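
The mapping steps named in the abstract can be illustrated with a small sketch. This is not OperV's actual algorithm: the `Node` class, the `map_trees` and `edit_script` functions, the 0.5 similarity threshold, and the similarity measure (share of already-mapped leaves) are all assumptions made for illustration, and unchanged leaves are found here with Python's `difflib` rather than with unchanged text regions of source files. The divide-and-conquer split between coarse- and fine-grained entities is omitted.

```python
import difflib
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    """Hypothetical project-tree node: a package, class, method, statement, ..."""
    kind: str
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, kind: str, label: str) -> "Node":
        child = Node(kind, label, parent=self)
        self.children.append(child)
        return child


def postorder(node: Node) -> Iterator[Node]:
    for child in node.children:
        yield from postorder(child)
    yield node


def leaves(node: Node) -> List[Node]:
    return [n for n in postorder(node) if not n.children]


def map_trees(old: Node, new: Node, threshold: float = 0.5) -> Dict[Node, Node]:
    mapping: Dict[Node, Node] = {}

    # Step 1: map unchanged leaves (stand-in for "unchanged text regions").
    old_leaves, new_leaves = leaves(old), leaves(new)
    matcher = difflib.SequenceMatcher(
        None,
        [(n.kind, n.label) for n in old_leaves],
        [(n.kind, n.label) for n in new_leaves],
        autojunk=False,
    )
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            mapping[old_leaves[i + k]] = new_leaves[j + k]

    # Step 2: bottom-up, map inner nodes whose sub-trees share enough mapped leaves.
    new_inner = [n for n in postorder(new) if n.children]
    for o in postorder(old):
        if not o.children:
            continue
        o_leaves, used = leaves(o), set(mapping.values())
        best, best_score = None, 0.0
        for c in new_inner:
            if c.kind != o.kind or c in used:
                continue
            c_leaves = set(leaves(c))
            shared = sum(1 for leaf in o_leaves if mapping.get(leaf) in c_leaves)
            score = shared / max(len(o_leaves), len(c_leaves))
            if score > best_score:
                best, best_score = c, score
        if best is not None and best_score >= threshold:
            mapping[o] = best

    # Step 3: top-down, pair leftover children of already-mapped parents by kind.
    def top_down(o: Node) -> None:
        n = mapping.get(o)
        if n is not None:
            used = set(mapping.values())
            free = [c for c in n.children if c not in used]
            for child in o.children:
                match = next((c for c in free if c.kind == child.kind), None)
                if child not in mapping and match is not None:
                    mapping[child] = match
                    free.remove(match)
        for child in o.children:
            top_down(child)

    top_down(old)
    return mapping


def edit_script(old: Node, new: Node, mapping: Dict[Node, Node]) -> List[tuple]:
    """Derive delete / update / insert operations from the node mapping."""
    ops: List[tuple] = []
    for o in postorder(old):
        if o not in mapping:
            ops.append(("delete", o.kind, o.label))
        elif mapping[o].label != o.label:
            ops.append(("update", o.kind, o.label, mapping[o].label))
    mapped_new = set(mapping.values())
    ops += [("insert", n.kind, n.label) for n in postorder(new) if n not in mapped_new]
    return ops


# Usage: one statement is edited and one method is renamed between versions.
v1 = Node("class", "Foo")
bar1 = v1.add("method", "bar"); bar1.add("stmt", "x = 1"); bar1.add("stmt", "return x")
v1.add("method", "baz").add("stmt", "return 0")

v2 = Node("class", "Foo")
bar2 = v2.add("method", "bar"); bar2.add("stmt", "x = 2"); bar2.add("stmt", "return x")
v2.add("method", "qux").add("stmt", "return 0")

ops = edit_script(v1, v2, map_trees(v1, v2))
print(ops)
```

In this toy example the renamed method is recognized as an update rather than a delete plus an insert, because its sub-tree still contains a mapped leaf. A line-oriented diff would not make that distinction.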
