A case study of cross-system porting in forked projects

Software forking, that is, creating a variant product by copying and modifying an existing product, is often considered an ad hoc, low-cost alternative to principled product line development. To maintain such forked products, developers often need to port an existing feature or bug fix from one product variant to another. As a first step towards assessing whether forking is a sustainable practice, we conduct an in-depth case study of 18 years of the BSD product family history. Our study finds that maintaining forked projects involves significant effort spent porting patches from other projects. Cross-system porting happens periodically, and the porting rate does not necessarily decrease over time. A significant portion of active developers participate in porting changes from peer projects. Surprisingly, ported changes are less defect-prone than non-ported changes. Our work is the first to comprehensively characterize the temporal, spatial, and developer dimensions of cross-system porting in the BSD family, and our tool, Repertoire, is the first automated tool to detect ported edits with high accuracy: 94% precision and 84% recall. The upkeep work of porting changes from peer projects is significant, and porting practice currently seems to depend heavily on developers doing their porting work in a timely manner. These results call for new techniques that automate cross-system porting and thereby reduce the maintenance cost of forked projects.
