Identifying Features in Forks

Fork-based development is widely used in both open source communities and industry because it gives developers the flexibility to modify their own fork without affecting others. The mechanism has a downside, however: when the number of forks grows large, developers struggle to gain or maintain an overview of the activity in those forks, and current tools provide little help. We introduce INFOX, an approach that automatically identifies non-merged features in forks and generates an overview of the active forks in a project. The approach clusters cohesive code fragments using code and network-analysis techniques, then uses information-retrieval techniques to label each cluster with keywords. The clustering is effective, with 90% accuracy on a set of known features. In addition, a human-subject evaluation shows that INFOX can provide actionable insight for developers of forks.
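The two steps named above (clustering cohesive code fragments, then labeling each cluster with keywords) can be illustrated with a minimal sketch. It is not the INFOX implementation. It assumes that the changed fragments and the dependency edges between them are already known, substitutes plain connected components for a network-analysis clustering step, and uses TF-IDF as one common information-retrieval weighting. The fragment ids, code snippets, and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def connected_components(nodes, edges):
    """Group fragments that are linked by dependency edges (a simple
    stand-in for a community-detection step)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(comp))
    return clusters

def tokenize(code):
    """Split code into lowercase words, breaking camelCase and snake_case."""
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", code) if len(w) > 2]

def label_clusters(clusters, fragments, k=3):
    """Return the top-k TF-IDF terms per cluster, treating each cluster as one document."""
    docs = [Counter(t for f in c for t in tokenize(fragments[f])) for c in clusters]
    df = Counter(term for doc in docs for term in doc)
    labels = []
    for doc in docs:
        scores = {t: tf * math.log(len(docs) / df[t]) for t, tf in doc.items()}
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        labels.append([t for t, s in ranked if s > 0][:k])
    return labels

# Hypothetical changed fragments in a fork and the dependencies between them.
fragments = {
    "f1": "void setFanSpeed(int pwm) { fanPwm = pwm; }",
    "f2": "setFanSpeed(readFanTarget());",
    "f3": "int readFanTarget() { return fanTarget; }",
    "f4": "void drawMenu(Lcd lcd) { lcd.print(menuTitle); }",
    "f5": "drawMenu(lcd); updateMenu();",
}
edges = [("f1", "f2"), ("f2", "f3"), ("f4", "f5")]

clusters = connected_components(list(fragments), edges)
labels = label_clusters(clusters, fragments)
for cluster, words in zip(clusters, labels):
    print(cluster, words)
```

In this toy input, terms that occur in every cluster (such as `void`) receive a TF-IDF weight of zero and are dropped, so the labels favor words that distinguish one candidate feature from another.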
