How the evolution of emerging collaborations relates to code changes: an empirical study

Developers contributing to open source projects spontaneously group into "emerging" teams, reflected by the messages they exchange over mailing lists, issue trackers, and other communication channels. Previous studies suggested that such teams to some extent mirror the software's modularity. This paper empirically investigates how emerging teams reorganize themselves as a project evolves, for example by splitting or merging. We relate the evolution of teams to the files they change, to investigate whether teams split to work on cohesive groups of files. The results of this study, conducted on the evolution history of four open source projects (Apache httpd, Eclipse JDT, Netbeans, and Samba), indicate what happens in a project when teams reorganize. Specifically, we found that emerging team splits imply working on more cohesive groups of files, and emerging team merges imply working on groups of files that are cohesive from a structural perspective. Such indications help to better understand the evolution of software projects. More importantly, observing how emerging teams change can help suggest software remodularization actions.
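The abstract does not spell out how team splits and merges are recognized or how the cohesion of a group of files is measured. The following is a minimal, hypothetical sketch to make the two ideas concrete. It assumes that emerging teams are available as sets of developer identifiers for two consecutive time windows, that a split or merge is recognized by a simple majority-overlap rule (the 0.5 threshold is an assumption of this sketch), and that structural cohesion is the density of dependency links among a group of files. The names `overlap`, `classify_transitions`, and `structural_cohesion` are illustrative only and do not describe the paper's actual procedure.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: a team is a set of developer identifiers.
Team = FrozenSet[str]
# (index of the team on one side, indices of the related teams on the other side)
Transition = Tuple[int, Tuple[int, ...]]


def overlap(a: Team, b: Team) -> float:
    """Fraction of the members of team `a` who also belong to team `b`."""
    return len(a & b) / len(a) if a else 0.0


def classify_transitions(
    before: List[Team], after: List[Team], threshold: float = 0.5
) -> Dict[str, List[Transition]]:
    """Label splits and merges between two consecutive snapshots of teams.

    Assumed rule, not taken from the paper:
    split: an earlier team from which at least two later teams each draw
           at least `threshold` of their members.
    merge: a later team into which at least two earlier teams each send
           at least `threshold` of their members.
    """
    splits: List[Transition] = []
    for i, old in enumerate(before):
        children = tuple(j for j, new in enumerate(after) if overlap(new, old) >= threshold)
        if len(children) >= 2:
            splits.append((i, children))

    merges: List[Transition] = []
    for j, new in enumerate(after):
        parents = tuple(i for i, old in enumerate(before) if overlap(old, new) >= threshold)
        if len(parents) >= 2:
            merges.append((j, parents))

    return {"splits": splits, "merges": merges}


def structural_cohesion(files: Set[str], deps: Set[Tuple[str, str]]) -> float:
    """Share of file pairs in `files` linked by at least one dependency.

    Dependencies are treated as undirected; self-loops and edges leaving
    the group are ignored.
    """
    n = len(files)
    if n < 2:
        return 0.0
    internal = {
        frozenset((src, dst))
        for src, dst in deps
        if src in files and dst in files and src != dst
    }
    return len(internal) / (n * (n - 1) / 2)


if __name__ == "__main__":
    before = [frozenset({"ann", "bob", "cy", "dan"}), frozenset({"eve", "fay"}), frozenset({"gus", "hal"})]
    after = [frozenset({"ann", "bob"}), frozenset({"cy", "dan"}), frozenset({"eve", "fay", "gus", "hal"})]
    print(classify_transitions(before, after))
    # {'splits': [(0, (0, 1))], 'merges': [(2, (1, 2))]}
```

In this toy example, the first team splits into two later teams and the last two teams merge. Under the sketch's assumptions, one could then compare `structural_cohesion` of the files each resulting team changes against that of the original team's files. A fixed overlap threshold is the simplest possible rule; a study could equally rely on a different clustering or matching criterion.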
