Online division of labour: emergent structures in Open Source Software

The development of Open Source Software (OSS) fundamentally depends on the participation and commitment of volunteer developers to make progress on a particular task. Several works have presented strategies to increase the on-boarding and engagement of new contributors, but little is known about how these diverse groups of developers self-organise to work together. To understand this, one must consider that, on the one hand, platforms like GitHub provide a virtually unlimited development framework: any number of actors can potentially join to contribute in a decentralised, distributed, remote, and asynchronous manner. On the other hand, it seems reasonable that some sort of hierarchy and division of labour must be in place to meet human biological and cognitive limits, and also to achieve some level of efficiency. These latter features (hierarchy and division of labour) should translate into detectable structural arrangements when projects are represented as developer-file bipartite networks. Thus, in this paper we analyse a set of popular open source projects from GitHub, focusing on three key properties: nestedness, modularity and in-block nestedness, which typify, respectively, the emergence of heterogeneities among contributors, the emergence of subgroups of developers working on specific subgroups of files, and a mixture of the two. These analyses show that projects do indeed evolve into internally organised blocks. Furthermore, the distribution of the sizes of such blocks is bounded, connecting our results to the celebrated Dunbar number in both off- and on-line environments. Our conclusions create a link between bio-cognitive constraints, group formation and online working environments, opening up a rich scenario for future research on (online) work team assembly (e.g. size, composition, and formation).
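The abstract does not say exactly which metrics are used, so the following is only an illustrative sketch under stated assumptions. It assumes a binary developer-by-file matrix (a 1 meaning the developer touched the file), measures nestedness with the standard NODF score (overlap and decreasing fill, rescaled to [0, 1]) and modularity with Barber's bipartite modularity for a partition that is taken as given. All function names are hypothetical, and neither community detection nor in-block nestedness is implemented here.

```python
from itertools import combinations


def incidence_matrix(edges):
    """Build a binary developer-by-file matrix from (developer, file) pairs.

    Assumption: each pair means "this developer touched this file";
    repeated pairs collapse to a single 1.
    """
    developers = sorted({d for d, _ in edges})
    files = sorted({f for _, f in edges})
    row = {d: i for i, d in enumerate(developers)}
    col = {f: j for j, f in enumerate(files)}
    matrix = [[0] * len(files) for _ in developers]
    for d, f in edges:
        matrix[row[d]][col[f]] = 1
    return developers, files, matrix


def _paired_overlap(a, b):
    """NODF contribution of one pair of rows (or of columns).

    Zero when both degrees are equal (no decreasing fill); otherwise the
    fraction of the lower-degree node's links shared with the other node.
    """
    ka, kb = sum(a), sum(b)
    if ka == kb or min(ka, kb) == 0:
        return 0.0
    shared = sum(x & y for x, y in zip(a, b))
    return shared / min(ka, kb)


def nodf(matrix):
    """NODF nestedness in [0, 1]; 1 means perfectly nested."""
    rows = matrix
    cols = [list(c) for c in zip(*matrix)]
    pairs = list(combinations(rows, 2)) + list(combinations(cols, 2))
    if not pairs:
        return 0.0
    return sum(_paired_overlap(a, b) for a, b in pairs) / len(pairs)


def bipartite_modularity(matrix, row_groups, col_groups):
    """Barber's bipartite modularity Q_B for a fixed partition.

    Q_B = (1/E) * sum_ij (A_ij - k_i * d_j / E) * [g_i == h_j],
    where k_i are developer degrees and d_j are file degrees.
    """
    edges = sum(map(sum, matrix))
    if edges == 0:
        return 0.0
    row_deg = [sum(r) for r in matrix]
    col_deg = [sum(c) for c in zip(*matrix)]
    q = 0.0
    for i, r in enumerate(matrix):
        for j, a_ij in enumerate(r):
            if row_groups[i] == col_groups[j]:
                q += a_ij - row_deg[i] * col_deg[j] / edges
    return q / edges


if __name__ == "__main__":
    # Toy matrices: a perfectly nested one and a purely block-diagonal one.
    nested = [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
    blocks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    print(nodf(nested))
    print(nodf(blocks))
    print(bipartite_modularity(blocks, [0, 0, 1, 1], [0, 0, 1, 1]))
```

The two toy matrices show the contrast the abstract relies on: the nested matrix scores 1.0 on NODF, while the block-diagonal one scores 0.0 on NODF but 0.5 on bipartite modularity. An in-block nested project would sit between these extremes, with nested sub-blocks inside a modular layout.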
From a complex network perspective, our results pave the way for the study of time-resolved datasets, and the design of suitable models that can mimic the growth and evolution of OSS projects.