Latent social structure in open source projects

Commercial software project managers design project organizational structure carefully, mindful of available skills, division of labour, geographical boundaries, etc. These organizational "cathedrals" contrast with the "bazaar-like" nature of Open Source Software (OSS) projects, which have no pre-designed organizational structure. Any structure that exists is dynamic, self-organizing, latent, and usually not explicitly stated. Still, in large, complex, successful OSS projects, we expect subcommunities to form spontaneously within the developer teams. Studying these subcommunities and their behaviour can shed light on how successful OSS projects self-organize, and may hold important lessons for how commercial software teams might be organized. Building on well-established techniques for detecting community structure in complex networks, we extract and study latent subcommunities from the email social networks of several projects: Apache HTTPD, Python, PostgreSQL, Perl, and Apache Ant. We then validate them against software development activity history. Our results show that subcommunities do indeed arise spontaneously within these projects as the projects evolve. These subcommunities manifest most strongly in technical discussions, and are significantly connected with collaboration behaviour.
