Reconstructing Open Source Software Ecosystems: Finding Structure in Digital Traces

We report on the computational reconstruction of 273 open source software ecosystems, consisting of 41,388 artifacts and the couplings between them, extracted from digital traces of 34.4 million software artifacts. We argue that digital traces are a new kind of data source, and propose ‘exploratory data loops’ to exploit the benefits of digital trace data in the early stages of a research program. We apply this schema to systematically assess data quality, inform sample selection, and detect patterns. Empirically, we show that highly distributed networks are unlikely to follow a hierarchically modular structure, contrary to popular belief. Two visual examples show that very distinct structures can emerge from autonomous behavior. The results indicate that different, yet similarly effective, strategies may exist to organize for distributed innovation in digital ecosystems. We conclude by outlining how follow-up work will use the reconstructed ecosystems to detect behavioral patterns in distributed networks.
