Discovering community patterns in open-source: a systematic approach and its evaluation

Abstract

“There can be no vulnerability without risk; there can be no community without vulnerability; there can be no peace, and ultimately no life, without community.” - [M. Scott Peck]

The open-source phenomenon has reached the point where it is virtually impossible to find large applications that do not rely on it. Such widespread adoption may turn into a risk if the community regulatory aspects behind open-source work (e.g., contribution guidelines or release schemas) are left implicit and their effects untracked. We advocate the explicit study and automated support of such aspects and propose Yoshi (Yielding Open-Source Health Information), a tool able to map open-source communities onto community patterns, that is, sets of known organisational and social structure types and characteristics with measurable core attributes. This mapping is beneficial since it allows, for example, (a) further investigation of community health by measuring established characteristics from organisations research, (b) reuse of pattern-specific best practices from the same literature, and (c) diagnosis of organisational anti-patterns specific to open-source, if any. We evaluate the tool in a quantitative empirical study involving 25 open-source communities from GitHub, finding that the tool offers a valuable basis to monitor key community traits behind open-source development and may form an effective combination with web portals such as OpenHub or Bitergia. We made the proposed tool open source and publicly available.
