Scaling Open Source Communities: An Empirical Study of the Linux Kernel

Large-scale open source communities, such as the Linux kernel, have gone through decades of development, growing substantially in scale and complexity. In the traditional workflow, maintainers serve as “gatekeepers” for the subsystems that they maintain. As the number of patches and authors increases significantly, maintainers come under considerable pressure, which may hinder the operation and even the sustainability of the community. A few subsystems have begun to use new workflows to address these issues. However, it is unclear to what extent these new workflows are successful, or how to apply them. Therefore, we conduct an empirical study of the multiple-committer model (MCM), which has provoked extensive discussion in the Linux kernel community. We explore the effect of the model on the i915 subsystem with respect to four dimensions: pressure, latency, complexity, and quality assurance. We find that after the model was adopted, the burden on the i915 maintainers was significantly reduced. The model also scales well as more committers are added. After analyzing the online documents and interviewing the maintainers of i915, we propose that overloaded subsystems with trustworthy candidate committers are suitable for adopting the model. We further suggest that the success of the model is closely related to a series of risk mitigation measures: sufficient precommit testing, a strict review process, and the use of tools to simplify work and reduce errors. We employ a network analysis approach to locate candidate committers for the target subsystems, and we validate this approach and the contextual success factors through email interviews with their maintainers. To the best of our knowledge, this is the first study focusing on how to scale open source communities. We expect that our study will help the rapidly growing Linux kernel and other similar communities to adapt to changes and remain sustainable.
