Surgical teams on GitHub: Modeling performance of GitHub project development processes

Abstract Context: Better methods for evaluating the process performance of OSS projects can benefit decision makers who are considering the adoption of OSS in a company. This article studies the closure of issues (bugs and features) in GitHub projects, an important measure of OSS development process performance and of the quality of support that project users receive from the developer team. Objective: The goal of this article is a better understanding of the factors that affect issue closure rates in OSS projects. Methodology: The GHTorrent repository is used to select a large sample of mature, active OSS projects. Using survival analysis, we calculate short-term and long-term issue closure rates. We formulate several hypotheses regarding the impact on issue closure rates of OSS project and team characteristics, such as measures of work centralization, measures that reflect internal project workflows, and developer social network measures. Based on the proposed features and several control features, we build a model that predicts the issue closure rate and allows us to test our hypotheses. Results: We find that teams with many project members have lower issue closure rates than smaller teams. Similarly, increased work centralization improves issue closure rates. While desirable social network characteristics have a positive impact on the number of commits in a project, they do not have a significant influence on issue closure. Conclusion: Overall, the findings from our empirical analysis support Brooks's classic notion of the “surgical team” in the context of OSS project development process performance on GitHub. The model of issue closure rates proposed in this article is a first step towards an improved understanding and prediction of this important measure of OSS development process performance.
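
To make the methodology more concrete, the following is a minimal sketch of how an issue closure rate could be estimated with survival analysis. It assumes a Kaplan-Meier estimator, which the abstract does not name; the article may use a different estimator. The function names, the sample data, and the 7-day and 90-day horizons are hypothetical and chosen only for illustration. Issues that are still open at observation time are treated as censored.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], closed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival) steps, where survival is the estimated
    probability that an issue is still open after `time` days.

    Assumption: a Kaplan-Meier estimator; the source only says
    "survival analysis".
    """
    # Distinct times at which at least one issue was actually closed.
    event_times = sorted({t for t, c in zip(durations, closed) if c})
    survival = 1.0
    curve = []
    for t in event_times:
        # Issues still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Issues closed exactly at time t (censored ones are excluded).
        events = sum(1 for d, c in zip(durations, closed) if c and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def closure_rate(curve: List[Tuple[float, float]], horizon: float) -> float:
    """Estimated probability that an issue is closed within `horizon` days."""
    survival = 1.0
    for t, s in curve:
        if t > horizon:
            break
        survival = s
    return 1.0 - survival


if __name__ == "__main__":
    # Hypothetical data: days each issue was observed, and whether it was closed.
    durations = [1, 2, 2, 5, 10, 30]
    closed = [True, True, False, True, False, True]
    curve = kaplan_meier(durations, closed)
    print("short-term (7 days):", closure_rate(curve, 7))
    print("long-term (90 days):", closure_rate(curve, 90))
```

Treating open issues as censored rather than dropping them avoids biasing the estimate towards issues that happen to be resolved quickly.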
