New Developer Metrics for Open Source Software Development Challenges: An Empirical Study of Project Recommendation Systems

Software collaboration platforms where millions of developers from diverse locations can contribute to common open source projects have recently become popular. On these platforms, developer activities yield various information that can be used as developer metrics to address a variety of challenges. In this study, we proposed new developer metrics extracted from the issue, commit, and pull request activities of developers on GitHub. We created developer metrics from individual activities and also combined certain activities according to common traits. To evaluate these metrics, we created an item-based project recommendation system. To validate this system, we calculated the similarity score using two methods and assessed top-n hit scores using two different approaches. Across all scores obtained with these methods, the most successful metrics were binary_issue_related, issue_commented, binary_pr_related, and issue_opened. To verify our results, we compared our metrics with a metric taken from a very similar study and found that most of our metrics gave better scores than that metric. In conclusion, the issue feature is more crucial on GitHub than the other features. Moreover, commenting activity in projects can be as valuable as code contributions. Most of the binary metrics, which record only whether an activity occurred regardless of the number of activities, also showed remarkable results. In this context, we presented noteworthy developer metrics that can be improved further and used for a wide range of open-source software development challenges, such as user characterization, project recommendation, and code review assignment.
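The evaluation pipeline described above can be illustrated with a minimal sketch. The abstract does not name the two similarity methods or the two top-n hit approaches, so this example assumes cosine similarity and a leave-one-out hit test purely for illustration. The toy data, the developer and project names, and the helper functions (binarize, project_vectors, cosine, recommend, top_n_hit) are all hypothetical and are not taken from the study.

```python
from math import sqrt

# Hypothetical toy data: counts[developer][project] = number of issue comments.
counts = {
    "dev_a": {"proj_x": 5, "proj_y": 2},
    "dev_b": {"proj_x": 1, "proj_y": 3, "proj_z": 4},
    "dev_c": {"proj_y": 1, "proj_z": 2},
}


def binarize(metric):
    """Turn a count metric into a binary one (1 if any activity occurred)."""
    return {dev: {proj: 1 for proj, c in projs.items() if c > 0}
            for dev, projs in metric.items()}


def project_vectors(metric):
    """Invert developer -> project values into project -> developer vectors."""
    vectors = {}
    for dev, projs in metric.items():
        for proj, value in projs.items():
            vectors.setdefault(proj, {})[dev] = value
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed similarity method)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(metric, developer, n):
    """Item-based recommendation: score unseen projects by their similarity
    to the projects the developer already has activity in."""
    vectors = project_vectors(metric)
    known = metric.get(developer, {})
    scores = {}
    for candidate, vec in vectors.items():
        if candidate in known:
            continue
        scores[candidate] = sum(weight * cosine(vec, vectors[proj])
                                for proj, weight in known.items()
                                if proj in vectors)
    # Sort by descending score; break ties by project name for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:n]


def top_n_hit(metric, developer, held_out, n):
    """Leave-one-out check: hide one project, then test whether it
    reappears among the top-n recommendations."""
    train = {dev: dict(projs) for dev, projs in metric.items()}
    train[developer].pop(held_out, None)
    return held_out in recommend(train, developer, n)


if __name__ == "__main__":
    print(top_n_hit(counts, "dev_a", "proj_y", 1))
    print(top_n_hit(binarize(counts), "dev_a", "proj_y", 1))
```

Running the same hit test on both the raw counts and their binarized form mirrors, in a very reduced way, the comparison between count-based and binary metrics that the abstract reports.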
