EARec: Leveraging Expertise and Authority for Pull-Request Reviewer Recommendation in GitHub

Pull requests (PRs) are the primary way for developers to contribute code and improve the quality of software projects on GitHub. A popular GitHub project receives tens of PRs daily, while only a small number of developers, i.e., the core developers, have permission to decide whether to merge these changes into the main branches. Because PR review is time-consuming and PRs cover diverse aspects of a project, it is increasingly challenging for core developers to quickly identify useful PRs. Recommending appropriate reviewers (developers) for incoming PRs, so that meaningful comments are collected quickly, is currently regarded as an effective, crowdsourced way to help core developers make decisions and thus accelerate project development. In this paper, we propose a reviewer recommendation approach (EARec) that simultaneously considers developer expertise and authority. Specifically, we first construct a graph of incoming PRs and possible reviewers, and then use the text similarity of PRs and the social relations of reviewers to find appropriate reviewers. An experimental analysis on the MSR Mining Challenge Dataset (http://ghtorrent.org/msr14.html) evaluates our approach in terms of precision and recall, with favorable results.
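
To make the graph-based idea concrete, the following is a minimal sketch and not the EARec implementation. It assumes, purely for illustration, that expertise is approximated by the cosine similarity of bag-of-words vectors, that authority is approximated by weighted social edges between reviewers, and that the two are combined by a random walk with restart started at the incoming PR. The function names, the restart probability, the edge weights, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts (an assumed stand-in for the text model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def random_walk_with_restart(adj, start, restart=0.15, iters=100):
    """Approximate visiting probabilities of a walk that restarts at `start`.

    adj maps each node to a dict of {neighbor: non-negative weight}.
    """
    prob = {n: 1.0 if n == start else 0.0 for n in adj}
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for u, neighbors in adj.items():
            total = sum(neighbors.values())
            if total == 0:
                # Dangling node: send its mass back to the start node.
                nxt[start] += (1 - restart) * prob[u]
                continue
            for v, w in neighbors.items():
                nxt[v] += (1 - restart) * prob[u] * w / total
        nxt[start] += restart  # total probability mass is 1
        prob = nxt
    return prob


def recommend(pr_text, reviewer_history, social, k=2):
    """Rank reviewers for a new PR.

    reviewer_history: {reviewer: text of PRs they previously reviewed}
    social: {(reviewer_a, reviewer_b): interaction weight} (assumed input)
    """
    pr_node = "__new_pr__"  # hypothetical node id for the incoming PR
    adj = {pr_node: {}}
    for reviewer in reviewer_history:
        adj[reviewer] = {}
    pr_vec = tf_vector(pr_text)
    # Expertise edges: PR <-> reviewer, weighted by text similarity.
    for reviewer, text in reviewer_history.items():
        sim = cosine(pr_vec, tf_vector(text))
        if sim > 0:
            adj[pr_node][reviewer] = sim
            adj[reviewer][pr_node] = sim
    # Authority edges: reviewer <-> reviewer, weighted by social interaction.
    for (a, b), weight in social.items():
        adj[a][b] = adj[a].get(b, 0.0) + weight
        adj[b][a] = adj[b].get(a, 0.0) + weight
    scores = random_walk_with_restart(adj, pr_node)
    ranked = sorted(reviewer_history, key=lambda r: scores[r], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    history = {
        "alice": "fix parser bug tokenizer",
        "bob": "update docs readme",
        "carol": "css layout styling",
    }
    print(recommend("parser bug in tokenizer", history, {("alice", "bob"): 1.0}))
```

In this toy setup, a reviewer with no textual overlap can still be ranked if they are socially connected to a textually relevant reviewer, which is one way to read "expertise and authority" together. Similarity scores and interaction counts live on different scales, so a real system would need to normalize or weight the two edge types.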
