How do Multiple Pull Requests Change the Same Code: A Study of Competing Pull Requests in GitHub

GitHub is a widely used collaborative platform for global software development. A pull request plays an important role in bridging code changes with version control. Developers can freely submit pull requests to base branches in parallel and wait for their contributions to be merged. However, several developers may submit pull requests that edit the same lines of code; such pull requests result in a latent collaborative conflict. We refer to pull requests that tend to change the same lines and remain open during an overlapping time period as competing pull requests. In this paper, we conduct a study of 9,476 competing pull requests from 60 Java repositories in GitHub. The data were collected by mining pull requests submitted in 2017 from the top Java projects with the most forks. We explore how multiple pull requests change the same code by answering four research questions, which cover the distribution of competing pull requests, the developers involved, the changed lines of code, and the impact on pull request integration. Our study shows that competing pull requests do exist in GitHub: in 45 out of 60 repositories, over 31% of pull requests belong to competing pull requests; 20 repositories have more than 100 groups of competing pull requests, each of which is submitted by more than five developers; 42 repositories have over 10% of competing pull requests that change more than 10 of the same lines of code. Meanwhile, we observe that the attributes of competing pull requests do not have a strong impact on pull request integration, compared with other types of pull requests. Our study provides a preliminary analysis for further research that aims to detect and eliminate conflicts among competing pull requests.
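
The definition of competing pull requests given above (same changed lines, overlapping open periods) can be illustrated with a minimal Python sketch. This is not the study's actual mining pipeline; the `PullRequest` record, the helper names, the `min_shared_lines` threshold, and the sample data are all hypothetical, and the sketch assumes that changed lines have already been extracted as line numbers relative to the same base branch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class PullRequest:
    """Hypothetical record; field names are illustrative, not the paper's."""
    number: int
    opened_at: datetime
    closed_at: Optional[datetime]  # None means the pull request is still open
    # Assumption: file path -> line numbers (in the base branch) that it edits.
    changed_lines: Dict[str, Set[int]] = field(default_factory=dict)


def open_periods_overlap(a: PullRequest, b: PullRequest) -> bool:
    """True if both pull requests were open at some common point in time."""
    a_end = a.closed_at or datetime.max
    b_end = b.closed_at or datetime.max
    return a.opened_at <= b_end and b.opened_at <= a_end


def shared_line_count(a: PullRequest, b: PullRequest) -> int:
    """Number of (file, line) positions edited by both pull requests."""
    return sum(
        len(lines & b.changed_lines.get(path, set()))
        for path, lines in a.changed_lines.items()
    )


def competing_pairs(
    prs: List[PullRequest], min_shared_lines: int = 1
) -> List[Tuple[int, int, int]]:
    """Return (number_a, number_b, shared_lines) for each competing pair."""
    pairs = []
    for a, b in combinations(prs, 2):
        if not open_periods_overlap(a, b):
            continue
        shared = shared_line_count(a, b)
        if shared >= min_shared_lines:
            pairs.append((a.number, b.number, shared))
    return pairs


# Made-up sample data, for illustration only.
example_prs = [
    PullRequest(1, datetime(2017, 1, 1), datetime(2017, 1, 10),
                {"src/Foo.java": {10, 11, 12}}),
    PullRequest(2, datetime(2017, 1, 5), None,
                {"src/Foo.java": {11, 12, 13}}),
    PullRequest(3, datetime(2017, 2, 1), datetime(2017, 2, 3),
                {"src/Foo.java": {10, 11}}),
]

print(competing_pairs(example_prs))
```

In this sample, pull requests 1 and 3 edit the same lines but are never open at the same time, so they are not reported as a pair. The comparison is pairwise and quadratic in the number of pull requests; how pairs are combined into the groups of competing pull requests reported above is not specified here, so the sketch does not attempt it.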
