Predicting Merge Conflicts in Collaborative Software Development

Background. During collaborative software development, developers often use branches to add features or fix bugs. When changes from two branches are merged, conflicts may occur if the changes are inconsistent. Developers must resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. One way to deal with this problem is early detection of merge conflicts, which warns developers about conflicts before they become large and complicated. Existing techniques do this by continuously pulling and merging all combinations of branches in the background and notifying developers as soon as a conflict occurs, which is computationally expensive. One potential way to reduce this cost is to use a machine-learning-based conflict predictor that filters out the merge scenarios that are unlikely to have conflicts, i.e., safe merge scenarios.

Aims. In this paper, we assess whether conflict prediction is feasible.

Method. We design a classifier for predicting merge conflicts based on 9 lightweight Git feature sets. To evaluate our predictor, we perform a large-scale study on 267,657 merge scenarios from 744 GitHub repositories in seven programming languages.

Results. Our results show that we achieve high F1-scores, ranging from 0.95 to 0.97 across programming languages, when predicting safe merge scenarios. The F1-score is between 0.57 and 0.68 for conflicting merge scenarios.

Conclusions. Predicting merge conflicts is feasible in practice, especially in the context of predicting safe merge scenarios as a pre-filtering step for speculative merging.
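The results are reported as separate F1-scores for the safe class and the conflicting class, and the conclusion proposes using the predictor as a pre-filter before speculative merging. The following is a minimal sketch of both ideas, not the classifier evaluated in the paper. The label encoding, the function names `f1_for_class` and `scenarios_to_merge`, and the toy labels are all hypothetical and chosen only for illustration; no data from the study is reproduced.

```python
from typing import List, Sequence

# Hypothetical label encoding, assumed for this sketch only.
SAFE, CONFLICT = 0, 1


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1-score when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def scenarios_to_merge(scenarios: Sequence[str], predictions: Sequence[int]) -> List[str]:
    """Pre-filter: keep only the scenarios that were NOT predicted safe,
    so the costly speculative merge runs on fewer branch combinations."""
    return [s for s, p in zip(scenarios, predictions) if p != SAFE]


# Toy, made-up labels: 8 safe and 2 conflicting merge scenarios.
y_true = [SAFE] * 8 + [CONFLICT] * 2
y_pred = [SAFE] * 7 + [CONFLICT] + [CONFLICT, SAFE]
scenario_ids = [f"scenario-{i}" for i in range(len(y_true))]

safe_f1 = f1_for_class(y_true, y_pred, positive=SAFE)
conflict_f1 = f1_for_class(y_true, y_pred, positive=CONFLICT)
remaining = scenarios_to_merge(scenario_ids, y_pred)

print(f"F1 (safe class):        {safe_f1:.3f}")
print(f"F1 (conflicting class): {conflict_f1:.3f}")
print(f"Scenarios still to merge speculatively: {len(remaining)} of {len(scenario_ids)}")
```

Because safe scenarios usually far outnumber conflicting ones, a predictor can score well on the safe class while scoring noticeably lower on the conflicting class, which is why the two scores are reported separately. In the toy data, one conflicting scenario is predicted safe and would be skipped by the pre-filter, which illustrates the trade-off such a filter makes.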
