Causes of merge conflicts: a case study of ElasticSearch

Software branching and merging allow collaborative development and the creation of software variants, a practice commonly referred to as clone & own. While simple and cheap, this approach comes with a trade-off: code must be merged and merge conflicts resolved, and such conflicts occur frequently in practice. When resolving conflicts, a key challenge for developers is to understand the changes that led to the conflict. While merge conflicts and their characteristics are reasonably well understood, the same is not true of the actual changes that cause them. We present a case study of the changes that lead to conflicts, at both the code level and the project level (e.g., feature addition, refactoring, feature improvement). We analyzed the development history of ElasticSearch, a large open-source project that relies heavily on branching (forking) and merging. We inspected 40 merge conflicts in detail, sampled from 534 conflicts that a semi-structured merge tool could not resolve. At the code (structural) level, we classified the semantics of the changes made. At the project level, we categorized the decisions that motivated these changes. We contribute a categorization of code-level and project-level changes and a detailed dataset of 40 conflict resolutions with a description of both levels of changes. As in prior studies, most of our conflicts are small; however, our categorization of code-level changes surprisingly differs from that of prior work. Refactoring, feature additions, and feature enhancements are the most common causes of merge conflicts, most of which could potentially be avoided with better development tooling.
