MergeBERT: Program Merge Conflict Resolution via Neural Transformers

Collaborative software development is an integral part of the modern software development life cycle and is essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge conflict may occur. Such conflicts can stall pull requests and continuous integration pipelines for anywhere from hours to several days, seriously hurting developer productivity. In this paper, we introduce MergeBERT, a novel neural program merge framework based on token-level three-way differencing and a transformer encoder model. Exploiting the restricted nature of merge conflict resolutions, we reformulate the task of generating the resolution sequence as a classification task over a set of primitive merge patterns extracted from real-world merge commit data. Our model achieves 64–69% precision in merge resolution synthesis, a nearly 2× improvement over existing structured and neural program merge tools. Finally, we demonstrate the versatility of our model, which can perform program merge in a multilingual setting with the Java, JavaScript, TypeScript, and C# programming languages and generalizes zero-shot to unseen languages.
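To make the classification reformulation concrete, here is a minimal Python sketch under stated assumptions. The six patterns in `Pattern`, the regex tokenizer `tokenize`, and the helpers `candidates` and `label` are hypothetical illustrations. They are not MergeBERT's actual pattern set or implementation. The sketch also omits the token-level three-way differencing that isolates the conflict region. Each pattern deterministically builds a resolution from the conflicting token sequences of A (ours), B (theirs), and the base. A training example's label is the first pattern that reproduces the developer's resolution, so a model only has to predict a class instead of generating tokens.

```python
import re
from enum import Enum
from typing import Dict, List, Optional


class Pattern(Enum):
    # Hypothetical set of primitive merge patterns (illustrative only,
    # not necessarily the set used by MergeBERT).
    TAKE_A = 0      # keep only "ours"
    TAKE_B = 1      # keep only "theirs"
    TAKE_BASE = 2   # keep the common ancestor
    A_THEN_B = 3    # ours followed by theirs
    B_THEN_A = 4    # theirs followed by ours
    EMPTY = 5       # drop the conflicting region entirely


def tokenize(code: str) -> List[str]:
    # Naive lexer (assumption): word-like runs, or any single
    # non-space, non-word character.
    return re.findall(r"\w+|[^\w\s]", code)


def candidates(a: List[str], b: List[str], base: List[str]) -> Dict[Pattern, List[str]]:
    # Each primitive pattern maps the three inputs to exactly one resolution.
    return {
        Pattern.TAKE_A: a,
        Pattern.TAKE_B: b,
        Pattern.TAKE_BASE: base,
        Pattern.A_THEN_B: a + b,
        Pattern.B_THEN_A: b + a,
        Pattern.EMPTY: [],
    }


def label(a: str, b: str, base: str, resolved: str) -> Optional[Pattern]:
    # Training label: the first pattern (in dict insertion order) whose output
    # matches the developer's resolution token-for-token, or None if no
    # primitive pattern reproduces it.
    ta, tb, tbase, tres = map(tokenize, (a, b, base, resolved))
    for pattern, tokens in candidates(ta, tb, tbase).items():
        if tokens == tres:
            return pattern
    return None
```

For example, `label("int x = 1;", "int y = 0;", "int x = 0;", "int x = 1; int y = 0;")` returns `Pattern.A_THEN_B`. Resolutions that no pattern reproduces return `None` and fall outside this restricted output space.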
