Framing Program Repair as Code Completion

Many techniques have contributed to the advancement of automated program repair, including generate-and-validate approaches, constraint-based solvers, and even neural machine translation. Simultaneously, artificial intelligence has allowed the creation of general-purpose pre-trained models that support several downstream tasks. In this paper, we describe a technique that takes advantage of a generative model, CodeGPT, to automatically repair buggy programs by making use of its code completion capabilities. We also elaborate on where to perform code completion in a buggy line and how we circumvent the open-ended nature of code generation to appropriately fit the new code in the original program. Furthermore, we validate our approach on the ManySStuBs4J dataset, which contains real-world open-source projects, and show that our tool is able to fix 1739 of 6415 programs, a 27% repair rate. The repaired programs range from single-line changes to multiple-line modifications. In fact, our technique is able to fix programs that were missing relatively complex expressions prior to being analyzed. Finally, we present case studies that showcase different scenarios our technique was able to handle.
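The idea of repairing by completion can be illustrated with a minimal sketch. It is not the paper's implementation: the cut-point heuristic (`DELIMITERS`), the rule of keeping only the first generated line, and every function name here are assumptions made for illustration, and `toy_complete` is a hard-coded stand-in for a real model such as CodeGPT.

```python
from typing import Callable, Iterator, List

DELIMITERS = " (.,="  # assumed cut characters, chosen only for illustration


def cut_points(line: str) -> Iterator[int]:
    """Yield positions at which the buggy line is truncated before completion."""
    indent = len(line) - len(line.lstrip())
    yield indent  # regenerate the whole statement
    for i in range(indent, len(line)):
        if line[i] in DELIMITERS:
            yield i + 1  # keep everything up to and including the delimiter


def fit_to_line(generated: str) -> str:
    """Bound open-ended generation by keeping only the first generated line."""
    return generated.split("\n", 1)[0].rstrip()


def candidate_patches(
    lines: List[str], buggy_idx: int, complete: Callable[[str], str]
) -> List[List[str]]:
    """Build candidate programs by completing prefixes of the buggy line."""
    buggy = lines[buggy_idx]
    seen = {buggy}  # never propose the original line again
    patches = []
    for cut in cut_points(buggy):
        prefix = buggy[:cut]
        # The model sees all preceding lines plus the kept prefix.
        context = "\n".join(lines[:buggy_idx] + [prefix])
        completion = fit_to_line(complete(context))
        if not completion:
            continue
        new_line = prefix + completion
        if new_line in seen:
            continue
        seen.add(new_line)
        patches.append(lines[:buggy_idx] + [new_line] + lines[buggy_idx + 1:])
    return patches


def toy_complete(context: str) -> str:
    """Hypothetical stand-in for a code completion model."""
    if context.endswith("return "):
        return "a > b ? a : b;\n}\n// unrelated trailing text"
    return ""


program = [
    "int max(int a, int b) {",
    "    return a < b ? a : b;",  # buggy comparison
    "}",
]
patches = candidate_patches(program, 1, toy_complete)
```

In this toy run the only candidate replaces the buggy line with `    return a > b ? a : b;`, and the extra lines the stand-in model emitted are discarded. A real pipeline would still have to validate each candidate, for example against the known fixed version or a test suite; that step is omitted here.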
