Grammar-Based Patches Generation for Automated Program Repair

Automated program repair (APR) aims to fix bugs in programs without human intervention, which can reduce debugging costs and improve software quality. Conventional learning-based approaches generate patches with sequence-to-sequence models. However, they tend to ignore code structure and often produce patches with grammar and syntax errors. To take grammar and syntax into account, in this paper we propose a grammar-based rule-to-rule model, which treats repair as a transformation of grammar rules and uses two encoders to model both the original token sequence and the grammar rules, enhanced with a new tree-based self-attention. In addition, to guarantee grammatical correctness, we employ a grammatically restricted inference method that generates each grammar rule within a legally constrained sub-search-space determined by the previously generated rules. Experimental evaluations on a Java dataset demonstrate that the proposed approach significantly outperforms state-of-the-art baselines in terms of generated code accuracy.
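
The idea of grammatically restricted inference can be illustrated with a minimal Python sketch. It is not the model described above: the five-rule grammar, the names `RULES`, `legal_rules`, `constrained_decode`, and `toy_score_fn`, and the fixed scores are all assumptions made for illustration, and the scoring function merely stands in for a learned decoder. At each step only the rules whose left-hand side matches the next nonterminal to expand are considered, so the output is always derivable from the grammar.

```python
from typing import Callable, List, Sequence, Tuple

# Toy grammar (hypothetical): each rule is (left-hand side, right-hand side).
Rule = Tuple[str, Tuple[str, ...]]
RULES: List[Rule] = [
    ("Stmt", ("return", "Expr", ";")),  # rule 0
    ("Expr", ("Expr", "+", "Term")),    # rule 1
    ("Expr", ("Term",)),                # rule 2
    ("Term", ("x",)),                   # rule 3
    ("Term", ("0",)),                   # rule 4
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def legal_rules(nonterminal: str) -> List[int]:
    """Indices of the rules that may expand the given nonterminal."""
    return [i for i, (lhs, _) in enumerate(RULES) if lhs == nonterminal]


def constrained_decode(
    score_fn: Callable[[Sequence[int]], Sequence[float]],
    start: str = "Stmt",
    max_rules: int = 50,
) -> Tuple[List[int], List[str]]:
    """Greedy leftmost derivation restricted to grammatically legal rules.

    score_fn maps the rules chosen so far to one score per rule in RULES;
    it is a stand-in for a neural decoder's output distribution.
    """
    stack = [start]          # symbols still to be processed, leftmost on top
    chosen: List[int] = []   # rule indices generated so far
    tokens: List[str] = []   # terminal symbols emitted so far
    while stack:
        symbol = stack.pop()
        if symbol not in NONTERMINALS:
            tokens.append(symbol)
            continue
        if len(chosen) >= max_rules:
            raise RuntimeError("derivation exceeded max_rules")
        scores = score_fn(chosen)
        # Restrict the search space to rules that can expand this symbol.
        best = max(legal_rules(symbol), key=lambda i: scores[i])
        chosen.append(best)
        # Push the right-hand side reversed so its first symbol is popped next.
        stack.extend(reversed(RULES[best][1]))
    return chosen, tokens


def toy_score_fn(prefix: Sequence[int]) -> List[float]:
    """Fixed scores; the global argmax (rule 3) is illegal at the first step."""
    return [0.1, 0.0, 0.5, 0.9, 0.2]


if __name__ == "__main__":
    rules, tokens = constrained_decode(toy_score_fn)
    print(rules, " ".join(tokens))
```

With these scores an unconstrained argmax would pick rule 3 (`Term -> x`) as the very first rule, which cannot expand `Stmt`. The restriction forces rule 0 instead, and the derivation ends with the rule sequence `[0, 2, 3]` and the tokens `return x ;`.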
