Paper information - TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer


The problem of fixing errors in programs has attracted substantial interest over the years. The key challenge in building an effective code-fixing tool is to capture a wide range of errors while maintaining high accuracy. In this paper, we address this challenge and present a new learning-based system, called TFix. TFix works directly on program text and phrases the problem of code fixing as a text-to-text task. In turn, this enables it to leverage a powerful Transformer-based model pre-trained on natural language and fine-tuned to generate code fixes (via a large, high-quality dataset obtained from GitHub commits). TFix is not specific to a particular programming language or class of defects and, in fact, improved its precision by simultaneously fine-tuning on 52 different error types reported by a popular static analyzer. Our evaluation on a massive dataset of JavaScript programs shows that TFix is practically effective: it is able to synthesize code that fixes the error in ∼67 percent of cases and significantly outperforms existing learning-based approaches.
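To make the text-to-text framing concrete, the following is a minimal sketch of how one static-analyzer report might be turned into an (input, target) string pair for a sequence-to-sequence model. The prompt layout, the `ErrorReport` fields, the `build_example` helper, and the sample rule and message are all illustrative assumptions; the abstract does not specify the exact format TFix uses.

```python
from dataclasses import dataclass


@dataclass
class ErrorReport:
    """One static-analyzer finding (field names are hypothetical)."""
    rule: str      # analyzer rule name
    message: str   # human-readable diagnostic
    line: int      # 1-based number of the flagged line


def build_example(source: str, fixed_line: str, report: ErrorReport, window: int = 1):
    """Return (input_text, target_text) for a text-to-text model.

    The prompt layout below is an assumption for illustration only.
    """
    lines = source.splitlines()
    idx = report.line - 1
    # Keep a few lines around the flagged line as local context.
    start = max(0, idx - window)
    end = min(len(lines), idx + window + 1)
    context = "\n".join(lines[start:end])
    input_text = (
        f"fix {report.rule} {report.message} {lines[idx].strip()} :\n{context}"
    )
    return input_text, fixed_line.strip()


# Sample JavaScript snippet with a loose-equality comparison (illustrative).
source = "function f(a) {\n  if (a == null) return 0;\n  return a;\n}"
report = ErrorReport(
    rule="eqeqeq",
    message="Expected '===' and instead saw '=='.",
    line=2,
)
input_text, target_text = build_example(source, "if (a === null) return 0;", report)
```

Because both sides are plain strings, the same pair format can cover many error types at once, which is in line with the abstract's point about fine-tuning on all error types simultaneously.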
