Paper Information - Leveraging Automated Unit Tests for Unsupervised Code Translation

With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive to small changes: changing a single token can cause compilation failures or erroneous programs, whereas in natural languages small inaccuracies may not change the meaning of a sentence. To address this issue, we propose to leverage an automated unit-testing system to filter out invalid translations, thereby creating a fully tested parallel corpus. We found that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the generated translations, comfortably outperforming the state of the art for all language pairs studied. In particular, for Java → Python and Python → C++ we outperform the best previous methods by more than 16% and 24% respectively, reducing the error rate by more than 35%.
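The filtering idea in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it is a toy, single-language stand-in in which a Python callable plays the role of the source function, Python source strings play the role of candidate translations, and the test inputs are written by hand rather than generated by an automated tool. All names (`make_tests`, `passes_all`, `filter_translations`, `max_of`) are hypothetical.

```python
from typing import Callable, List, Tuple

# A test case pairs call arguments with the output expected for them.
TestCase = Tuple[tuple, object]


def make_tests(reference: Callable, inputs: List[tuple]) -> List[TestCase]:
    """Record the reference function's outputs so they can serve as unit tests."""
    return [(args, reference(*args)) for args in inputs]


def passes_all(candidate_src: str, func_name: str, tests: List[TestCase]) -> bool:
    """Return True only if the candidate compiles, defines func_name,
    and reproduces the expected output on every test."""
    namespace: dict = {}
    try:
        # Toy only: real untrusted code should run in a sandbox, not via exec.
        exec(candidate_src, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing definitions, and runtime errors all count as failures.
        return False


def filter_translations(
    reference: Callable, func_name: str, candidates: List[str], inputs: List[tuple]
) -> List[str]:
    """Keep only the candidate translations that pass every recorded test."""
    tests = make_tests(reference, inputs)
    return [src for src in candidates if passes_all(src, func_name, tests)]


def reference_max(a, b):
    """Stand-in for the source-language function being translated."""
    return a if a > b else b


# Hypothetical candidate translations, as a model might emit them.
candidates = [
    "def max_of(a, b):\n    return a if a > b else b\n",  # correct
    "def max_of(a, b):\n    return a if a < b else b\n",  # one wrong token
    "def max_of(a, b)\n    return a\n",                   # does not compile
]
inputs = [(1, 2), (5, 3), (-4, -4), (0, -7)]

kept = filter_translations(reference_max, "max_of", candidates, inputs)
print(len(kept))  # number of candidates that survived the filter
```

In this sketch, each surviving candidate paired with its source function would form one entry of the tested parallel corpus. The second candidate shows the sensitivity the abstract describes: it differs from a correct translation by a single token and is rejected by the tests.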
