Leveraging Automated Unit Tests for Unsupervised Code Translation

With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive to small changes; a single token can result in compilation failures or erroneous programs, unlike natural languages where small inaccuracies may not change the meaning of a sentence. To address this issue, we propose to leverage an automated unit-testing system to filter out invalid translations, thereby creating a fully tested parallel corpus. We found that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the translations so-generated, comfortably outperforming the state-of-the-art for all language pairs studied. In particular, for Java→ Python and Python→ C++ we outperform the best previous methods by more than 16% and 24% respectively, reducing the error rate by more than 35%.

[1]  Rico Sennrich,et al.  Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.

[2]  Yuanyuan Zhang,et al.  Search-based software engineering: Trends, techniques and applications , 2012, CSUR.

[3]  Nikolai Tillmann,et al.  Transferring an automated test generation tool to practice: from pex to fakes and code digger , 2014, ASE.

[4]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[5]  Mark Harman,et al.  Deploying Search Based Software Engineering with Sapienz at Facebook , 2018, SSBSE.

[6]  Mark Harman,et al.  An Analysis and Survey of the Development of Mutation Testing , 2011, IEEE Transactions on Software Engineering.

[7]  Yue Wang,et al.  Code Completion with Neural Attention and Pointer Networks , 2017, IJCAI.

[8]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[9]  Ke Wang,et al.  Dynamic Neural Program Embedding for Program Repair , 2017, ICLR.

[10]  Kai-Wei Chang,et al.  Unified Pre-training for Program Understanding and Generation , 2021, NAACL.

[11]  Guillaume Lample,et al.  Phrase-Based & Neural Unsupervised Machine Translation , 2018, EMNLP.

[12]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[13]  Dawn Xiaodong Song,et al.  Tree-to-tree Neural Networks for Program Translation , 2018, NeurIPS.

[14]  Mark Harman,et al.  AUSTIN: An open source tool for search based software testing of C programs , 2013, Inf. Softw. Technol..

[15]  Neel Sundaresan,et al.  Unit Test Case Generation with Transformers , 2020, ArXiv.

[16]  Denys Poshyvanyk,et al.  SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair , 2018, IEEE Transactions on Software Engineering.

[17]  Gordon Fraser,et al.  Achieving scalable mutation-based generation of whole test suites , 2015, Empirical Software Engineering.

[18]  Fang Liu,et al.  A Self-Attentional Neural Architecture for Code Completion with Multi-Task learning , 2019, 2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC).

[19]  Myra B. Cohen,et al.  An orchestrated survey of methodologies for automated software test case generation , 2013, J. Syst. Softw..

[20]  Abram Hindle,et al.  Using machine translation for converting Python 2 to Python 3 code , 2015, PeerJ Prepr..

[21]  Paolo Tonella,et al.  Symbolic search-based testing , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[22]  Marc Brockschmidt,et al.  Learning to Represent Programs with Graphs , 2017, ICLR.

[23]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[24]  Yuanyuan Zhang,et al.  Achievements, Open Problems and Challenges for Search Based Software Testing , 2015, 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST).

[25]  Miles Osborne,et al.  Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.

[26]  Huda Khayrallah,et al.  On the Impact of Various Types of Noise on Neural Machine Translation , 2018, NMT@ACL.

[27]  Anh Tuan Nguyen,et al.  Lexical statistical machine translation for language migration , 2013, ESEC/FSE 2013.

[28]  Guillaume Lample,et al.  Unsupervised Translation of Programming Languages , 2020, NeurIPS.

[29]  Vijayaraghavan Murali,et al.  Industry-Scale IR-Based Bug Localization: A Perspective from Facebook , 2020, 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP).

[30]  Satish Chandra,et al.  Code Prediction by Feeding Trees to Transformers , 2020, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE).

[31]  Koushik Sen,et al.  Symbolic execution for software testing: three decades later , 2013, CACM.

[32]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[33]  Maik Riechert,et al.  Fast and Memory-Efficient Neural Code Completion , 2020, 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR).

[34]  Martin T. Vechev,et al.  Phrase-Based Statistical Translation of Programming Languages , 2014, Onward!.

[35]  Yonatan Belinkov,et al.  Synthetic and Natural Noise Both Break Neural Machine Translation , 2017, ICLR.

[36]  Le Song,et al.  Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs , 2020, ICLR.

[37]  Guillaume Lample,et al.  DOBF: A Deobfuscation Pre-Training Objective for Programming Languages , 2021, NeurIPS.

[38]  Aditya Kanade,et al.  Learning and Evaluating Contextual Embedding of Source Code , 2020, ICML.

[39]  Percy Liang,et al.  Graph-based, Self-Supervised Program Repair from Diagnostic Feedback , 2020, ICML.

[40]  Glenford J. Myers,et al.  Art of Software Testing , 1979 .

[41]  Richard J. Lipton,et al.  Hints on Test Data Selection: Help for the Practicing Programmer , 1978, Computer.

[42]  Gabriele Bavota,et al.  An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation , 2018, ACM Trans. Softw. Eng. Methodol..

[43]  Rico Sennrich,et al.  A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation , 2017, IJCNLP.

[44]  Yves Le Traon,et al.  An Empirical Study on Mutation, Statement and Branch Coverage Fault Revelation That Avoids the Unreliable Clean Program Assumption , 2017, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[45]  Gordon Fraser,et al.  EvoSuite: automatic test suite generation for object-oriented software , 2011, ESEC/FSE '11.

[46]  David L. Spooner,et al.  Automatic Generation of Floating-Point Test Data , 1976, IEEE Transactions on Software Engineering.

[47]  Andrew Rice,et al.  Learning to Fix Build Errors with Graph2Diff Neural Networks , 2019, ICSE.