Dynamic Neural Program Embedding for Program Repair

Neural program embeddings have shown much promise recently for a variety of program analysis tasks, including program synthesis, program repair, and fault localization. However, most existing program embeddings are based on syntactic features of programs, such as raw token sequences or abstract syntax trees. Unlike images and text, a program has an unambiguous semantic meaning that can be difficult to capture by only considering its syntax (i.e., syntactically similar programs can exhibit vastly different run-time behavior), which makes syntax-based program embeddings fundamentally limited. This paper proposes a novel semantic program embedding that is learned from program execution traces. Our key insight is that program states expressed as sequential tuples of live variable values not only capture program semantics more precisely, but also offer a more natural fit for Recurrent Neural Networks to model. We evaluate different syntactic and semantic program embeddings on predicting the types of errors that students make in their submissions to an introductory programming class and to two exercises on the CodeHunt education platform. Evaluation results show that our new semantic program embedding significantly outperforms syntactic program embeddings based on token sequences and abstract syntax trees. In addition, we augment a search-based program repair system with the predictions obtained from our semantic embedding, and show that search efficiency is also significantly improved.
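To make the core idea concrete, the following is a minimal sketch (not the authors' exact architecture) of how an execution trace, represented as a sequence of program states where each state is a tuple of live variable values, can be fed to a recurrent network to predict an error class. The vocabulary size, number of live variables, layer sizes, and the integer tokenization of variable values are all illustrative assumptions.

```python
# Minimal sketch of a trace-based program encoder (illustrative, not the
# paper's exact model): each program state in an execution trace is a tuple
# of live variable values; the state sequence is summarized by a GRU and the
# final hidden state predicts an error type.
# Assumptions: values are pre-tokenized into an integer vocabulary of size
# VOCAB, and every state holds NUM_VARS live variables.

import torch
import torch.nn as nn

VOCAB, NUM_VARS, EMBED, HIDDEN, NUM_CLASSES = 1000, 4, 64, 128, 10

class TraceEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.value_embed = nn.Embedding(VOCAB, EMBED)              # embed each variable value
        self.rnn = nn.GRU(NUM_VARS * EMBED, HIDDEN, batch_first=True)
        self.classifier = nn.Linear(HIDDEN, NUM_CLASSES)           # logits over error types

    def forward(self, trace):
        # trace: (batch, num_states, NUM_VARS) integer-encoded variable values
        b, t, v = trace.shape
        states = self.value_embed(trace).view(b, t, v * EMBED)     # one vector per program state
        _, h_n = self.rnn(states)                                  # summarize the state sequence
        return self.classifier(h_n[-1])                            # final hidden state -> class logits

# Usage: a batch of 2 traces, each with 5 program states of 4 variables.
logits = TraceEncoder()(torch.randint(0, VOCAB, (2, 5, 4)))
print(logits.shape)  # torch.Size([2, 10])
```

The point of the sketch is the input representation: unlike token- or AST-based encoders, the recurrent network here consumes run-time program states, so two syntactically different but semantically equivalent programs produce similar input sequences.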
