Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages

Program synthesis from input-output (IO) examples has been a long-standing challenge. While recent works have demonstrated limited success on domain-specific languages (DSLs), applying them to real-world programming languages such as C remains highly challenging. Due to complicated syntax and token variation, there are three major challenges: (1) unlike many DSLs, programs in languages like C must be compiled first rather than executed via an interpreter; (2) the program search space grows exponentially as the syntax and semantics of the programming language become more complex; and (3) collecting a large-scale dataset of real-world programs is non-trivial. To address these challenges, we propose LaSynth, which learns a latent representation to approximate the execution of partially generated programs, even when they are syntactically incomplete (addressing (1)). The learned execution significantly improves next-token prediction over existing approaches, facilitating search (addressing (2)). Finally, once trained on randomly generated ground-truth programs and their IO pairs, LaSynth can synthesize more concise programs that resemble human-written code. Retraining our model with these synthesized programs yields better performance with fewer samples for both Karel and C program synthesis, indicating the promise of leveraging the learned program synthesizer to improve dataset quality for input-output program synthesis (addressing (3)). When evaluated on whether the program execution outputs match the IO pairs, LaSynth achieves 55.2% accuracy on generating simple C code with tens of tokens, including loops and branches, outperforming existing approaches without executors by around 20%.
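The core idea above can be illustrated with a minimal PyTorch sketch: alongside a standard token-level decoder, the model maintains a latent vector intended to approximate the machine state after executing the partial program generated so far, and this latent state is fed back into next-token prediction. This is a simplified illustration under our own assumptions, not the authors' implementation; all class and variable names (`LatentExecutor`, `exec_update`, `io_state`) are hypothetical.

```python
import torch
import torch.nn as nn


class LatentExecutor(nn.Module):
    """Sketch of latent execution: predict the next token while
    simultaneously tracking an approximate execution state of the
    partially generated (possibly syntactically incomplete) program."""

    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTMCell(hidden, hidden)     # token-level decoder
        self.exec_update = nn.GRUCell(hidden, hidden)  # latent execution update
        # Next-token head conditions on both decoder state and latent state.
        self.token_head = nn.Linear(2 * hidden, vocab_size)

    def forward(self, tokens, io_state):
        # tokens: (1, T) prefix of the program; io_state: (1, hidden)
        # encoding of the IO example (the "input" the program runs on).
        h = torch.zeros_like(io_state)
        c = torch.zeros_like(io_state)
        latent = io_state  # latent machine state starts at the encoded input
        logits, latents = [], []
        for t in range(tokens.size(1)):
            e = self.embed(tokens[:, t])
            h, c = self.decoder(e, (h, c))
            # Update the approximate execution state with the new token,
            # mimicking one step of running the partial program.
            latent = self.exec_update(e, latent)
            logits.append(self.token_head(torch.cat([h, latent], dim=-1)))
            latents.append(latent)
        # latents can be supervised against encodings of true intermediate
        # execution states when an executor/compiler is available at train time.
        return torch.stack(logits, dim=1), torch.stack(latents, dim=1)
```

At training time, the per-step `latents` would receive an auxiliary loss against encoded ground-truth intermediate states, so that at test time the decoder benefits from an "executor" even for programs that cannot yet be compiled.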
