Bayesian Sketch Learning for Program Synthesis

We present a Bayesian statistical approach to the problem of automatic program synthesis. Our synthesizer starts by learning, offline and from an existing corpus, a probabilistic model of real-world programs. During synthesis, it is given ambiguous and incomplete evidence about the programming task that the user wants automated, for example sets of API calls or data types that are relevant to the task. Given this input, the synthesizer infers a posterior distribution over type-safe programs that assigns higher probability to programs that, according to the learned model, are more likely to match the evidence. We realize this approach using two key ideas. First, our learning techniques operate not over code but over syntactic abstractions, or sketches, of programs. During synthesis, we infer a posterior distribution over sketches, then concretize samples from this distribution into type-safe programs using combinatorial techniques. Second, our statistical model explicitly represents the full intent behind a synthesis task as a latent variable. To infer sketches, we first estimate a posterior distribution over the intent, then use samples from this posterior to generate a distribution over possible sketches. We show that our model can be implemented effectively using a new neural architecture, the Bayesian encoder-decoder, which can be trained with stochastic gradient descent and yields a simple inference procedure. We implement our ideas in a system, called BAYOU, for the synthesis of API-heavy Java methods. We train BAYOU on a large corpus of Android apps, and find that the trained system can often synthesize complex methods given just a few API method names or data types as evidence. The experiments also justify the design choice of using a latent intent variable and the levels of abstraction at which sketches and evidence are defined.
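
The two-stage inference described above (evidence to latent intent, then intent to sketches) can be illustrated with a deliberately tiny discrete model. This is only a sketch under stated assumptions, not BAYOU's method: the intents, API-call likelihoods, sketch strings, and probabilities below are all hypothetical, the learned neural encoder and decoder are replaced by hand-written lookup tables, the posterior is computed exactly by enumeration rather than by sampling, and the concretization of sketches into type-safe programs is omitted.

```python
from math import prod

# Hypothetical toy model (all names and numbers are illustrative assumptions).
# Z: latent intent, X: evidence (API call names), Y: program sketch.
PRIOR = {"read_file": 0.5, "write_file": 0.5}  # P(Z)

# P(x | Z) for individual pieces of evidence, assumed independent given Z.
EVIDENCE_LIKELIHOOD = {
    "read_file": {"FileReader": 0.8, "readLine": 0.7, "FileWriter": 0.1},
    "write_file": {"FileReader": 0.1, "readLine": 0.1, "FileWriter": 0.9},
}

# P(Y | Z): distribution over sketches for each intent.
SKETCH_GIVEN_INTENT = {
    "read_file": {"open; loop(read); close": 0.9, "open; write; close": 0.1},
    "write_file": {"open; loop(read); close": 0.2, "open; write; close": 0.8},
}


def intent_posterior(evidence):
    """Return P(Z | X) via Bayes' rule: proportional to P(Z) * prod_x P(x | Z)."""
    unnormalized = {
        z: PRIOR[z] * prod(EVIDENCE_LIKELIHOOD[z].get(x, 1e-6) for x in evidence)
        for z in PRIOR
    }
    total = sum(unnormalized.values())
    return {z: weight / total for z, weight in unnormalized.items()}


def sketch_posterior(evidence):
    """Return P(Y | X) = sum_z P(z | X) * P(Y | z), marginalizing out the intent."""
    posterior = {}
    for z, p_z in intent_posterior(evidence).items():
        for sketch, p_sketch in SKETCH_GIVEN_INTENT[z].items():
            posterior[sketch] = posterior.get(sketch, 0.0) + p_z * p_sketch
    return posterior


if __name__ == "__main__":
    evidence = ["FileReader", "readLine"]
    for sketch, p in sorted(sketch_posterior(evidence).items(), key=lambda kv: -kv[1]):
        print(f"{p:.3f}  {sketch}")
```

Given the evidence `FileReader` and `readLine`, the toy model concentrates the intent posterior on `read_file` and therefore ranks the read-loop sketch highest. In a realistic setting the intent space is continuous and high-dimensional, which is why the abstract describes estimating the intent posterior and then sampling from it instead of enumerating intents as done here.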
