Maximal multi-layer specification synthesis

There has been significant interest in applying programming-by-example to automate repetitive and tedious tasks. However, because input-output examples are inherently incomplete, a synthesizer may generate programs that pass the examples but do not match the user intent. In this paper, we propose MARS, a novel synthesis framework that takes as input a multi-layer specification composed of input-output examples, a textual description, and partial code snippets that capture the user intent. To accurately capture the user intent from the noisy and ambiguous description, we propose a hybrid model that combines an LSTM-based sequence-to-sequence model with the Apriori algorithm for mining association rules through unsupervised learning. We reduce multi-layer specification synthesis to a Max-SMT problem, where hard constraints encode well-typed concrete programs and soft constraints encode the user intent learned by the hybrid model. We instantiate our hybrid model in the data wrangling domain and compare its performance against MORPHEUS, a state-of-the-art synthesizer for data wrangling tasks. Our experiments demonstrate that our approach outperforms MORPHEUS in terms of both running time and number of solved benchmarks. For challenging benchmarks, our approach suggests candidates with rankings that are an order of magnitude better than those of MORPHEUS, which leads to running times that are 15x faster.
