Component-based synthesis of table consolidation and transformation tasks from examples

This paper presents a novel component-based synthesis algorithm that combines type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their over-approximate first-order specifications, our method first generates a program sketch over a subset of the components and checks its feasibility using an SMT solver. Since a program sketch typically represents many concrete programs, rejecting an infeasible sketch rules out all of its completions at once, which greatly increases the scalability of the algorithm. Once a feasible program sketch is found, our algorithm completes the sketch in a bottom-up fashion, using partial evaluation to further increase the power of deduction for rejecting partially filled program sketches. We apply the proposed synthesis methodology to automate a large class of data preparation tasks that commonly arise in data science. We have evaluated our synthesis algorithm on dozens of data wrangling and consolidation tasks obtained from online forums, and we show that our approach can automatically solve a large class of problems encountered by R users.
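The sketch-feasibility check described above can be illustrated with a small Python sketch. It is not the paper's implementation: simple interval propagation over table dimensions stands in for the SMT solver, a sketch is restricted to a linear chain of components, and the component names and their specifications (`select`, `filter`, `gather`, `spread`, constrained only by row and column counts) are assumptions chosen for illustration.

```python
import math
from itertools import product

INF = math.inf

# Each helper maps an interval (lo, hi) of possible input sizes to an
# interval of possible output sizes. All of them over-approximate.
def same(iv):
    return iv

def strictly_fewer(iv):
    return (0, iv[1] - 1)

def at_most(iv):
    return (0, iv[1])

def at_least(iv):
    return (iv[0], INF)

# Hypothetical over-approximate specs: (effect on rows, effect on columns).
SPECS = {
    "select": (same, strictly_fewer),   # drops columns, keeps rows
    "filter": (strictly_fewer, same),   # drops rows, keeps columns
    "gather": (at_least, at_most),      # wide-to-long reshaping
    "spread": (at_most, at_least),      # long-to-wide reshaping
}

def feasible(sketch, in_shape, out_shape):
    """Return False only if no completion of the sketch can map a table of
    shape in_shape = (rows, cols) to a table of shape out_shape."""
    rows = (in_shape[0], in_shape[0])
    cols = (in_shape[1], in_shape[1])
    for name in sketch:
        row_spec, col_spec = SPECS[name]
        rows, cols = row_spec(rows), col_spec(cols)
    return (rows[0] <= out_shape[0] <= rows[1]
            and cols[0] <= out_shape[1] <= cols[1])

def feasible_sketches(components, max_len, in_shape, out_shape):
    """Enumerate component chains up to max_len and keep the feasible ones."""
    for n in range(1, max_len + 1):
        for sketch in product(components, repeat=n):
            if feasible(sketch, in_shape, out_shape):
                yield sketch

if __name__ == "__main__":
    # Example: a 3x4 input table must become a 3x2 output table.
    for s in feasible_sketches(sorted(SPECS), 2, (3, 4), (3, 2)):
        print(" -> ".join(s))
```

Because the specifications over-approximate each component, a chain that passes this check may still have no correct completion. A chain that fails it can be discarded without enumerating any of its argument choices, which is where the scalability benefit comes from.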
