Polynomial-Time Proactive Synthesis of Tree-to-String Functions from Examples

Synthesis from examples enables non-expert users to generate programs by specifying examples of their behavior. A domain-specific form of such synthesis has recently been deployed in a widely used spreadsheet software product. In this paper we contribute to the foundations of such techniques and present a complete algorithm for synthesizing a class of recursive functions defined by structural recursion over a given algebraic data type definition. The functions we consider map an algebraic data type to a string; they are useful for, e.g., pretty printing and serialization of programs and data. We formalize our problem as learning deterministic sequential top-down tree-to-string transducers with a single state. The first problem we consider is learning a tree-to-string transducer from an arbitrary set of input/output examples provided by the user. We show that this problem is NP-complete in general, but can be solved in polynomial time under a (practically useful) closure condition: each subtree of a tree in the input/output example set must itself be part of the input/output examples. Because relevant input/output examples can be difficult for the user to devise and can also give rise to hard constraint problems for the synthesizer, we also study a more automated active learning scenario in which the algorithm chooses the inputs and the user provides the corresponding outputs. In the worst case, our algorithm asks a number of queries that is linear in the size of the algebraic data type definition to determine a unique transducer.
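To make the objects in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes that a single-state sequential top-down tree-to-string transducer can be represented by giving each constructor of arity k a tuple of k+1 string constants, which are interleaved with the outputs of its children in left-to-right order. The names `Tree`, `transduce`, `is_subtree_closed`, and the list-like rule table `RULES` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass(frozen=True)
class Tree:
    """A value of an algebraic data type: a constructor applied to subtrees."""
    ctor: str
    children: Tuple["Tree", ...] = ()


# Assumed representation: constructor name -> (u_0, u_1, ..., u_k) for arity k.
Rules = Dict[str, Tuple[str, ...]]


def transduce(rules: Rules, t: Tree) -> str:
    """Print t as u_0 + out(child_1) + u_1 + ... + out(child_k) + u_k."""
    consts = rules[t.ctor]
    if len(consts) != len(t.children) + 1:
        raise ValueError(f"constructor {t.ctor} needs {len(t.children) + 1} constants")
    parts = [consts[0]]
    for child, const in zip(t.children, consts[1:]):
        parts.append(transduce(rules, child))
        parts.append(const)
    return "".join(parts)


def subtrees(t: Tree) -> Iterator[Tree]:
    """Yield t and every tree nested inside it."""
    yield t
    for child in t.children:
        yield from subtrees(child)


def is_subtree_closed(examples: Dict[Tree, str]) -> bool:
    """Check the closure condition: every subtree of an example input is an example input."""
    return all(s in examples for t in examples for s in subtrees(t))


# Hypothetical printer for a small list-like data type.
RULES: Rules = {
    "Nil": ("[]",),
    "Cons": ("(", " :: ", ")"),
    "A": ("a",),
    "B": ("b",),
}

example = Tree("Cons", (Tree("A"), Tree("Cons", (Tree("B"), Tree("Nil")))))
closed_examples = {s: transduce(RULES, s) for s in subtrees(example)}
open_examples = {example: transduce(RULES, example)}
```

Here `transduce(RULES, example)` yields `(a :: (b :: []))`. The set `closed_examples` satisfies the closure condition because it contains an output for every subtree of `example`, whereas `open_examples` does not, since it gives only the output of the whole tree.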
