Proactive Synthesis of Recursive Tree-to-String Functions from Examples

Synthesis from examples enables non-expert users to generate programs by specifying examples of their behavior. A domain-specific form of such synthesis has recently been deployed in a widely used spreadsheet software product. In this paper we contribute to the foundations of such techniques and present a complete algorithm for synthesizing a class of recursive functions defined by structural recursion over a given algebraic data type definition. The functions we consider map an algebraic data type to a string; they are useful for, e.g., pretty printing and serialization of programs and data. We formalize our problem as learning deterministic sequential top-down tree-to-string transducers with a single state (1STS). The first problem we consider is learning a tree-to-string transducer from any set of input/output examples provided by the user. We show that, given a set of input/output examples, checking whether there exists a 1STS consistent with these examples is NP-complete in general. In contrast, the problem can be solved in polynomial time under a (practically useful) closure condition: each subtree of a tree in the input/output example set is also part of the input/output examples. Because coming up with relevant input/output examples may be difficult for the user, and the examples chosen may create hard constraint problems for the synthesizer, we also study a more automated active learning scenario in which the algorithm chooses the inputs for which the user provides the outputs. In the worst case, our algorithm asks a number of queries that is linear in the size of the algebraic data type definition to determine a unique transducer. To construct our algorithms we present two new results on formal languages. First, we define a class of word equations, called sequential word equations, for which we prove that satisfiability can be solved in deterministic polynomial time. This is in contrast to general word equations, for which the best known complexity upper bound is linear space. Second, we close a long-standing open problem about the asymptotic size of test sets for context-free languages. A test set of a language of words L is a subset T of L such that any two word homomorphisms equivalent on T are also equivalent on L. We prove that it is possible to build test sets of cubic size for context-free languages, matching for the first time the lower bound found 20 years ago.
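
To make the 1STS model concrete, the following is a minimal sketch of how such a transducer might be represented and run. It is an illustration, not the paper's implementation. It assumes that a single-state sequential transducer assigns to each constructor of arity k a tuple of k+1 string constants, which are interleaved with the outputs of the children in their original order. The names `Tree`, `transduce`, `is_consistent`, and the list-like constructors `Nil`, `Cons`, `A`, `B` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tree:
    """A value of an algebraic data type: a constructor applied to children."""
    ctor: str
    children: Tuple["Tree", ...] = ()


# Assumed rule format: constructor name -> (u0, u1, ..., uk) for arity k.
Rules = Dict[str, Tuple[str, ...]]


def transduce(rules: Rules, tree: Tree) -> str:
    """Structural recursion: emit u0, child 1, u1, ..., child k, uk."""
    consts = rules[tree.ctor]
    if len(consts) != len(tree.children) + 1:
        raise ValueError(f"arity mismatch for constructor {tree.ctor!r}")
    parts = [consts[0]]
    for child, const in zip(tree.children, consts[1:]):
        parts.append(transduce(rules, child))
        parts.append(const)
    return "".join(parts)


def is_consistent(rules: Rules, examples: List[Tuple[Tree, str]]) -> bool:
    """Check that a candidate transducer reproduces every input/output example."""
    return all(transduce(rules, t) == out for t, out in examples)


# Hypothetical pretty-printer for a small list-like data type.
rules: Rules = {
    "Nil": ("[]",),
    "Cons": ("(", " :: ", ")"),
    "A": ("a",),
    "B": ("b",),
}

example = Tree("Cons", (Tree("A"), Tree("Cons", (Tree("B"), Tree("Nil")))))
printed = transduce(rules, example)  # "(a :: (b :: []))"
```

In this representation, the learning problem described above amounts to finding the string constants in `rules` from input/output pairs alone; `is_consistent` only checks a candidate and does not perform the synthesis.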
