On the difficulty of benchmarking inductive program synthesis methods

A variety of inductive program synthesis (IPS) techniques have recently been developed, emerging from different areas of computer science. However, these techniques have not been adequately compared on general program synthesis problems. In this paper we compare several methods on problems that require solution programs to handle various data types, control structures, and numbers of outputs. The problem set also spans levels of abstraction: some problems would ordinarily be approached using machine code or assembly language, while others would ordinarily be approached using high-level languages. The comparisons focus on the possibility of success, that is, on whether a system can produce a program that passes all tests on both the training inputs and unseen testing inputs. The compared systems are Flash Fill, MagicHaskeller, TerpreT, and two forms of genetic programming: PushGP and Grammar Guided Genetic Programming. The results suggest that PushGP and, to an extent, TerpreT and Grammar Guided Genetic Programming are more capable of finding solutions than the others, albeit at a higher computational cost. A more salient observation is the difficulty of comparing these methods, which share the common goal of program synthesis but have drastically different intended applications.
