Example-driven program synthesis for end-user programming: technical perspective

Understanding how to get a computer to perform a given task is a central question in computer science. For many years the standard answer has been to use a programming language to write a program the computer will then execute to accomplish the task. An intriguing alternative, however, is to provide the computer with examples of inputs and corresponding outputs, then have the computer automatically generalize the examples to produce a program that performs the desired task for all inputs. Researchers have worked on this approach for decades, first in the LISP community [4], then later in the inductive logic programming community [1–3], to name two prominent examples. Given the relatively modest size of the programs the resulting techniques are able to produce, the field has evolved to focus largely on data mining, concept learning, knowledge discovery, and other applications (as opposed to mainstream software development).
The following paper focuses on an important emerging area: end-user programming. As information technology has come to permeate our society, broader classes of users have developed the need for more sophisticated data manipulation and processing. While users in the past were satisfied with relatively simple interactive models of computation such as spreadsheets and other business applications, current users are now looking to automate custom data manipulations such as reformatting, reorganizing, simple calculations, or data cleaning. While such users may have a good command of the interactive functionality of their application, they often lack the expertise, time, or inclination to develop software specifically for their task.
The authors illustrate how to apply example-driven program synthesis to automate spreadsheet computations. This work, therefore, is of interest to the millions of people worldwide who use spreadsheets.
The methodology consists of four basic steps:
- Domain-specific language: Develop a domain-specific language capable of representing the desired set of computations.
- Data structure: Develop a data structure that can efficiently represent the large set of programs that are consistent with a given input/output example.
- Learn and intersect: Generate data structures for representing the programs consistent with each individual input/output example, then intersect the data structures to obtain a representation of the programs consistent with all examples.
- Rank the resulting set of programs, preferring more general programs over less general programs.
Users can then view the results of the ranked programs on different inputs to guide the program selection process. This approach effectively addresses many of …
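To make the four steps listed above concrete, here is a minimal sketch in Python. It is not the authors' system: the toy string DSL, the function names (learn, synthesize, generality_cost), and the ranking heuristic are all assumptions made for illustration. In particular, it stores each set of consistent programs as an explicit Python set, a brute-force stand-in for the compact data structure the methodology calls for, so it is only practical for very short strings.

```python
from itertools import product

# Step 1 (hypothetical DSL): a program is a tuple of atoms.
#   ("const", s)    emits the literal string s
#   ("sub", p, q)   emits input[pos(p):pos(q)]
# A position is ("L", k), k characters from the left, or ("R", k), from the right.

def pos(p, s):
    side, k = p
    return k if side == "L" else len(s) - k

def run(program, s):
    """Execute a program on input s; return None if a position is out of range."""
    out = []
    for atom in program:
        if atom[0] == "const":
            out.append(atom[1])
        else:
            i, j = pos(atom[1], s), pos(atom[2], s)
            if not (0 <= i <= j <= len(s)):
                return None
            out.append(s[i:j])
    return "".join(out)

def atoms_for(piece, s):
    """All atoms that produce `piece` when run on input s."""
    atoms = {("const", piece)}
    start = s.find(piece)
    while start != -1:
        end = start + len(piece)
        for p in (("L", start), ("R", len(s) - start)):
            for q in (("L", end), ("R", len(s) - end)):
                atoms.add(("sub", p, q))
        start = s.find(piece, start + 1)
    return atoms

def splits(text, max_parts):
    """Yield every way to cut text into 1..max_parts non-empty pieces."""
    if max_parts == 0 or not text:
        return
    yield (text,)
    for cut in range(1, len(text)):
        for rest in splits(text[cut:], max_parts - 1):
            yield (text[:cut],) + rest

# Step 2 (stand-in): an explicit set of programs, one set per example.
def learn(inp, out, max_parts=3):
    programs = set()
    for pieces in splits(out, max_parts):
        options = [atoms_for(piece, inp) for piece in pieces]
        programs.update(product(*options))
    return programs

# Step 4: fewer constant characters counts as "more general" (an assumed heuristic).
def generality_cost(program):
    const_chars = sum(len(a[1]) for a in program if a[0] == "const")
    return (const_chars, len(program), repr(program))

# Step 3: learn per example, intersect, then rank.
def synthesize(examples):
    candidate_sets = [learn(i, o) for i, o in examples]
    return sorted(set.intersection(*candidate_sets), key=generality_cost)

ranked = synthesize([("Alice Smith", "A."), ("Bob Jones", "B.")])
print(run(ranked[0], "Carol White"))  # C.
```

With only the first example, six programs are consistent, including one that always returns the constant "A."; intersecting with the second example leaves a single program that takes the first character of the input and appends a period. Running the top-ranked program on a fresh input, as in the last line, mirrors how a user would inspect results to guide program selection.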