Editing by example

An editing by example system is an automatic program synthesis facility embedded in a text editor that can be used to solve repetitive text editing problems. The user provides the editor with a few examples of a text transformation. The system analyzes the examples and generalizes them into a program that can apply the transformation to the rest of the user's text. This paper presents the design, analysis, and implementation of a practical editing by example system. In particular, we study the problem of synthesizing a text processing program that generalizes the transformation implicitly described by a small number of input/output examples. We define a class of text processing programs called gap programs, characterize their computational power, study the problems associated with synthesizing them from examples, and derive an efficient heuristic that provably synthesizes a gap program from examples of its input/output behavior. We evaluate how well the gap program synthesis heuristic performs on text encountered in practice. This evaluation motivates several modifications to the heuristic that both improve the quality of the hypotheses proposed by the system and reduce the number of examples required to converge to a target program. The result is a gap program synthesis heuristic that can usually synthesize a target gap program from two or three input examples and a single output example. The editing by example system derived from this analysis has been embedded in a production text editor. The system is presented as a group of editor commands that use the editor's standard interfaces to collect examples, show synthesized programs, and run them. By developing an editing by example system that solves a useful class of text processing problems, we demonstrate that program synthesis is feasible in the domain of text editing.
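The abstract does not define gap programs, so the following Python sketch is only an illustration of the kind of transformation such a system might produce, not the paper's implementation. It assumes a pattern is a sequence of constant strings and numbered gaps, and that a replacement reorders the gap contents among new constants. The names `compile_gap_pattern` and `apply_gap_program` and the list-based encoding are hypothetical, and the synthesis step that would infer the pattern from examples is not shown.

```python
import re

def compile_gap_pattern(parts):
    """Build a regex from a list of constants (str) and gap ids (int).

    Assumption for illustration: each gap matches as little text as
    possible before the constant that follows it.
    """
    pieces = []
    for part in parts:
        if isinstance(part, int):
            pieces.append(f"(?P<g{part}>.*?)")   # a numbered gap
        else:
            pieces.append(re.escape(part))       # a literal constant
    return re.compile("".join(pieces))

def apply_gap_program(pattern_parts, replacement_parts, line):
    """Rewrite one line; return it unchanged if the pattern does not match."""
    match = compile_gap_pattern(pattern_parts).fullmatch(line)
    if match is None:
        return line
    out = []
    for part in replacement_parts:
        if isinstance(part, int):
            out.append(match.group(f"g{part}"))  # copy the gap's text
        else:
            out.append(part)                     # emit a new constant
    return "".join(out)

# Hypothetical program: "Last, First: phone" becomes "First Last (phone)".
pattern = [1, ", ", 2, ": ", 3]
replacement = [2, " ", 1, " (", 3, ")"]
print(apply_gap_program(pattern, replacement, "Smith, John: 555-1234"))
```

In an editing by example setting, the user would supply a few lines like the input above together with one edited result, and the system would be responsible for proposing the pattern and replacement itself.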
