Identifying syntactic differences between two programs

Programmers frequently need to identify the differences between two programs, usually two versions of the same program. Text-based tools such as the UNIX utility diff often produce unsatisfactory comparisons because they cannot accurately pinpoint the differences and because they sometimes report irrelevant ones. Since programs have a rigid syntactic structure, described by the grammar of the programming language in which they are written, we develop a comparison algorithm that exploits knowledge of the grammar. The algorithm, which is based on a dynamic programming scheme, can point out the differences between two programs more accurately than previous text comparison tools. Finally, the two programs are pretty-printed 'synchronously' with the differences highlighted so that they are easily identified.
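To make the idea of a grammar-aware, dynamic-programming comparison concrete, the following is a minimal sketch only. It is not the algorithm developed in this paper. It uses a simplified top-down tree edit distance over ordered, labelled trees, and the tuple-based tree representation, the unit costs, and the names `size`, `tree_distance`, `old`, and `new` are all assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical representation: a syntax tree is a nested tuple
# (label, child, child, ...). Tuples are hashable, so results can be memoized.


def size(tree):
    """Number of nodes in a tree; used as the cost of inserting or deleting it."""
    return 1 + sum(size(child) for child in tree[1:])


@lru_cache(maxsize=None)
def tree_distance(a, b):
    """Top-down edit distance between two ordered, labelled trees.

    Assumed costs: relabelling a node costs 1, and inserting or deleting
    a whole subtree costs its number of nodes.
    """
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1:], b[1:]
    m, n = len(xs), len(ys)

    # d[i][j] = cheapest way to turn the first i children of a
    # into the first j children of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(xs[i - 1])  # delete child i
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(ys[j - 1])  # insert child j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(xs[i - 1]),                         # delete
                d[i][j - 1] + size(ys[j - 1]),                         # insert
                d[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),  # match
            )
    return relabel + d[m][n]


# Two versions of a small statement that differ in a single literal.
old = ("if", ("cond", ("x",)), ("assign", ("y",), ("1",)))
new = ("if", ("cond", ("x",)), ("assign", ("y",), ("2",)))
print(tree_distance(old, new))
```

Because the comparison walks the syntax trees rather than lines of text, a change to one literal is charged to that single node, whereas a line-oriented tool would flag the whole line that contains it.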
