Type-safe diff for families of datatypes

The UNIX diff program computes the difference between two text files using a classic algorithm for determining the longest common subsequence. When working with structured input (e.g. program code), however, we often want the difference between tree-like data (e.g. abstract syntax trees). In a functional programming language such as Haskell, we can represent such data with a family of (mutually recursive) datatypes. In this paper, we describe a functional, datatype-generic implementation of diff and of the associated program, patch. Our approach requires advanced type system features to preserve type safety; we therefore present the code in Agda, a dependently typed language well suited to datatype-generic programming. To establish the usefulness of our work, we show that its efficiency can be improved with memoization and that it can also be defined in Haskell.
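To make the starting point concrete, the following is a minimal Haskell sketch of the list-based case only: a diff that produces an edit script equivalent to a longest common subsequence, and a patch that applies it. It is not the datatype-generic, type-safe implementation the paper describes. The names `Edit`, `Ins`, `Del`, `Cpy`, `cost`, `best`, `diff`, and `patch` are chosen here for illustration, and the function is deliberately naive (exponential time), which is the kind of recomputation memoization would remove.

```haskell
-- Illustrative sketch: names and structure are assumptions, not the paper's code.

-- An edit script over a flat list of items.
data Edit a
  = Ins a   -- insert an item that occurs only in the target
  | Del a   -- delete an item that occurs only in the source
  | Cpy a   -- copy an item shared by source and target
  deriving (Eq, Show)

-- Cost of a script: copies are free, insertions and deletions cost 1.
cost :: [Edit a] -> Int
cost es = length [() | e <- es, not (isCpy e)]
  where
    isCpy (Cpy _) = True
    isCpy _       = False

-- Pick the cheaper of two scripts, preferring the first on ties.
best :: [Edit a] -> [Edit a] -> [Edit a]
best x y = if cost x <= cost y then x else y

-- Naive diff: overlapping subproblems are recomputed, so this is
-- exponential in the worst case.
diff :: Eq a => [a] -> [a] -> [Edit a]
diff [] ys = map Ins ys
diff xs [] = map Del xs
diff (x:xs) (y:ys)
  | x == y    = Cpy x : diff xs ys
  | otherwise = best (Del x : diff xs (y:ys)) (Ins y : diff (x:xs) ys)

-- Apply a script to a source list. Returns Nothing when the script
-- does not match the source (wrong item, or items left over).
patch :: Eq a => [Edit a] -> [a] -> Maybe [a]
patch [] [] = Just []
patch [] _  = Nothing
patch (Ins y : es) xs = (y :) <$> patch es xs
patch (Del x : es) (x':xs) | x == x' = patch es xs
patch (Cpy x : es) (x':xs) | x == x' = (x :) <$> patch es xs
patch _ _ = Nothing
```

In this sketch, `patch (diff xs ys) xs` is intended to return `Just ys`. Note that `patch` can fail at run time on a mismatched script; ruling out ill-formed scripts statically is where the type-level machinery mentioned in the abstract comes in.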
