Run, Xtatic, Run: Efficient Implementation of an Object-Oriented Language with Regular Pattern Matching

Schema languages such as DTD, XML Schema, and Relax NG have been steadily growing in importance in the XML community. A schema language provides a mechanism for defining the type of XML documents, i.e., the set of constraints that specify the structure of XML documents acceptable as data for a given programming task. A number of recent language designs, many of them descended from the XDuce language of Hosoya, Pierce, and Vouillon, have shown how such schemas can be used statically for type-checking XML processing code and dynamically for evaluating XML structures. The technical foundation of such languages consists of two notions. The first is regular types, a mild generalization of nondeterministic top-down tree automata, which correspond to the core of most popular schema notations. The second is regular patterns (regular types decorated with variable binders), a powerful and convenient primitive for dynamic inspection of XML values. This dissertation is concerned with one of XDuce's descendants, Xtatic. The goal of the Xtatic project is to bring regular type and regular pattern technologies to a wide audience by integrating them with a mainstream object-oriented language. My research focuses on an efficient implementation of Xtatic, including a compiler that generates fast and compact target programs and a run-time system designed to support efficient manipulation of XML fragments. Many of the techniques described here apply not only to Xtatic but also to other XDuce derivatives such as CDuce and Cω.
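To make the idea of a regular pattern with variable binders concrete, here is a minimal Python sketch. It is an illustration only, not the dissertation's concrete syntax or its compilation scheme: the constructor names (`Tag`, `Seq`, `Alt`, `Star`, `Bind`) and the `match` function are hypothetical, the matcher is a naive backtracking interpreter, and it works over a flat sequence of element tags rather than over full XML trees.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple, Union

# Hypothetical mini-language of regular patterns over a flat list of tags.

@dataclass(frozen=True)
class Tag:            # matches exactly one element with the given tag
    name: str

@dataclass(frozen=True)
class Seq:            # first, then second
    first: "Pattern"
    second: "Pattern"

@dataclass(frozen=True)
class Alt:            # left or right (left is tried first)
    left: "Pattern"
    right: "Pattern"

@dataclass(frozen=True)
class Star:           # zero or more repetitions of body (longest first)
    body: "Pattern"

@dataclass(frozen=True)
class Bind:           # binds var to the subsequence matched by body
    var: str
    body: "Pattern"

Pattern = Union[Tag, Seq, Alt, Star, Bind]
Env = Dict[str, List[str]]

def match_at(p: Pattern, xs: List[str], i: int, env: Env) -> Iterator[Tuple[int, Env]]:
    """Yield (end index, bindings) for every way p matches xs starting at i."""
    if isinstance(p, Tag):
        if i < len(xs) and xs[i] == p.name:
            yield i + 1, env
    elif isinstance(p, Seq):
        for j, env1 in match_at(p.first, xs, i, env):
            yield from match_at(p.second, xs, j, env1)
    elif isinstance(p, Alt):
        yield from match_at(p.left, xs, i, env)
        yield from match_at(p.right, xs, i, env)
    elif isinstance(p, Star):
        for j, env1 in match_at(p.body, xs, i, env):
            if j > i:  # skip empty iterations so the recursion terminates
                yield from match_at(p, xs, j, env1)
        yield i, env   # finally, the option of zero further repetitions
    elif isinstance(p, Bind):
        for j, env1 in match_at(p.body, xs, i, env):
            yield j, {**env1, p.var: xs[i:j]}

def match(p: Pattern, xs: List[str]) -> Optional[Env]:
    """Return the bindings of the first match covering all of xs, or None."""
    for j, env in match_at(p, xs, 0, {}):
        if j == len(xs):
            return env
    return None

# A "name" element, then any number of "tel" elements bound to tels,
# then any number of "email" elements bound to emails.
entry = Seq(Tag("name"),
            Seq(Bind("tels", Star(Tag("tel"))),
                Bind("emails", Star(Tag("email")))))

print(match(entry, ["name", "tel", "tel", "email"]))
# prints {'tels': ['tel', 'tel'], 'emails': ['email']}
```

With the binders erased, `entry` plays the role of a regular type: it only accepts or rejects a sequence. The binders turn the same description into a tool for extracting subsequences. An interpreter like this one can backtrack exponentially on adversarial patterns, which is one reason compiling patterns into efficient target code is worth the effort.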
