Regular expression types for XML

We propose regular expression types as a foundation for XML processing languages. Regular expression types are a natural generalization of Document Type Definitions (DTDs), describing structures in XML documents using regular expression operators (such as *, ?, and |) and supporting a simple but powerful notion of subtyping. The decision problem for the subtype relation is EXPTIME-hard, but it can be checked quite efficiently in many cases of practical interest. The subtyping algorithm developed here is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several optimizations and two new properties: (1) our algorithm is provably complete, and (2) it allows a useful "subtagging" relation between nodes with different labels in XML trees.
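To make the idea of subtyping as language inclusion concrete, here is a minimal sketch in Python. It is not the algorithm of the paper: it handles only flat sequences of element labels (no nested tree structure and no subtagging), and it swaps in Brzozowski derivatives in place of the set-inclusion constraint solver. All names (`sym`, `seq`, `alt`, `star`, `opt`, `is_subtype`) and the example types are hypothetical, chosen for illustration only.

```python
# Regular expressions over element labels, encoded as nested tuples.
# Always build them with the smart constructors below: they normalise
# alternation (associative, commutative, idempotent), which keeps the
# set of distinct derivatives finite so the search terminates.
EMPTY = ("empty",)   # matches nothing
EPS = ("eps",)       # matches the empty sequence


def sym(label):
    return ("sym", label)


def seq(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("seq", r, s)


def alt(r, s):
    items = set()
    for x in (r, s):
        if x[0] == "alt":
            items |= x[1]          # flatten nested alternations
        elif x != EMPTY:
            items.add(x)
    if not items:
        return EMPTY
    if len(items) == 1:
        return next(iter(items))
    return ("alt", frozenset(items))


def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)


def opt(r):
    return alt(EPS, r)


def nullable(r):
    """True if r matches the empty sequence."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return any(nullable(x) for x in r[1])   # alt


def deriv(r, a):
    """Brzozowski derivative: what r can still match after reading label a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "alt":
        out = EMPTY
        for x in r[1]:
            out = alt(out, deriv(x, a))
        return out
    if tag == "star":
        return seq(deriv(r[1], a), r)
    first = seq(deriv(r[1], a), r[2])        # seq
    return alt(first, deriv(r[2], a)) if nullable(r[1]) else first


def is_subtype(r, s, alphabet):
    """Check that every label sequence matched by r is also matched by s."""
    seen, todo = set(), [(r, s)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        x, y = pair
        if x == EMPTY:
            continue                         # nothing left to check
        if nullable(x) and not nullable(y):
            return False                     # r accepts here, s does not
        for a in alphabet:
            todo.append((deriv(x, a), deriv(y, a)))
    return True


# Hypothetical content models, written DTD-style:
#   T1 = name, email*, tel?      T2 = name, (email | tel)*
ALPHABET = ["name", "email", "tel"]
T1 = seq(sym("name"), seq(star(sym("email")), opt(sym("tel"))))
T2 = seq(sym("name"), star(alt(sym("email"), sym("tel"))))

print(is_subtype(T1, T2, ALPHABET))
print(is_subtype(T2, T1, ALPHABET))
```

In this sketch `T1` is a subtype of `T2` but not the reverse, since `T2` also admits sequences such as name, tel, tel. Extending the check to trees, where each label carries its own content type, is where the tree-automata machinery and the EXPTIME-hardness mentioned above come in.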
