Regular expression filters for XML

XML data are described by types involving regular expressions. This raises the question of which language features are convenient for manipulating such data. We previously answered this question by proposing regular expression pattern matching. However, because that construct is derived from ML pattern matching, it has no built-in iteration, which makes it cumbersome to process data whose types involve Kleene stars. In this paper, we propose a novel programming feature, regular expression filters. This construct extends the previous proposal by allowing pattern clauses to be closed under arbitrary regular expression operators. It yields many convenient programming idioms, such as non-uniform processing of sequences and almost-copying of trees. We further develop a type inference mechanism that obtains (1) types for pattern variables that are locally precise with respect to the type of the input values and (2) a type for the result of the whole filter expression that is locally precise with respect to the types of the body expressions. We discuss how our construct is useful in practical XML processing and, in particular, how our type inference is crucial for avoiding changes to programs when the types of the data being processed evolve frequently.
