Regular expression pattern matching for XML

We propose regular expression pattern matching as a core feature of programming languages for manipulating XML. We extend conventional pattern-matching facilities (as in ML) with regular expression operators such as repetition (*) and alternation (|). These operators can match arbitrarily long sequences of subtrees, so a compact pattern can extract data from the middle of a complex sequence. We then show how to check the standard notions of exhaustiveness and redundancy for these patterns. Regular expression patterns are intended for use in languages whose type systems are based on regular expression types. To avoid excessive type annotations, we develop a type inference scheme that propagates type constraints from the type of the input values to the pattern variables. The type inference algorithm translates types and patterns into regular tree automata and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is the interaction of repetition and alternation patterns with the first-match policy, which gives rise to subtleties concerning both the termination and the precision of the analysis. We address these issues by introducing a data structure that represents these closure operations lazily.
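To make the idea of matching over sequences of subtrees concrete, the following is a minimal sketch in Python, not the paper's algorithm or syntax. It uses a simple backtracking matcher rather than tree automata, and every name in it (`Node`, `Elem`, `AnyTree`, `Seq`, `Alt`, `Star`, `Bind`, `run`, `match`) is a hypothetical choice made for illustration. The first-match policy is approximated by trying the left branch of an alternation first and letting repetition consume as much as possible before backtracking. The sketch assumes that variables are not bound under repetition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:            # a labeled tree; text is modeled as a childless node
    label: str
    children: tuple = ()

@dataclass(frozen=True)
class Elem:            # one tree with this label whose children match `content`
    label: str
    content: object

@dataclass(frozen=True)
class AnyTree:         # any single tree
    pass

@dataclass(frozen=True)
class Seq:             # concatenation of sub-patterns
    parts: tuple

@dataclass(frozen=True)
class Alt:             # alternation (|); the left branch has priority
    left: object
    right: object

@dataclass(frozen=True)
class Star:            # repetition (*); longer matches have priority
    body: object

@dataclass(frozen=True)
class Bind:            # bind a variable to the matched subsequence
    name: str
    body: object

def run(p, seq, i, env):
    """Yield (end, bindings) for each way p matches seq[i:end], best first."""
    if isinstance(p, AnyTree):
        if i < len(seq):
            yield i + 1, env
    elif isinstance(p, Elem):
        if i < len(seq) and seq[i].label == p.label:
            kids = seq[i].children
            for j, env2 in run(p.content, kids, 0, env):
                if j == len(kids):          # content must match all children
                    yield i + 1, env2
    elif isinstance(p, Seq):
        if not p.parts:
            yield i, env
        else:
            for j, env2 in run(p.parts[0], seq, i, env):
                yield from run(Seq(p.parts[1:]), seq, j, env2)
    elif isinstance(p, Alt):
        yield from run(p.left, seq, i, env)
        yield from run(p.right, seq, i, env)
    elif isinstance(p, Star):
        for j, env2 in run(p.body, seq, i, env):
            if j > i:                       # skip empty iterations to terminate
                yield from run(p, seq, j, env2)
        yield i, env                        # zero further iterations
    elif isinstance(p, Bind):
        for j, env2 in run(p.body, seq, i, env):
            yield j, {**env2, p.name: seq[i:j]}

def match(p, seq):
    """Return the bindings of the highest-priority full match, or None."""
    seq = tuple(seq)
    for j, env in run(p, seq, 0, {}):
        if j == len(seq):
            return env
    return None

ANY_SEQ = Star(AnyTree())

# An address-book entry: a name, then telephone numbers, then e-mail addresses.
entry = (
    Node("name", (Node("Ann"),)),
    Node("tel", (Node("555"),)),
    Node("tel", (Node("556"),)),
    Node("email", (Node("a@x"),)),
    Node("email", (Node("b@x"),)),
)

# Skip the name and any number of tel elements, then bind the e-mail elements.
pattern = Seq((
    Elem("name", ANY_SEQ),
    Star(Elem("tel", ANY_SEQ)),
    Bind("emails", Star(Elem("email", ANY_SEQ))),
))
result = match(pattern, entry)
emails = [n.children[0].label for n in result["emails"]]

# First-match policy: when both branches could match, only the left one binds.
first = match(Alt(Bind("x", AnyTree()), Bind("y", AnyTree())), [Node("a")])
```

Here `emails` holds the text of the two `email` elements, extracted from the end of the sequence without naming how many `tel` elements precede them, and `first` binds only `x`. A backtracking matcher of this kind says nothing about exhaustiveness, redundancy, or type inference, which the abstract attributes to the automata-based analysis.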
