Expressiveness and complexity of XML Schema

The common abstraction of XML Schema by unranked regular tree languages is not entirely accurate. To shed some light on the actual expressive power of XML Schema, intuitive semantic characterizations of the Element Declarations Consistent (EDC) rule are provided. In particular, it is shown that schemas satisfying EDC can only reason about regular properties of the ancestors of nodes. Hence, with respect to expressive power, XML Schema is closer to DTDs than to tree automata. These theoretical results are complemented by an investigation of the XML Schema Definitions (XSDs) occurring in practice, which reveals that the extra expressiveness of XSDs over DTDs is used only to a very limited extent. As this might be due to the complexity of the XML Schema specification and the difficulty of understanding the effect of constraints on typing and validation of schemas, a simpler formalism equivalent to XSDs is proposed. It is based on contextual patterns rather than on recursive types, and it might serve as a lightweight front end for XML Schema. Next, the effect of EDC on the way XML documents can be typed is discussed. It is argued that a cleaner, more robust, larger, but equally feasible class is obtained by replacing EDC with the notion of 1-pass preorder typing (1PPT): schemas that allow one to determine the type of an element of a streaming document when its opening tag is met. This notion can be defined in terms of grammars with restrained-competition regular expressions, and there is again an equivalent syntactic formalism based on contextual patterns. Finally, algorithms for recognition, simplification, and inclusion of schemas for the various classes are given.
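
To make the idea of typing an element when its opening tag is met more concrete, the following is a minimal Python sketch, not the formalism or the algorithms of the paper. It covers only the ancestor-based case suggested by the EDC characterization: the type of an element is looked up from the type of its parent and its own label, so a stack of types suffices. The schema table `CHILD_TYPE`, the type names, and the function `type_on_open` are hypothetical and chosen for illustration; content models (the regular expressions over children) are not checked, and the more general 1PPT case, in which the type may depend on more context than the ancestors, is not modelled.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical schema: (type of parent, label of child) -> type of child.
# Within one parent type, each label maps to exactly one type, which is
# the restriction that EDC imposes. None stands for "above the root".
CHILD_TYPE = {
    (None, "store"): "Store",
    ("Store", "dvd"): "DvdForSale",
    ("Store", "wishlist"): "Wishlist",
    ("Wishlist", "dvd"): "DvdWanted",
    ("DvdForSale", "title"): "Title",
    ("DvdForSale", "price"): "Price",
    ("DvdWanted", "title"): "Title",
}


def type_on_open(xml_text):
    """Return (label, type) pairs in document order.

    Each type is fixed at the element's opening tag, using only the
    types of its ancestors (kept on a stack).
    """
    stack = [None]          # types of the currently open elements
    assigned = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            key = (stack[-1], elem.tag)
            if key not in CHILD_TYPE:
                raise ValueError(f"no type for <{elem.tag}> under {stack[-1]}")
            elem_type = CHILD_TYPE[key]
            assigned.append((elem.tag, elem_type))
            stack.append(elem_type)
        else:
            stack.pop()     # closing tag: leave the element's scope
    return assigned


doc = ("<store><dvd><title/><price/></dvd>"
       "<wishlist><dvd><title/></dvd></wishlist></store>")
for label, elem_type in type_on_open(doc):
    print(label, elem_type)
```

In this sketch the two `dvd` elements receive different types (`DvdForSale` and `DvdWanted`) purely because of their ancestors, which a DTD cannot express, while no look-ahead into the subtree is ever needed.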
