Schemas for Unordered XML on a DIME

We investigate schema languages for unordered XML, in which siblings have no relative order. First, we propose unordered regular expressions (UREs), essentially regular expressions with unordered concatenation in place of standard concatenation. UREs define languages of unordered words, which model the allowed content of a node (i.e., collections of the labels of its children). Unrestricted UREs, however, are computationally too expensive: we show that two fundamental decision problems for UREs are intractable, namely membership of an unordered word in the language of a URE and containment of two UREs. Consequently, we propose a practical and tractable restriction of UREs, disjunctive interval multiplicity expressions (DIMEs). Next, we employ DIMEs to define languages of unordered trees and propose two schema languages: disjunctive interval multiplicity schemas (DIMS) and their restriction, disjunction-free interval multiplicity schemas (IMS). We study the complexity of the following static analysis problems: schema satisfiability, membership of a tree in the language of a schema, schema containment, and twig query satisfiability, implication, and containment in the presence of a schema. Finally, we study the expressive power of the proposed schema languages and compare them with yardstick languages of unordered trees (FO, MSO, and Presburger constraints) and with DTDs under commutative closure. Our results show that the proposed schema languages can express many practical languages of unordered trees and enjoy desirable computational properties.
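To make the notion of an unordered word concrete, the following is a minimal sketch of a membership test for a simplified, disjunction-free expression. It is an illustration only and not the formal definition used in the paper: the encoding of an expression as a mapping from labels to `(min, max)` intervals, the function name `matches`, and the `book` example are all assumptions introduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical encoding: each label maps to an interval (min, max) on its
# number of occurrences; max=None means "unbounded".
Interval = Tuple[int, Optional[int]]


def matches(expr: Dict[str, Interval], children: Iterable[str]) -> bool:
    """Check whether the multiset of child labels satisfies every interval."""
    counts = Counter(children)  # an unordered word is just a multiset of labels
    # Labels not mentioned by the expression are not allowed.
    if any(label not in expr for label in counts):
        return False
    # Every label's occurrence count must fall inside its interval.
    for label, (lo, hi) in expr.items():
        n = counts[label]
        if n < lo or (hi is not None and n > hi):
            return False
    return True


# Assumed example: exactly one title, at least one author, optional year.
book = {"title": (1, 1), "author": (1, None), "year": (0, 1)}

print(matches(book, ["author", "title", "author"]))  # order is irrelevant
print(matches(book, ["title"]))                      # missing author
```

Because only label counts matter, this simplified check runs in time linear in the number of children. It does not cover disjunction, which the abstract's DIMEs additionally allow.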
