Schema design for XML repositories: complexity and tractability

Abiteboul et al. initiated the systematic study of distributed XML documents, which consist of several logical parts that may be located on different machines. The physical distribution of such documents immediately raises the following question: how can a global schema for the distributed document be broken up into local schemas for the individual logical parts? The desired set of local schemas should guarantee that, if each logical part satisfies its local schema, then the distributed document satisfies the global schema. Abiteboul et al. proposed three levels of desirability for local schemas: local typing, maximal local typing, and perfect local typing. Two algorithmic questions follow immediately: (i) given a typing, determine whether it is local, maximal local, or perfect, and (ii) given a document and a schema, establish whether a (maximal) local or perfect typing exists. This paper tightens the complexity bounds left open in their work and initiates the study of (i) and (ii) for schema restrictions arising from the current standards: DTDs and XML Schemas with deterministic content models. The most striking result is that these restrictions make the perfect typing problem tractable. Furthermore, an open problem in formal language theory is settled: deciding language primality for deterministic finite automata is PSPACE-complete.
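To make the notion of language primality concrete, the following is a minimal sketch for the much simpler case of finite languages given as explicit sets of strings. It assumes the usual definition: a nonempty language L is prime if it cannot be written as a concatenation L1·L2 in which neither factor is {ε}. The function names (`find_decomposition`, `is_prime`) are hypothetical and do not come from the paper, and the brute-force search is exponential in the number of prefixes. It is not the paper's algorithm and says nothing about the DFA setting, where the problem is PSPACE-complete.

```python
from itertools import combinations


def prefixes(lang):
    """All prefixes (including the empty word) of words in lang."""
    return {w[:i] for w in lang for i in range(len(w) + 1)}


def suffixes(lang):
    """All suffixes (including the empty word) of words in lang."""
    return {w[i:] for w in lang for i in range(len(w) + 1)}


def concat(l1, l2):
    """Concatenation of two finite languages."""
    return {u + v for u in l1 for v in l2}


def find_decomposition(lang):
    """Return (L1, L2) with L1.L2 == lang and neither factor equal to
    {""}, or None if no such pair exists.

    Brute force: any left factor consists of prefixes of words in lang.
    For a fixed L1, the largest usable right factor is the set of all v
    such that u + v is in lang for every u in L1; if that set fails,
    no smaller right factor can work either.
    """
    lang = set(lang)
    if not lang:
        raise ValueError("primality is only defined here for nonempty languages")
    pre = sorted(prefixes(lang))
    suf = suffixes(lang)
    for size in range(1, len(pre) + 1):
        for combo in combinations(pre, size):
            l1 = set(combo)
            if l1 == {""}:
                continue  # trivial left factor
            l2 = {v for v in suf if all(u + v in lang for u in l1)}
            if l2 and l2 != {""} and concat(l1, l2) == lang:
                return l1, l2
    return None


def is_prime(lang):
    """True if the finite language has no nontrivial decomposition."""
    return find_decomposition(lang) is None
```

For example, `is_prime({"a", "b"})` holds, whereas `{"a", "ab"}` is not prime because it factors as `{"a"}` followed by `{"", "b"}`.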
