XML data exchange: consistency and query answering

Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. The theoretical foundations of data exchange have recently been investigated for relational data. In this paper, we begin to study the basic properties of XML data exchange, that is, restructuring XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema. We define XML data exchange settings in which source-to-target dependencies refer to the hierarchical structure of the data. Combining DTDs and dependencies can make some XML data exchange settings inconsistent. We investigate this consistency problem and determine its exact complexity. We then move to query answering and prove a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable and those over which it is coNP-complete, depending on the classes of regular expressions used in the DTDs. Furthermore, for all tractable cases we give polynomial-time algorithms that compute target XML documents over which queries can be answered.
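To make the setting concrete, here is a minimal Python sketch of a single exchange step. It is built on hypothetical assumptions and does not follow the paper's formal definitions. The book/writer schemas, the dependency `db/book(@title=x)/author(@name=y) -> bib/writer(@name=y)/work(@title=x, @year=z)`, and the names `TARGET_DTD`, `path_allowed` and `exchange` are all illustrative inventions. The target DTD is simplified to the set of child labels each content model mentions, ignoring order and multiplicity. As a result, `path_allowed` is only a crude necessary check related to consistency, not a decision procedure for it. The source carries no year, so each generated work receives a fresh null (`_N1`, `_N2`, ...). The output is therefore one of many possible target documents.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical, simplified target DTD: each element maps to the set of child
# labels mentioned in its content model (order and multiplicity are ignored).
TARGET_DTD = {
    "bib": {"writer"},
    "writer": {"work"},
    "work": set(),
}

# Target-side path of the hypothetical dependency
#   db/book(@title=x)/author(@name=y) -> bib/writer(@name=y)/work(@title=x, @year=z)
TARGET_PATH = ["bib", "writer", "work"]


def path_allowed(dtd, root, path):
    """Crude necessary condition: the path must start at the DTD root and
    every parent/child step must be mentioned in the parent's content model."""
    return path[0] == root and all(
        child in dtd.get(parent, set()) for parent, child in zip(path, path[1:])
    )


def exchange(source_root):
    """Build one target tree (one possible solution) from a source tree.

    Writers are merged by name (a design choice, not a requirement), and
    every work gets a fresh null for its year, which the source lacks."""
    fresh = (f"_N{i}" for i in itertools.count(1))
    target = ET.Element("bib")
    writers = {}
    for book in source_root.findall("book"):
        title = book.get("title")
        for author in book.findall("author"):
            name = author.get("name")
            if name not in writers:
                writers[name] = ET.SubElement(target, "writer", name=name)
            ET.SubElement(writers[name], "work", title=title, year=next(fresh))
    return target


SOURCE = """
<db>
  <book title="T1"><author name="A1"/><author name="A2"/></book>
  <book title="T2"><author name="A1"/></book>
</db>
"""

if __name__ == "__main__":
    print(path_allowed(TARGET_DTD, "bib", TARGET_PATH))           # True
    print(path_allowed({"bib": {"book"}}, "bib", TARGET_PATH))    # False
    print(ET.tostring(exchange(ET.fromstring(SOURCE.strip())), encoding="unicode"))
```

The second `path_allowed` call shows how a DTD and a dependency can clash. If `bib` may only have `book` children, no target document can contain the required `bib/writer/work` path.

The nulls could stand for any values. A query such as "does writer A1 have a work titled T2?" therefore holds in every such target document, whereas a query asking for a particular year does not. In the data exchange literature, queries over the target are typically answered with these certain answers, that is, answers that hold in every solution.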
