Validity-Sensitive Querying of XML Databases (Extended Abstract)

We consider the problem of using XPath to query XML documents that are not valid with respect to given DTDs. If a query is formulated under the assumption that documents satisfy a DTD and a given document does not, then the query answer in this document may differ from the expected one. We propose an alternative mode of query evaluation that actively incorporates the schema into the evaluation process: if a document is invalid, all valid documents obtainable from it by applying a minimum number of repairing operations are considered. Conceptually, the query is evaluated in each such document, and the intersection of all the results is returned to the user. We study how the computational complexity of query evaluation in our approach depends on the repertoire of repairing operations and the syntax of the query. We describe experiments that validate our approach.
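The conceptual evaluation step (evaluate the query in every repair, then intersect the results) can be illustrated with a minimal sketch. It assumes the minimal repairs have already been computed and are supplied as XML strings; the function names `answers` and `valid_answers`, the sample documents, and the use of element text as the answer value are all hypothetical choices made for illustration, and the limited XPath subset of Python's `xml.etree.ElementTree` stands in for a full XPath engine. It is not the evaluation algorithm studied in the paper.

```python
import xml.etree.ElementTree as ET


def answers(xml_text, path):
    """Evaluate a (limited) XPath expression and return the set of text values found."""
    root = ET.fromstring(xml_text)
    return {node.text for node in root.findall(path)}


def valid_answers(repairs, path):
    """Keep only the answers that appear in every repaired document."""
    results = [answers(repair, path) for repair in repairs]
    return set.intersection(*results) if results else set()


# Two hand-written, hypothetical repairs of the same invalid document.
repair_1 = "<proj><name>P</name><emp>Mary</emp><emp>John</emp></proj>"
repair_2 = "<proj><name>P</name><emp>Mary</emp></proj>"

print(valid_answers([repair_1, repair_2], "./emp"))
```

Here only the answer present in both repairs survives the intersection. Enumerating all repairs explicitly, as this sketch does, may be expensive when there are many of them, which is one reason the dependence of complexity on the repair operations and the query syntax matters.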
