Representing and querying XML with incomplete information

We study the representation and querying of XML with incomplete information. We consider a simple model for XML data and their DTDs, a very simple query language, and a representation system for incomplete information in the spirit of the representation systems developed by Imielinski and Lipski for relational databases. In the scenario we consider, the incomplete information about an XML document is continuously enriched by successive queries to the document. We show that our representation system can represent partial information about the source document acquired by successive queries, and that it can be used to intelligently answer new queries. We also consider the impact on complexity of enriching our representation system or query language with additional features. The results suggest that our approach achieves a practically appealing balance between expressiveness and tractability. The research presented here was motivated by the Xyleme project at INRIA, whose objective is to develop a data warehouse for Web XML documents.
