PXML: a probabilistic semistructured data model and algebra

Despite the recent proliferation of work on semistructured data models, there has been little work to date on supporting uncertainty in these models. We propose a model for probabilistic semistructured data (PSD). The advantage of our approach is that it supports a flexible representation that allows the specification of a wide class of distributions over semistructured instances. We provide two semantics for the model and show that the semantics are probabilistically coherent. Next, we develop an extension of the relational algebra to handle probabilistic semistructured data and describe efficient algorithms for answering queries that use this algebra. Finally, we present experimental results showing the efficiency of our algorithms.

[1]  H. V. Jagadish,et al.  ProTDB: Probabilistic Data in XML , 2002, VLDB.

[2]  Jennifer Widom,et al.  Object exchange across heterogeneous information sources , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[3]  Laks V. S. Lakshmanan,et al.  ProbView: a flexible probabilistic database system , 1997, TODS.

[4]  Jennifer Widom,et al.  The Lorel query language for semistructured data , 1997, International Journal on Digital Libraries.

[5]  S. Boag,et al.  XQuery 1.0 : An XML query language, W3C Working Draft 12 November 2003 , 2003 .

[6]  Norbert Fuhr,et al.  A probabilistic relational algebra for the integration of information retrieval and database systems , 1997, TOIS.

[7]  Laks V. S. Lakshmanan,et al.  TAX: A Tree Algebra for XML , 2001, DBPL.

[8]  Thomas Lukasiewicz,et al.  Extension of the Relational Algebra to Probabilistic Complex Values , 2000, FoIKS.

[9]  David J. Spiegelhalter,et al.  Local computations with probabilities on graphical structures and their application to expert systems , 1990 .

[10]  Sumit Sarkar,et al.  A probabilistic relational model and algebra , 1996, TODS.

[11]  Alin Deutsch,et al.  XML-QL: A Query Language for XML , 1998 .

[12]  Arvind Malhotra,et al.  Xml schema part 2: datatypes , 1999 .

[13]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[14]  Rina Dechter,et al.  Bucket elimination: A unifying framework for probabilistic inference , 1996, UAI.

[15]  Roy Goldman,et al.  Lore: a database management system for semistructured data , 1997, SGMD.

[16]  Catriel Beeri,et al.  SAL: An Algebra for Semistructured Data and XML , 1999, WebDB.