On the expressiveness of probabilistic XML models

Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic distribution. Particular families of p-documents are determined by the types of distributional nodes that can be used as well as by the structural constraints on the placement of those nodes in a p-document. Some of the resulting families provide natural extensions and combinations of previously studied probabilistic XML models. The focus of the paper is on the expressive power of families of p-documents. In particular, two main issues are studied. The first is the ability to (efficiently) translate a given p-document of one family into another family. The second is closure under updates, namely, the ability to (efficiently) represent the result of updating the instances of a p-document of a given family as another p-document of that family. For both issues, we distinguish two variants corresponding to value-based and object-based semantics of p-documents.
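As a rough illustration of how distributional nodes determine a set of possible worlds, the following minimal sketch enumerates the worlds of a tiny p-document. It assumes two kinds of distributional nodes that the abstract does not name: `ind` (each child is kept independently with its own probability) and `mux` (at most one child is kept). The tuple encoding and the helper names `ordinary`, `ind`, `mux`, `forests`, and `possible_worlds` are hypothetical and chosen only for this example. Worlds are merged when they are equal as unordered trees, which is one plausible reading of the value-based semantics; an object-based reading would instead keep worlds apart when they are built from different nodes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical encoding (an assumption for this sketch, not the paper's notation).
def ordinary(label, *children):
    return ("ord", label, list(children))

def ind(*choices):
    # choices: (child, probability) pairs, each child kept independently
    return ("ind", list(choices))

def mux(*choices):
    # choices: (child, probability) pairs, at most one child kept
    return ("mux", list(choices))

def _combine(options_per_slot):
    """Pick one (forest, prob) per slot; concatenate forests, multiply probs."""
    results = []
    for combo in product(*options_per_slot):
        forest = tuple(tree for f, _ in combo for tree in f)
        prob = 1.0
        for _, p in combo:
            prob *= p
        results.append((forest, prob))
    return results

def forests(node):
    """Return (forest, probability) pairs; a tree is (label, sorted children)."""
    kind = node[0]
    if kind == "ord":
        _, label, children = node
        out = []
        for forest, prob in _combine([forests(c) for c in children]):
            # Sorting the children makes equal unordered trees compare equal.
            out.append((((label, tuple(sorted(forest))),), prob))
        return out
    _, choices = node
    if kind == "mux":
        # Either no child is kept, or exactly one is.
        out = [((), 1.0 - sum(p for _, p in choices))]
        for child, p in choices:
            out += [(f, p * q) for f, q in forests(child)]
        return out
    if kind == "ind":
        # Each child is dropped with probability 1 - p or kept with probability p.
        slots = [[((), 1.0 - p)] + [(f, p * q) for f, q in forests(child)]
                 for child, p in choices]
        return _combine(slots)
    raise ValueError(f"unknown node kind: {kind}")

def possible_worlds(root):
    """Merge equal trees and sum their probabilities (root must be ordinary)."""
    dist = defaultdict(float)
    for forest, p in forests(root):
        if p > 0:
            dist[forest[0]] += p
    return dict(dist)

# Root "a" with two independent optional "b" children, each kept with probability 0.5.
doc_ind = ordinary("a", ind((ordinary("b"), 0.5), (ordinary("b"), 0.5)))
worlds_ind = possible_worlds(doc_ind)

# Root "a" with mutually exclusive children "b" (0.5) and "c" (0.25).
doc_mux = ordinary("a", mux((ordinary("b"), 0.5), (ordinary("c"), 0.25)))
worlds_mux = possible_worlds(doc_mux)
```

In `doc_ind`, the four combinations of keeping or dropping each `b` child collapse into three distinct trees once equal trees are merged, because the two worlds with a single `b` are indistinguishable by value. Tracking node identities instead would leave all four worlds separate.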
