Representing and querying uncertain data with complex correlations

Many applications, such as sensor networks, RFID, scientific experimental measurements, stock market prediction, and information extraction, need to manage uncertain data and process complex correlations among them. In probabilistic database systems, uncertain data are represented by attaching probability values to tuples or to attributes. Some probabilistic data models assume that tuples are independent of each other and therefore cannot express data correlations effectively. Others, based on probabilistic graphical models, can represent both uncertainty and complex correlations, but their query processing and probabilistic inference do not scale well enough to meet application needs. In this paper, a novel probabilistic data model, RTx-PDM, is proposed. RTx-PDM not only handles arbitrary uncertain data natively at the attribute or tuple level but also represents the correlations among uncertain data with an intuitive BLOCK structure. In particular, RTx-PDM uses BLOCKs to express shared and schema-level correlations compactly. Traditional relational operators are extended to manipulate BLOCKs and to represent correlations in operation results. Experimental results validate our approach and demonstrate the effectiveness of exploiting data correlations during query processing.
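To make the cost of the tuple-independence assumption concrete, the following minimal sketch compares the probability that at least one of two uncertain tuples exists under independence with the same query evaluated against an explicit joint distribution. It is not RTx-PDM or its BLOCK structure, whose details the abstract does not give; the function names, the two-tuple example, and the perfectly correlated joint distribution are all assumptions chosen for illustration.

```python
def prob_any_independent(marginals):
    """P(at least one tuple exists), assuming tuples are independent."""
    prob_none = 1.0
    for p in marginals:
        prob_none *= (1.0 - p)
    return 1.0 - prob_none


def prob_any_joint(joint):
    """P(at least one tuple exists), given an explicit joint distribution.

    `joint` maps a possible world (a tuple of booleans, one per uncertain
    tuple, True meaning "present") to the probability of that world.
    """
    return sum(p for world, p in joint.items() if any(world))


# Hypothetical data: two sensor readings, each present with probability 0.5.
marginals = [0.5, 0.5]

# Hypothetical correlation: the readings are either both present or both absent.
joint = {
    (True, True): 0.5,
    (True, False): 0.0,
    (False, True): 0.0,
    (False, False): 0.5,
}

independent_answer = prob_any_independent(marginals)  # 0.75
correlated_answer = prob_any_joint(joint)             # 0.5
```

Both representations agree on the marginals (each tuple is present with probability 0.5), yet the query answers differ, which is why a model that cannot record the correlation returns a misleading confidence.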
