Discussion and analysis of the distributed uncertain database systems ranking

Large databases with uncertainty have become more common in many applications. Ranking queries are essential tools for processing these databases and returning only the most relevant answers to a query, based on a scoring function. Many approaches have been proposed to study and analyze the problem of answering such ranking queries efficiently. Managing distributed uncertain databases is also an important issue; in fact, ranking queries in such systems remain an open challenge. The main objective of this paper is to discuss ranking in distributed uncertain databases along with the problems it raises. Starting with uncertain data representation, the paper discusses query processing and query types in such systems, together with their challenges and open research areas. The top-k query is presented, with its properties, as a ranking technique in uncertain data environments, with mention of the distributed top-k and distributed ranking problems.
