Ranking distributed uncertain database systems: Discussion and analysis

Large databases with uncertainty became more common in many applications. Ranking queries are essential tools to process these databases and return only the most relevant answers of a query, based on a scoring function. Many approaches were proposed to study and analyze the problem of efficiently answering such ranking queries. Managing distributed uncertain database is also an important issue. In fact ranking queries in such systems are an open challenge. The main objective of this paper is to discuss ranking in distributed uncertain database along with its issued problems. Starting with uncertain data representation, query processing and query types in such systems are discussed along with their challenges and open research area. Top-k query is presented with its properties, as a ranking technique in uncertain data environment, mentioning distributed top-k and distributed ranking problems.

[1]  Feifei Li,et al.  Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations , 2008, IEEE Transactions on Knowledge and Data Engineering.

[2]  Lise Getoor,et al.  An Introduction to Probabilistic Graphical Models for Relational Data , 2006, IEEE Data Eng. Bull..

[3]  Jeffrey Scott Vitter,et al.  Efficient join processing over uncertain data , 2006, CIKM '06.

[4]  Prithviraj Sen,et al.  Representing and Querying Correlated Tuples in Probabilistic Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[5]  Rajeev Rastogi,et al.  Independence is good: dependency-based histogram synopses for high-dimensional data , 2001, SIGMOD '01.

[6]  Val Tannen,et al.  Models for Incomplete and Probabilistic Information , 2006, IEEE Data Eng. Bull..

[7]  FuhrNorbert,et al.  A probabilistic relational algebra for the integration of information retrieval and database systems , 1997 .

[8]  Feifei Li,et al.  Semantics of Ranking Queries for Probabilistic Data and Expected Ranks , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[9]  Dan Suciu,et al.  The Boundary Between Privacy and Utility in Data Publishing , 2007, VLDB.

[10]  David J. Spiegelhalter,et al.  Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[11]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[12]  Dan Suciu,et al.  A formal analysis of information disclosure in data exchange , 2004, SIGMOD '04.

[13]  Yufei Tao,et al.  Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions , 2005, VLDB.

[14]  Norbert Fuhr,et al.  A probabilistic relational algebra for the integration of information retrieval and database systems , 1997, TOIS.

[15]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[16]  Dan Olteanu,et al.  10106 Worlds and Beyond: Efficient Representation and Processing of Incomplete Information , 2007, ICDE.

[17]  Adnan Darwiche,et al.  A differential approach to inference in Bayesian networks , 2000, JACM.

[18]  Hans-Peter Kriegel,et al.  Probabilistic Similarity Join on Uncertain Data , 2006, DASFAA.

[19]  Lise Getoor,et al.  Learning Probabilistic Relational Models , 1999, IJCAI.

[20]  Dan Olteanu,et al.  Fast and Simple Relational Processing of Uncertain Data , 2007, 2008 IEEE 24th International Conference on Data Engineering.

[21]  Christopher Ré,et al.  Event queries on correlated probabilistic streams , 2008, SIGMOD Conference.

[22]  Hans-Peter Kriegel,et al.  Probabilistic Nearest-Neighbor Query on Uncertain Objects , 2007, DASFAA.

[23]  Jennifer Widom,et al.  ULDBs: databases with uncertainty and lineage , 2006, VLDB.

[24]  Susanne E. Hambrusch,et al.  Indexing Uncertain Categorical Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[25]  Wei Hong,et al.  Model-Driven Data Acquisition in Sensor Networks , 2004, VLDB.

[26]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[27]  Serge Abiteboul,et al.  On the Representation and Querying of Sets of Possible Worlds , 1991, Theor. Comput. Sci..

[28]  Sunil Prabhakar,et al.  Evaluating probabilistic queries over imprecise data , 2003, SIGMOD '03.

[29]  Dan Suciu,et al.  Management of probabilistic data: foundations and challenges , 2007, PODS '07.

[30]  Xi Zhang,et al.  On the semantics and evaluation of top-k queries in probabilistic databases , 2008, ICDE Workshops.

[31]  Rahul Gupta,et al.  Creating probabilistic databases from information extraction models , 2006, VLDB.

[32]  T. S. Jayram,et al.  Efficient allocation algorithms for OLAP over imprecise data , 2006, VLDB.

[33]  Christopher Ré,et al.  Applications of Probabilistic Constraints , 2007 .

[34]  Jennifer Widom,et al.  Making Aggregation Work in Uncertain and Probabilistic Databases , 2011, IEEE Transactions on Knowledge and Data Engineering.

[35]  Leslie G. Valiant,et al.  The Complexity of Enumeration and Reliability Problems , 1979, SIAM J. Comput..

[36]  Christian Böhm,et al.  Probabilistic Ranking Queries on Gaussians , 2006, 18th International Conference on Scientific and Statistical Database Management (SSDBM'06).

[37]  Dan Olteanu,et al.  From complete to incomplete information and back , 2007, SIGMOD '07.

[38]  Mohamed A. Soliman,et al.  Top-k Query Processing in Uncertain Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[39]  Bin Jiang,et al.  Probabilistic Skylines on Uncertain Data , 2007, VLDB.

[40]  Feifei Li,et al.  Ranking distributed probabilistic data , 2009, SIGMOD Conference.

[41]  Jian Pei,et al.  Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[42]  Adnan Darwiche,et al.  Functional Treewidth: Bounding Complexity in the Presence of Functional Dependencies , 2006, SAT.

[43]  Jennifer Widom,et al.  Working Models for Uncertain Data , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[44]  Matthew Richardson,et al.  Markov logic networks , 2006, Machine Learning.

[45]  Renée J. Miller,et al.  ConQuer: efficient management of inconsistent databases , 2005, SIGMOD '05.

[46]  Dan Suciu,et al.  The dichotomy of conjunctive queries on probabilistic structures , 2006, PODS.

[47]  Dan Olteanu,et al.  $${10^{(10^{6})}}$$ worlds and beyond: efficient representation and processing of incomplete information , 2006, 2007 IEEE 23rd International Conference on Data Engineering.

[48]  Christopher Ré,et al.  Efficient Top-k Query Evaluation on Probabilistic Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[49]  Kevin Chen-Chuan Chang,et al.  Probabilistic top-k and ranking-aggregate queries , 2008, TODS.

[50]  Jian Pei,et al.  Ranking queries on uncertain data: a probabilistic threshold approach , 2008, SIGMOD Conference.

[51]  Jennifer Widom,et al.  Databases with uncertainty and lineage , 2008, The VLDB Journal.