On the semantics and evaluation of top-k queries in probabilistic databases

We formulate three intuitive semantic properties for top-k queries in probabilistic databases and propose the Global-Topk query semantics, which satisfies all of them. We provide a dynamic programming algorithm to evaluate top-k queries under Global-Topk in simple probabilistic relations. For general probabilistic relations, we show a polynomial reduction to the simple case. Our analysis shows that the complexity of query evaluation is linear in k and at most quadratic in database size.
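To make the dynamic programming idea concrete, the following is a minimal sketch and not necessarily the algorithm of the paper. It assumes that a simple probabilistic relation means every tuple exists independently with its own probability, that scores are distinct, and that Global-Topk returns the k tuples with the highest probability of appearing in the top-k answer of a possible world. The function name `global_topk` and the `(id, score, probability)` tuple layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def global_topk(
    tuples: List[Tuple[str, float, float]], k: int
) -> List[Tuple[str, float]]:
    """Return up to k (id, top-k probability) pairs, highest probability first.

    Assumptions (not taken from the source): each input tuple is
    (id, score, probability), tuples exist independently, scores are distinct.
    """
    if k <= 0:
        return []

    # Process tuples from highest to lowest score.
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)

    # count[j] = probability that exactly j of the tuples processed so far
    # exist, tracked only for j = 0 .. k-1.
    count = [1.0] + [0.0] * (k - 1)

    scored = []
    for tid, _score, p in ranked:
        # A tuple is in the top-k of a world iff it exists and at most
        # k-1 higher-scored tuples exist in that world.
        scored.append((tid, p * sum(count)))

        # Fold the current tuple into the distribution (update in place,
        # from high j to low j so count[j-1] is still the old value).
        for j in range(k - 1, 0, -1):
            count[j] = count[j] * (1.0 - p) + count[j - 1] * p
        count[0] *= 1.0 - p

    # Keep the k tuples with the highest top-k probability.
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    relation = [("a", 3.0, 0.5), ("b", 2.0, 0.8), ("c", 1.0, 0.9)]
    print(global_topk(relation, 2))
```

With the sample relation and k = 2, tuple `a` has the highest score but only a 0.5 chance of existing, so `b` and `c` rank ahead of it under this sketch. After the initial sort, the loop does O(k) work per tuple, which matches the linear dependence on k stated above.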
