Top-k Best Probability Queries on Probabilistic Data

There has been much interest in answering top-k queries on probabilistic data in applications such as market analysis, personalised services, and decision making. On probabilistic data, the most common problem in answering top-k queries is choosing the semantics of the results according to their scores and top-k probabilities. In this paper, we propose a novel top-k best probability query, which returns results that have not only the best top-k scores but also the best top-k probabilities. We also introduce an efficient algorithm for top-k best probability queries that does not require a user-defined threshold. We then analyse the top-k best probability answer and show that it satisfies the semantic ranking properties of queries on uncertain data [3,18]. Experimental studies on real data verify the effectiveness of top-k best probability queries and the efficiency of our algorithm.
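The abstract does not define how a top-k probability is computed. As a rough illustration only, the sketch below uses the standard possible-worlds notion under the assumption that tuples exist independently and have distinct scores: the top-k probability of a tuple is its own existence probability times the probability that fewer than k higher-scoring tuples exist. The function name `top_k_probabilities` and the `(score, probability)` input layout are hypothetical and are not taken from the paper, and this is not the paper's top-k best probability algorithm.

```python
from typing import List, Tuple


def top_k_probabilities(tuples: List[Tuple[float, float]], k: int) -> List[float]:
    """Return Pr(tuple is among the k highest-scoring existing tuples).

    Assumptions (not from the paper): each tuple is (score, existence
    probability), tuples exist independently, scores are distinct, k >= 1.
    Results are returned in the same order as the input.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Visit tuples from highest to lowest score.
    order = sorted(range(len(tuples)), key=lambda i: -tuples[i][0])

    # dp[j] = probability that exactly j of the higher-scoring tuples
    # visited so far exist, tracked only for j = 0 .. k-1.
    dp = [1.0] + [0.0] * (k - 1)
    result = [0.0] * len(tuples)

    for i in order:
        _, p = tuples[i]
        # The tuple is in the top-k if it exists and fewer than k
        # higher-scoring tuples exist.
        result[i] = p * sum(dp)
        # Fold this tuple into the count distribution for later tuples.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p

    return result


if __name__ == "__main__":
    data = [(10.0, 0.5), (8.0, 0.8), (6.0, 1.0)]
    print(top_k_probabilities(data, k=2))
```

The dynamic program runs in O(nk) time after sorting. A threshold-based query such as the one in [3] would keep tuples whose top-k probability exceeds a user-supplied value, whereas the query proposed here is stated to avoid that threshold.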

[1] Kevin Chen-Chuan Chang, et al. Probabilistic top-k and ranking-aggregate queries, 2008, TODS.

[2] Kenneth Lange, et al. Numerical analysis for statisticians, 1999.

[3] Jian Pei, et al. Ranking queries on uncertain data: a probabilistic threshold approach, 2008, SIGMOD Conference.

[4] Vipin Kumar, et al. Introduction to Data Mining, 2022, Data Mining and Machine Learning Applications.

[5] Ihab F. Ilyas, et al. A survey of top-k query processing techniques in relational database systems, 2008, CSUR.

[6] Jian Pei, et al. Managing Uncertain Data: Probabilistic Approaches, 2008, 2008 The Ninth International Conference on Web-Age Information Management.

[7] Xi Zhang, et al. On the semantics and evaluation of top-k queries in probabilistic databases, 2008, ICDE Workshops.

[8] Jeffrey Xu Yu, et al. Sliding-window top-k queries on uncertain streams, 2008, The VLDB Journal.

[9] Jan Chomicki, et al. Skyline with Presorting: Theory and Optimizations, 2005, Intelligent Information Systems.

[10] Mohamed A. Soliman, et al. Top-k Query Processing in Uncertain Databases, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[11] Jean-Daniel Zucker, et al. Abstraction, Reformulation and Approximation, 6th International Symposium, SARA 2005, Airth Castle, Scotland, UK, July 26-29, 2005, Proceedings, 2005, SARA.

[12] Bin Jiang, et al. Probabilistic Skylines on Uncertain Data, 2007, VLDB.

[13] Philip S. Yu, et al. A Survey of Uncertain Data Algorithms and Applications, 2009, IEEE Transactions on Knowledge and Data Engineering.

[14] Stanley B. Zdonik, et al. Top-k queries on uncertain data: on score distribution and typical answers, 2009, SIGMOD Conference.

[15] Mikhail J. Atallah, et al. Computing all skyline probabilities for uncertain data, 2009, PODS.

[16] Lise Getoor, et al. Learning Probabilistic Relational Models, 1999, IJCAI.

[17] Xi Zhang, et al. Semantics and evaluation of top-k queries in probabilistic databases, 2008, 2008 IEEE 24th International Conference on Data Engineering Workshop.

[18] Wilfred Ng, et al. Robust Ranking of Uncertain Data, 2011, DASFAA.

[19] Feifei Li, et al. Semantics of Ranking Queries for Probabilistic Data and Expected Ranks, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[20] Jennifer Widom, et al. Working Models for Uncertain Data, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[21] Chengqi Zhang, et al. A Probabilistic Data Model and Its Semantics, 2004.

[22] Bernhard Seeger, et al. An optimal and progressive algorithm for skyline queries, 2003, SIGMOD '03.

[23] Wenfei Fan, et al. Conditional functional dependencies for capturing data inconsistencies, 2008, TODS.

[24] Chengqi Zhang, et al. A Probability Data Model and its Semantics, 2003, J. Res. Pract. Inf. Technol.

[25] Daniel T. Larose, et al. Discovering Knowledge in Data: An Introduction to Data Mining, 2005.

[26] Feifei Li, et al. Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations, 2008, IEEE Transactions on Knowledge and Data Engineering.