Paper Information - Top-k Best Probability Queries on Probabilistic Data

Top-k Best Probability Queries on Probabilistic Data

There has been much interest in answering top-k queries on probabilistic data in applications such as market analysis, personalised services, and decision making. For probabilistic data, the most common problem in answering top-k queries is choosing the semantics of the results according to their scores and top-k probabilities. In this paper, we propose a novel top-k best probability query that returns results with not only the best top-k scores but also the best top-k probabilities. We also introduce an efficient algorithm for top-k best probability queries that does not require a user-defined threshold. We then analyse the top-k best probability answer and show that it satisfies the semantic ranking properties of queries on uncertain data [3,18]. Experimental studies on real data verify both the effectiveness of top-k best probability queries and the efficiency of our algorithm.
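To make the two quantities in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simple model in which each tuple has a distinct score and an independent existence probability, and it takes a tuple's top-k probability to be the chance that the tuple exists and fewer than k higher-scored tuples exist. The function names `topk_probabilities` and `non_dominated` are hypothetical, and reading "best on both score and top-k probability" as "not dominated on both" is an illustrative assumption.

```python
from typing import List, Tuple


def topk_probabilities(tuples: List[Tuple[float, float]], k: int) -> List[float]:
    """Top-k probability of each (score, existence_probability) tuple.

    Assumes independent tuples with distinct scores (an assumption of this
    sketch). Results are returned in the input order.
    """
    order = sorted(range(len(tuples)), key=lambda i: -tuples[i][0])
    # dp[j] = probability that exactly j higher-scored tuples exist, for j < k
    dp = [1.0] + [0.0] * (k - 1)
    result = [0.0] * len(tuples)
    for i in order:
        p = tuples[i][1]
        # The tuple is in the top-k if it exists and fewer than k better ones do.
        result[i] = p * sum(dp)
        # Fold this tuple into the count of existing higher-scored tuples.
        updated = [0.0] * k
        for j in range(k):
            updated[j] = dp[j] * (1.0 - p)
            if j > 0:
                updated[j] += dp[j - 1] * p
        dp = updated
    return result


def non_dominated(tuples: List[Tuple[float, float]], k: int) -> List[int]:
    """Indices of tuples not beaten on both score and top-k probability."""
    probs = topk_probabilities(tuples, k)
    answer = []
    best_prob = -1.0
    # Scan by descending score; keep a tuple only if its top-k probability
    # is strictly higher than that of every higher-scored tuple.
    for i in sorted(range(len(tuples)), key=lambda i: -tuples[i][0]):
        if probs[i] > best_prob:
            answer.append(i)
            best_prob = probs[i]
    return answer


# Hypothetical data: (score, existence probability)
data = [(100, 0.5), (90, 0.9), (80, 0.4), (70, 1.0)]
probs = topk_probabilities(data, k=2)
selected = non_dominated(data, k=2)
```

No probability threshold appears in the selection step of this sketch; the filter depends only on how tuples compare with one another.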

Jinli Cao | Trieu Minh Nhut Le

[1] Kevin Chen-Chuan Chang, et al. Probabilistic top-k and ranking-aggregate queries, 2008, TODS.

[2] Kenneth Lange, et al. Numerical analysis for statisticians, 1999.

[3] Jian Pei, et al. Ranking queries on uncertain data: a probabilistic threshold approach, 2008, SIGMOD Conference.

[4] Vipin Kumar, et al. Introduction to Data Mining, 2022, Data Mining and Machine Learning Applications.

[5] Ihab F. Ilyas, et al. A survey of top-k query processing techniques in relational database systems, 2008, CSUR.

[6] Jian Pei, et al. Managing Uncertain Data: Probabilistic Approaches, 2008, 2008 The Ninth International Conference on Web-Age Information Management.

[7] Xi Zhang, et al. On the semantics and evaluation of top-k queries in probabilistic databases, 2008, ICDE Workshops.

[8] Jeffrey Xu Yu, et al. Sliding-window top-k queries on uncertain streams, 2008, The VLDB Journal.

[9] Jan Chomicki, et al. Skyline with Presorting: Theory and Optimizations, 2005, Intelligent Information Systems.

[10] Mohamed A. Soliman, et al. Top-k Query Processing in Uncertain Databases, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[11] Jean-Daniel Zucker, et al. Abstraction, Reformulation and Approximation, 6th International Symposium, SARA 2005, Airth Castle, Scotland, UK, July 26-29, 2005, Proceedings, 2005, SARA.

[12] Bin Jiang, et al. Probabilistic Skylines on Uncertain Data, 2007, VLDB.

[13] Philip S. Yu, et al. A Survey of Uncertain Data Algorithms and Applications, 2009, IEEE Transactions on Knowledge and Data Engineering.

[14] Stanley B. Zdonik, et al. Top-k queries on uncertain data: on score distribution and typical answers, 2009, SIGMOD Conference.

[15] Mikhail J. Atallah, et al. Computing all skyline probabilities for uncertain data, 2009, PODS.

[16] Lise Getoor, et al. Learning Probabilistic Relational Models, 1999, IJCAI.

[17] Xi Zhang, et al. Semantics and evaluation of top-k queries in probabilistic databases, 2008, 2008 IEEE 24th International Conference on Data Engineering Workshop.

[18] Wilfred Ng, et al. Robust Ranking of Uncertain Data, 2011, DASFAA.

[19] Feifei Li, et al. Semantics of Ranking Queries for Probabilistic Data and Expected Ranks, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[20] Jennifer Widom, et al. Working Models for Uncertain Data, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[21] Chengqi Zhang, et al. A Probabilistic Data Model and Its Semantics, 2004.

[22] Bernhard Seeger, et al. An optimal and progressive algorithm for skyline queries, 2003, SIGMOD '03.

[23] Wenfei Fan, et al. Conditional functional dependencies for capturing data inconsistencies, 2008, TODS.

[24] Chengqi Zhang, et al. A Probability Data Model and its Semantics, 2003, J. Res. Pract. Inf. Technol.

[25] Daniel T. Larose, et al. Discovering Knowledge in Data: An Introduction to Data Mining, 2005.

[26] Feifei Li, et al. Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations, 2008, IEEE Transactions on Knowledge and Data Engineering.