Continuous ranking on uncertain streams

Data uncertainty is widespread in web applications, financial applications, and sensor networks. Ranking queries, which return the tuples with the highest ranking scores, are important in database management. Most existing work proposes static solutions for various ranking semantics over uncertain data. This paper instead addresses continuous ranking queries on uncertain data streams, in which each newly arrived tuple is tested so that highly ranked tuples can be output. The challenge is twofold: the possible-world space grows exponentially as new tuples arrive, and a streaming environment demands low space and time complexity. We first study how to answer such queries exactly, and then propose a novel method, exponential sampling, that estimates the expected rank of a tuple with high quality. Theoretical analysis and detailed experiments evaluate the proposed methods.
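To make the possible-world blow-up concrete, the following is a minimal sketch of expected-rank computation. It is not the exact algorithm or the exponential sampling method of this paper. It assumes a tuple-level uncertainty model with independent tuples and distinct scores, and one common definition of rank: a tuple that appears in a world is ranked by the number of appearing tuples with a higher score, and a tuple that is absent is ranked at the size of that world. The function names and the sample data are hypothetical.

```python
from itertools import product


def expected_ranks_bruteforce(tuples):
    """Expected rank of every tuple by enumerating all 2^n possible worlds.

    tuples: list of (score, probability) pairs, assumed independent.
    """
    n = len(tuples)
    expected = [0.0] * n
    for world in product([False, True], repeat=n):
        # Probability of this world under tuple independence.
        weight = 1.0
        for present, (_, p) in zip(world, tuples):
            weight *= p if present else 1.0 - p
        size = sum(world)
        for i, (score_i, _) in enumerate(tuples):
            if world[i]:
                # Number of present tuples that outscore tuple i.
                rank = sum(
                    1
                    for j, (score_j, _) in enumerate(tuples)
                    if world[j] and score_j > score_i
                )
            else:
                # Assumed convention: an absent tuple ranks after the whole world.
                rank = size
            expected[i] += weight * rank
    return expected


def expected_ranks_closed_form(tuples):
    """Same quantity via linearity of expectation, without enumerating worlds."""
    total = sum(p for _, p in tuples)
    result = []
    for score_i, p_i in tuples:
        higher = sum(p for s, p in tuples if s > score_i)
        result.append(p_i * higher + (1.0 - p_i) * (total - p_i))
    return result


# Hypothetical stream prefix: (score, existence probability).
sample = [(100, 0.5), (90, 0.8), (80, 0.4)]
brute = expected_ranks_bruteforce(sample)
closed = expected_ranks_closed_form(sample)
```

The brute-force version visits 2^n worlds, so its cost doubles with every arriving tuple, which is the growth the abstract refers to. Under the stated assumptions the closed form returns the same values, but as written it still rescans every stored tuple, so it does not by itself meet a streaming space and time budget.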
