Sliding-window top-k queries on uncertain streams

Query processing on uncertain data streams has attracted considerable attention lately, due to the imprecise nature of the data generated by a variety of streaming applications, such as readings from a sensor network. However, all existing work on uncertain data streams studies unbounded streams. This paper takes the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on arguably one of the most important types of queries: top-k queries. The challenge of answering sliding-window top-k queries on uncertain data streams stems from the strict space and time requirements of processing both arriving and expiring tuples in high-speed streams, combined with the difficulty of coping with the exponential blowup in the number of possible worlds induced by the uncertain data model. In this paper, we design a unified framework for processing sliding-window top-k queries on uncertain streams. We show that all the existing top-k definitions in the literature can be plugged into our framework, resulting in several succinct synopses that use space much smaller than the window size while also being highly efficient in terms of processing time. In addition to the theoretical space and time bounds that we prove for these synopses, we also present a thorough experimental study that verifies their practical efficiency on both synthetic and real data.
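To make the possible-worlds blowup concrete, the following is a minimal brute-force sketch, not the synopses proposed in the paper. It assumes a simple uncertain model in which each tuple carries a score and an independent existence probability, and it assumes one particular top-k flavor: the probability that a tuple is among the k highest-scoring tuples present in the window. The names `topk_probabilities` and `sliding_topk` are hypothetical and chosen for illustration only.

```python
from collections import deque
from itertools import product


def topk_probabilities(window, k):
    """Return {tuple_id: Pr[tuple is among the k highest scores]}.

    `window` is a list of (tuple_id, score, existence_probability).
    Tuples are assumed independent (an assumption of this sketch).
    """
    probs = {tid: 0.0 for tid, _, _ in window}
    # Every tuple either exists or not, so a window of W tuples
    # induces 2^W possible worlds; this loop visits all of them.
    for world in product([False, True], repeat=len(window)):
        weight = 1.0
        present = []
        for exists, (tid, score, p) in zip(world, window):
            weight *= p if exists else 1.0 - p
            if exists:
                present.append((score, tid))
        # Credit the world's probability to its k highest-scoring tuples.
        for _, tid in sorted(present, reverse=True)[:k]:
            probs[tid] += weight
    return probs


def sliding_topk(stream, window_size, k):
    """Yield the top-k probabilities after each arrival."""
    window = deque(maxlen=window_size)  # the oldest tuple expires automatically
    for item in stream:
        window.append(item)
        yield topk_probabilities(list(window), k)


if __name__ == "__main__":
    stream = [("a", 10, 0.5), ("b", 8, 1.0), ("c", 6, 0.5)]
    for result in sliding_topk(stream, window_size=2, k=1):
        print(result)
```

This baseline stores the whole window and spends time exponential in the window size on every arrival, which is exactly the cost that a compact synopsis is meant to avoid.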
