An optimal strategy for monitoring top-k queries in streaming windows

Continuous top-k queries, which report the k most preferred objects from a data stream, are important for a broad class of real-time applications, ranging from financial analysis to network traffic monitoring. Existing solutions reduce computational cost by incrementally updating the top-k results upon each window slide. However, they all suffer from the same performance bottleneck: periodically, the top-k results must be recomputed from scratch. This operation is not only computationally expensive but also memory-intensive, because it requires keeping all objects in the query window alive. To solve this problem, we identify the "Minimal Top-K candidate set" (MTK), namely the subset of stream objects that is both necessary and sufficient for continuous top-k monitoring. Based on this theoretical foundation, we design the MinTopk algorithm, which maintains MTK and thus eliminates the need for recomputation. We prove that MinTopk is optimal in both CPU and memory utilization for continuous top-k monitoring. Our experimental study shows that the efficiency and scalability of the proposed algorithm are clearly superior to those of state-of-the-art solutions.
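To make the idea of a pruned candidate set concrete, the following is a minimal sketch of dominance-based pruning for a sliding-window top-k query. It is not the MinTopk algorithm described above, whose details are not given here. It only illustrates the general observation that an object can be discarded once at least k newer objects have strictly higher scores, since those objects will outlive it in the window. The class name `SlidingTopK`, its methods, and the assumptions of a count-based window that slides by one object and a single numeric score per object are all hypothetical choices made for this illustration.

```python
from collections import deque


class SlidingTopK:
    """Illustrative count-based sliding-window top-k with dominance pruning.

    Hypothetical sketch: not the paper's MinTopk algorithm.
    """

    def __init__(self, k, window_size):
        self.k = k
        self.window_size = window_size
        self.clock = 0  # number of objects seen so far
        # Candidates in arrival order: [arrival_index, score, dominated_count]
        self.candidates = deque()

    def insert(self, score):
        """Add one object; the window slides by one position."""
        self.clock += 1
        # Expire candidates that are no longer among the last window_size objects.
        while self.candidates and self.candidates[0][0] <= self.clock - self.window_size:
            self.candidates.popleft()
        kept = deque()
        for cand in self.candidates:
            # The new object is newer than every existing candidate, so it
            # dominates each one that has a strictly lower score.
            if cand[1] < score:
                cand[2] += 1
            # A candidate dominated by k newer objects can never re-enter
            # the top-k before it expires, so it is dropped.
            if cand[2] < self.k:
                kept.append(cand)
        kept.append([self.clock, score, 0])
        self.candidates = kept

    def top_k(self):
        """Return the scores of the current top-k objects, highest first."""
        best = sorted(self.candidates, key=lambda c: c[1], reverse=True)
        return [c[1] for c in best[: self.k]]


if __name__ == "__main__":
    monitor = SlidingTopK(k=2, window_size=3)
    for s in [5, 1, 4, 3, 2, 0]:
        monitor.insert(s)
        print(s, monitor.top_k(), len(monitor.candidates))
```

In this sketch the top-k answer is always derived from the retained candidates alone, so no pass over the full window is needed when a current top-k object expires. The linear scan on each insertion keeps the code short; it is a simplification and makes no claim to the CPU or memory optimality stated for MinTopk.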
