Window-aware load shedding for aggregation queries over data streams

Data stream management systems may be subject to input rates higher than their resources can handle. When overloaded, the system must shed load to maintain low-latency query results. In this paper, we describe a load shedding technique for queries consisting of one or more aggregate operators with sliding windows. We introduce a new type of drop operator, called a "Window Drop". This operator is aware of the window properties (i.e., window size and window slide) of its downstream aggregate operators in the query plan. Accordingly, it logically divides the input stream into windows and probabilistically decides which windows to drop. This decision is then encoded into the tuples by marking those that are disallowed from starting new windows. Unlike earlier approaches, our approach preserves the integrity of windows throughout a query plan and always delivers subsets of the original query answers, with minimal degradation in result quality.
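The window-level dropping idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes integer timestamps, in-order arrival, time-based windows aligned to multiples of the slide, and a single SUM aggregate. The names `covering_windows`, `window_drop`, and `windowed_sum`, and the boolean `may_open` mark, are hypothetical and chosen only for illustration.

```python
import random


def covering_windows(ts, size, slide):
    """Indices k of the windows [k*slide, k*slide + size) that contain ts."""
    first = max(0, -(-(ts - size + 1) // slide))  # ceiling division
    last = ts // slide
    return range(first, last + 1)


def window_drop(tuples, size, slide, drop_prob, rng):
    """Hypothetical window-drop operator (illustrative assumption).

    Decides once per window whether to keep it, discards tuples that
    belong only to dropped windows, and marks each forwarded tuple with
    whether it is allowed to start a new window downstream.
    """
    keep = {}

    def kept(k):
        # One probabilistic decision per window, cached for later tuples.
        if k not in keep:
            keep[k] = rng.random() >= drop_prob
        return keep[k]

    for ts, value in tuples:
        decisions = [kept(k) for k in covering_windows(ts, size, slide)]
        if not any(decisions):
            continue  # no kept window needs this tuple
        yield ts, value, kept(ts // slide)


def windowed_sum(marked, size, slide):
    """Sliding-window SUM that opens a window only if the mark allows it."""
    sums = {}
    for ts, value, may_open in marked:
        k0 = ts // slide
        if may_open and k0 not in sums and ts < k0 * slide + size:
            sums[k0] = 0  # this tuple starts window k0
        for k in covering_windows(ts, size, slide):
            if k in sums:
                sums[k] += value
    return sums


# Assumed toy stream: (timestamp, value) pairs.
stream = [(t, t % 5 + 1) for t in range(20)]
full = windowed_sum(window_drop(stream, 4, 2, 0.0, random.Random(0)), 4, 2)
shed = windowed_sum(window_drop(stream, 4, 2, 0.5, random.Random(7)), 4, 2)
```

In this sketch every window that survives in `shed` carries the same sum as the corresponding window in `full`, because a kept window receives all of its tuples and a dropped window is never opened. That mirrors the subset property described above, under the stated assumptions.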
