Load Shedding in a Data Stream Manager

A Data Stream Manager accepts push-based inputs from a set of data sources, processes these inputs with respect to a set of standing queries, and produces outputs based on Quality-of-Service (QoS) specifications. When input rates exceed system capacity, the system becomes overloaded and latency deteriorates. Under these conditions, the system sheds load, degrading the answer in order to improve the observed latency of the results. This paper examines a technique for dynamically inserting drop operators into query plans, and removing them again, as required by the current load. We examine two types of drops: the first drops a fraction of the tuples in a randomized fashion, and the second drops tuples based on the importance of their content. We address the problems of determining when load shedding is needed, where in the query plan to insert drops, and how much of the load should be shed at that point in the plan. We describe efficient solutions and present experimental evidence that they can bring the system back into the useful operating range with minimal degradation in answer quality.
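The two drop types described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `random_drop`, `semantic_drop`, and `required_drop_rate`, the `utility` callback, and the simple load-versus-capacity rule for deciding how much to shed are all assumptions introduced here for illustration only.

```python
import random
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def required_drop_rate(load: float, capacity: float) -> float:
    """Hypothetical rule: fraction of tuples to shed so load fits capacity.

    Returns 0.0 when the system is not overloaded.
    """
    if load <= capacity:
        return 0.0
    return 1.0 - capacity / load


def random_drop(stream: Iterable[T], drop_rate: float,
                rng: random.Random) -> Iterator[T]:
    """Randomized drop: discard each tuple independently with
    probability drop_rate, regardless of its content."""
    for tup in stream:
        if rng.random() >= drop_rate:
            yield tup


def semantic_drop(stream: Iterable[T], utility: Callable[[T], float],
                  threshold: float) -> Iterator[T]:
    """Content-based drop: discard tuples whose utility (an assumed,
    application-supplied importance score) is below threshold."""
    for tup in stream:
        if utility(tup) >= threshold:
            yield tup


if __name__ == "__main__":
    readings = [{"sensor": i % 3, "temp": 20 + i} for i in range(10)]
    rate = required_drop_rate(load=150.0, capacity=100.0)
    sampled = list(random_drop(readings, rate, random.Random(0)))
    important = list(semantic_drop(readings, lambda r: r["temp"], 25))
    print(f"drop rate: {rate:.2f}")
    print(f"random drop kept {len(sampled)} of {len(readings)} tuples")
    print(f"semantic drop kept {len(important)} of {len(readings)} tuples")
```

The random variant needs no knowledge of tuple content, while the content-based variant depends on some measure of tuple importance, here stood in for by the hypothetical `utility` function.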
