Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing

In distributed stream processing environments, large numbers of continuous queries are distributed across multiple servers. When one or more of these servers becomes overloaded due to bursty data arrival, excess load must be shed to preserve low latency for query results. Because of the load dependencies among the servers, load shedding decisions on these servers must be well coordinated to achieve end-to-end control over output quality. In this paper, we model the distributed load shedding problem as a linear optimization problem, for which we propose two alternative solution approaches: a solver-based centralized approach, and a distributed approach based on metadata aggregation and propagation, for which a centralized implementation is also available. Both solutions generate a series of load shedding plans in advance, each to be used under particular input load conditions. We have implemented our techniques as part of the Borealis distributed stream processing system. We present experimental results from our prototype implementation that show the performance of these techniques under different input and query workloads.
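To make the idea of precomputed load shedding plans concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the formulation or the algorithm from the paper. It assumes a single input stream flowing through a chain of servers, a hypothetical per-input-tuple cost for each server, and a single drop point at the input. In that special case the linear program reduces to maximizing one keep fraction x subject to x * rate * cost_i <= capacity_i on every server. The names `keep_fraction`, `build_plans`, and `lookup_plan` are illustrative only.

```python
from bisect import bisect_left


def keep_fraction(rate, costs, capacities):
    """Largest fraction x in [0, 1] of the input that the chain can keep.

    Assumption: server i spends costs[i] capacity units per *input* tuple
    (upstream selectivities already folded in), so its load is
    x * rate * costs[i], which must not exceed capacities[i].
    """
    x = 1.0
    for cost, cap in zip(costs, capacities):
        load = rate * cost
        if load > cap:
            # This server is the tighter constraint; shrink x to fit it.
            x = min(x, cap / load)
    return x


def build_plans(rates, costs, capacities):
    """Precompute one plan (a keep fraction) per anticipated input rate."""
    rates = sorted(rates)
    return rates, [keep_fraction(r, costs, capacities) for r in rates]


def lookup_plan(plans, observed_rate):
    """Pick the plan for the smallest precomputed rate >= observed_rate.

    If the observed rate exceeds every precomputed rate, fall back to the
    plan for the highest one (which may then shed too little).
    """
    rates, fractions = plans
    i = min(bisect_left(rates, observed_rate), len(rates) - 1)
    return fractions[i]


if __name__ == "__main__":
    # Hypothetical two-server chain; the downstream server is costlier.
    plans = build_plans([100, 400, 800], costs=[0.25, 0.5],
                        capacities=[100, 100])
    for rate in (50, 300, 800):
        print(rate, lookup_plan(plans, rate))
```

Choosing the plan for the next higher precomputed rate is a conservative design choice made here for illustration: it may shed slightly more than necessary but never overloads a server for rates within the precomputed range. The coordination across servers shows up in the fact that the keep fraction is set by the most constrained server in the chain, not by each server in isolation.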
