Providing resiliency to load variations in distributed stream processing

Scalability in stream processing systems can be achieved by using a cluster of computing devices. The processing burden can then be distributed among the nodes by partitioning the query graph. The specific operator placement plan can have a large impact on performance. Previous work has focused on how to move query operators dynamically in reaction to load changes in order to keep the load balanced. However, operator movement is too expensive to alleviate short-term bursts; moreover, some systems do not support moving operators dynamically. In this paper, we develop algorithms for selecting an operator placement plan that is resilient to changes in load. In other words, we assume that operators cannot move, and we therefore try to place them in such a way that the resulting system can withstand the largest set of input rate combinations. We call this a resilient placement.

This paper first formalizes the problem for operators that exhibit linear load characteristics (e.g., filter, aggregate) and introduces a resilient placement algorithm. We then show how the algorithm can be extended to take advantage of additional workload information (such as known minimum input stream rates). We further show how this approach can be extended to operators that exhibit non-linear load characteristics (e.g., join). Finally, we present prototype- and simulation-based experiments that quantify the benefits of our approach over existing techniques using real network traffic traces.
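To make the notion of "the largest set of input rate combinations" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a linear load model in which each operator's load is a weighted sum of the input stream rates, and it estimates by plain random sampling what fraction of a rate box a fixed placement can sustain. The function names, the coefficient values, the unit node capacities, and the sampling box are all hypothetical and chosen only for illustration.

```python
import random


def node_load_coefficients(op_coeffs, placement, num_nodes):
    """Sum the per-operator load coefficients of all operators on each node.

    op_coeffs[i][k] is the (assumed) load operator i incurs per unit rate of
    input stream k; placement[i] is the node that hosts operator i.
    """
    num_inputs = len(op_coeffs[0])
    node_coeffs = [[0.0] * num_inputs for _ in range(num_nodes)]
    for op, node in enumerate(placement):
        for k, coeff in enumerate(op_coeffs[op]):
            node_coeffs[node][k] += coeff
    return node_coeffs


def is_feasible(node_coeffs, capacities, rates):
    """A rate point is feasible if no node's linear load exceeds its capacity."""
    return all(
        sum(c * r for c, r in zip(row, rates)) <= cap
        for row, cap in zip(node_coeffs, capacities)
    )


def feasible_fraction(op_coeffs, placement, capacities, max_rates,
                      samples=20000, seed=0):
    """Estimate the fraction of the box [0, max_rates] that the placement sustains."""
    rng = random.Random(seed)
    node_coeffs = node_load_coefficients(op_coeffs, placement, len(capacities))
    hits = 0
    for _ in range(samples):
        rates = [rng.uniform(0.0, m) for m in max_rates]
        if is_feasible(node_coeffs, capacities, rates):
            hits += 1
    return hits / samples


# Hypothetical workload: four operators, two input streams, two nodes.
OP_COEFFS = [[4.0, 0.0], [2.0, 0.0], [0.0, 4.0], [0.0, 2.0]]
CAPACITIES = [1.0, 1.0]
MAX_RATES = [0.25, 0.25]

# Placement A keeps each stream's operators together on one node.
PLACEMENT_A = [0, 0, 1, 1]
# Placement B mixes operators from both streams on each node.
PLACEMENT_B = [0, 1, 1, 0]

if __name__ == "__main__":
    for name, placement in (("A", PLACEMENT_A), ("B", PLACEMENT_B)):
        frac = feasible_fraction(OP_COEFFS, placement, CAPACITIES, MAX_RATES)
        print(f"placement {name}: feasible fraction ~ {frac:.3f}")
```

In this toy setup, both placements carry the same total load when the two streams run at equal rates, yet placement B tolerates a larger region of rate combinations, because a burst on either stream is spread across both nodes. A search over placements that maximizes this kind of feasible-set measure is one way to read the goal stated above; the sampling estimator here is an assumption made for illustration.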
