Dynamic load distribution in the Borealis stream processor

Distributed and parallel computing environments are becoming cheap and commonplace. The availability of large numbers of CPUs makes it possible to process more data at higher speeds. Stream-processing systems are also becoming more important, as broad classes of applications require results in real time. Since load can vary in unpredictable ways, exploiting the abundant processor cycles requires effective dynamic load distribution techniques. Although load distribution has been studied extensively for traditional pull-based systems, it has not yet been fully studied in the context of push-based continuous query processing. In this paper, we present a correlation-based load distribution algorithm that aims to avoid overload and minimize end-to-end latency by minimizing load variance and maximizing load correlation. While finding the optimal solution to this problem is NP-hard, our greedy algorithm can find reasonable solutions in polynomial time. We present both a global algorithm for initial load distribution and a pair-wise algorithm for dynamic load migration.
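
To make the variance-minimizing idea concrete, the following is a minimal sketch of a greedy placement heuristic. It is not the algorithm from the paper: the function name `greedy_place`, the input layout (one load time series per operator), and the cost rule (smallest increase in a node's load variance, with ties broken by lower mean load) are all assumptions chosen for illustration.

```python
from statistics import mean, pvariance


def greedy_place(op_loads, n_nodes):
    """Assign operators to nodes one at a time (illustrative heuristic only).

    op_loads: dict mapping a hypothetical operator name to a list of load
              samples; all lists are assumed to have the same length.
    Returns a dict mapping operator name to node index.
    """
    n_samples = len(next(iter(op_loads.values())))
    # Running total load time series of each node.
    node_series = [[0.0] * n_samples for _ in range(n_nodes)]
    placement = {}

    # Consider heavier operators first; sorted() is stable, so equal-load
    # operators keep their input order.
    for name in sorted(op_loads, key=lambda k: -sum(op_loads[k])):
        series = op_loads[name]
        best_node, best_cost = None, None
        for node in range(n_nodes):
            candidate = [a + b for a, b in zip(node_series[node], series)]
            # Primary cost: how much this node's load variance would grow.
            # Secondary cost: the node's resulting mean load (for balance).
            cost = (pvariance(candidate) - pvariance(node_series[node]),
                    mean(candidate))
            if best_cost is None or cost < best_cost:
                best_node, best_cost = node, cost
        placement[name] = best_node
        node_series[best_node] = [
            a + b for a, b in zip(node_series[best_node], series)
        ]
    return placement


# Operators "a" and "b" have anti-correlated loads, so placing them on the
# same node cancels their fluctuations.
loads = {
    "a": [1, 3, 1, 3],
    "b": [3, 1, 3, 1],
    "c": [2, 2, 2, 2],
    "d": [2, 2, 2, 2],
}
print(greedy_place(loads, 2))
```

In this toy input the heuristic co-locates the two anti-correlated operators, which leaves both nodes with a constant total load. Each placement step costs time linear in the number of nodes and samples, so the whole pass runs in polynomial time, in the same spirit as the greedy approach described above.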
