Operator fission for load balancing in distributed heterogeneous data stream processing systems

Distributed data stream processing makes it possible to optimize resource consumption because a query's operators can be executed by several systems. Placing filter or aggregate operators near the data source avoids unnecessary data transfer. Deciding where to place each operator is a complex problem. In certain scenarios, the goal is not only to minimize an overall cost such as resource consumption but also to distribute load evenly. We propose an operator fission algorithm that builds on an initial operator placement. From the set of operators that permit fission, the algorithm selects certain operators for parallel execution by multiple systems. Load is thus divided among processors at a finer granularity, resulting in a lower maximum load and lower load variance. We present and evaluate three variants of the algorithm, which allow tuning the trade-off between optimization time and result quality.
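The general idea of fission on top of an initial placement can be illustrated with a minimal sketch. This is not the algorithm proposed here or any of its three variants; it is a simple greedy heuristic under strong assumptions: every node has the same capacity (heterogeneity is ignored), an operator's load can be split in arbitrary proportions, and fission adds no communication or merge overhead. The names `node_loads`, `greedy_fission`, `fissionable`, and `max_splits` are hypothetical.

```python
def node_loads(placement, n_nodes):
    """Sum the load on each node; placement is a list of (operator, node, load)."""
    loads = [0.0] * n_nodes
    for _, node, load in placement:
        loads[node] += load
    return loads


def greedy_fission(placement, fissionable, n_nodes, max_splits=10):
    """Illustrative greedy fission (an assumption, not the paper's method).

    Repeatedly takes the heaviest fissionable operator on the most loaded
    node and moves part of its load to a replica on the least loaded node.
    """
    placement = list(placement)
    for _ in range(max_splits):
        loads = node_loads(placement, n_nodes)
        hot = max(range(n_nodes), key=loads.__getitem__)
        cold = min(range(n_nodes), key=loads.__getitem__)
        gap = loads[hot] - loads[cold]
        candidates = [
            i for i, (op, node, _) in enumerate(placement)
            if node == hot and op in fissionable
        ]
        if gap <= 1e-9 or not candidates:
            break  # already balanced, or nothing on the hot node may be split
        i = max(candidates, key=lambda k: placement[k][2])
        op, _, load = placement[i]
        # Move at most half the operator and at most half the gap, so the
        # receiving node never ends up more loaded than the sending node.
        moved = min(load / 2, gap / 2)
        placement[i] = (op, hot, load - moved)
        placement.append((op, cold, moved))
    return placement


# Hypothetical initial placement: two operators on node 0, one on node 1.
initial = [("filter", 0, 6.0), ("aggregate", 0, 4.0), ("join", 1, 2.0)]
balanced = greedy_fission(initial, fissionable={"filter", "aggregate"}, n_nodes=3)
print(node_loads(initial, 3), "->", node_loads(balanced, 3))
```

Capping the moved share at half the load gap keeps the maximum load from ever increasing in this sketch. A realistic variant would also have to weigh node capacities and the cost of splitting and merging streams.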
