Efficient Operator Placement for Distributed Data Stream Processing Applications

In the last few years, a large number of real-time analytics applications have come to rely on Data Stream Processing (DSP) to extract, in a timely manner, valuable information from distributed sources. Moreover, to efficiently handle the increasing amount of data, recent approaches exploit the emerging edge/fog computing resources to decentralize the execution of DSP applications. Since determining the Optimal DSP Placement (ODP, for short) is an NP-hard problem, we need efficient heuristics that can identify a good application placement on the computing infrastructure in a reasonable amount of time, even for large problem instances. In this paper, we present several DSP placement heuristics that take into account the heterogeneity of computing and network resources; we divide them into two main groups: model-based and model-free. The former employ different strategies for efficiently solving the ODP model. The latter apply well-known heuristics and meta-heuristics to the problem at hand, namely greedy first-fit, local search, and tabu search. Leveraging ODP, we conduct a thorough experimental evaluation aimed at assessing the heuristics' efficiency and efficacy under different configurations of infrastructure size, application topology, and optimization objectives.
