Topology-aware task allocation for online distributed stream processing applications with latency constraints

Abstract

There is increasing demand for real-time processing of ever-growing volumes of data. To meet this demand and to process streaming data reliably, a variety of distributed stream processing architectures and platforms have been developed. These platforms handle the fundamental task of allocating processing tasks to the currently available physical resources and routing streaming data between those resources. However, many stream processing systems lack an intelligent scheduling mechanism. Their default schedulers allocate tasks without considering resource demand and availability, or the transfer latency between resources. Moreover, stream processing has strict latency requirements, so it is important to provide latency guarantees for distributed stream processing. In this paper, we propose two new algorithms for stream processing with latency guarantees. Both algorithms take transfer latency and resource demand into account when allocating tasks, and both guarantee the latency constraints. Algorithm AHA reduces resource usage by more than 21.3% and 58.9% compared with the greedy and round-robin algorithms, respectively, and algorithm PHA further raises these reductions to 32.1% and 73.2%.
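The following minimal Python sketch makes the allocation problem concrete. It rests on several assumptions: the application is a linear chain of tasks, each task has a single scalar resource demand and a fixed processing latency, co-located tasks incur no transfer latency, and the network topology is given as a symmetric matrix of link latencies between nodes. The names `Task`, `round_robin`, `topology_aware_greedy`, and all numbers are hypothetical. The heuristic illustrates topology-aware, latency-checked placement in general. It is not the paper's AHA or PHA algorithm.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu: float      # resource demand (hypothetical unit, e.g. CPU cores)
    proc_ms: float  # per-tuple processing latency, assumed node-independent


def end_to_end_latency(chain, placement, link_ms):
    """Sum of processing latencies plus transfer latency between consecutive
    tasks; tasks on the same node are assumed to transfer for free."""
    total = sum(t.proc_ms for t in chain)
    for a, b in zip(chain, chain[1:]):
        na, nb = placement[a.name], placement[b.name]
        if na != nb:
            total += link_ms[na][nb]
    return total


def round_robin(chain, num_nodes):
    # Baseline: spread tasks over nodes in turn, ignoring demand and topology.
    return {t.name: i % num_nodes for i, t in enumerate(chain)}


def _rank(node, prev, opened, link_ms):
    # Prefer nodes already in use, then the smallest hop from the predecessor.
    hop = 0 if prev is None or node == prev else link_ms[prev][node]
    return (node not in opened, hop, node)


def topology_aware_greedy(chain, capacity, link_ms, bound_ms):
    """Hypothetical heuristic: place each task on an already-used node with
    spare capacity (the predecessor's node first), open the nearest new node
    only when needed, then reject the result if it exceeds bound_ms."""
    free = list(capacity)  # remaining capacity per node
    opened = set()         # nodes hosting at least one task
    placement = {}
    prev = None            # node hosting the previous task in the chain
    for t in chain:
        fits = [n for n in range(len(free)) if free[n] >= t.cpu]
        if not fits:
            return None  # not enough capacity anywhere
        node = min(fits, key=lambda n: _rank(n, prev, opened, link_ms))
        free[node] -= t.cpu
        opened.add(node)
        placement[t.name] = node
        prev = node
    if end_to_end_latency(chain, placement, link_ms) > bound_ms:
        return None  # latency constraint violated
    return placement


# Toy example: a four-task pipeline on four nodes of capacity 4.
chain = [Task("source", 1, 1), Task("parse", 2, 2),
         Task("join", 2, 3), Task("sink", 1, 1)]
capacity = [4, 4, 4, 4]
link_ms = [[0, 1, 5, 10],
           [1, 0, 2, 8],
           [5, 2, 0, 3],
           [10, 8, 3, 0]]

rr = round_robin(chain, len(capacity))
ta = topology_aware_greedy(chain, capacity, link_ms, bound_ms=10)
print(len(set(rr.values())), end_to_end_latency(chain, rr, link_ms))  # 4 13
print(len(set(ta.values())), end_to_end_latency(chain, ta, link_ms))  # 2 8
```

On this toy input, round-robin spreads the chain over all four nodes and exceeds a 10 ms bound. Keeping neighboring tasks together uses only two nodes and stays within the bound. Ranking already-used nodes ahead of new ones is what saves resources. The final check only rejects a placement that breaks the latency constraint and does not try to repair it, which a real latency-guaranteeing allocator would need to do.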
