A Topology and Traffic Aware Two-Level Scheduler for Stream Processing Systems in a Heterogeneous Cluster

To efficiently handle a large volume of data, scheduling algorithms in stream processing systems need to minimise the data movement between communicating tasks to improve system throughput. However, finding an optimal scheduling algorithm for these systems is NP-hard. In this paper, we propose a heuristic scheduling algorithm for a heterogeneous cluster—T3-Scheduler—that can efficiently identify the communicating tasks and assign them to the same node, up to a specified level of utilisation for that node. Using three common micro-benchmarks and an evaluation using a real-world application, we demonstrate that T3-Scheduler outperforms current state-of-the-art scheduling algorithms, such as Aniello et al.’s popular ‘Online scheduler’, improving throughput by 20–72% for micro-benchmarks and 60% for the real-world application.

[1]  Indranil Gupta,et al.  Stela: Enabling Stream Processing Systems to Scale-in and Scale-out On-demand , 2016, 2016 IEEE International Conference on Cloud Engineering (IC2E).

[2]  Leonardo Neumeyer,et al.  S4: Distributed Stream Computing Platform , 2010, 2010 IEEE International Conference on Data Mining Workshops.

[3]  Mohammad Hosseini,et al.  R-Storm: Resource-Aware Scheduling in Storm , 2015, Middleware.

[4]  Zhiyi Huang,et al.  P-Scheduler: adaptive hierarchical scheduling in apache storm , 2016, ACSW.

[5]  Stratis Viglas,et al.  Fast Heuristics for Near-Optimal Task Allocation in Data Stream Processing over Clusters , 2014, CIKM.

[6]  Jian Tang,et al.  T-Storm: Traffic-Aware Online Scheduling in Storm , 2014, 2014 IEEE 34th International Conference on Distributed Computing Systems.

[7]  Sharma Chakravarthy,et al.  Stream Data Processing: A Quality of Service Perspective - Modeling, Scheduling, Load Shedding, and Complex Event Processing , 2009, Advances in Database Systems.

[8]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[9]  Vincenzo Grassi,et al.  Optimal operator placement for distributed stream processing applications , 2016, DEBS.

[10]  Amar Shan,et al.  Heterogeneous processing: a strategy for augmenting moore's law , 2006 .

[11]  Roberto Baldoni,et al.  Adaptive online scheduling in storm , 2013, DEBS.

[12]  Yin Yang,et al.  DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams , 2015, 2015 IEEE 35th International Conference on Distributed Computing Systems.