A Latency-Sensitive Elastic Adaptive Scheduling in Distributed Stream Computing Systems

With the massive growth of big data applications, stream computing systems face ever higher demands on data processing speed. Storm, one of the most popular distributed stream computing systems, has received increasing attention. However, Storm's traditional scheduling strategy is poorly suited to processing large volumes of streaming data. Resource scheduling in a distributed stream computing system should consider not only the allocation status of nodes but also the fluctuating input rate of data streams. To address this problem, this paper makes the following contributions: (1) A performance model, La-Stream (latency-sensitive elastic adaptive scheduling), is proposed and built using a quantitative method that calculates the computation required at the nodes of the task graph and the communication between nodes. (2) A La-Stream-based algorithm is proposed that dynamically plans, among the available resources, the resource allocation scheme with minimal data processing latency, thereby achieving optimal allocation. (3) Three functional modules of La-Stream are proposed and implemented: a Monitor module, an Optimizer module, and a Scheduler module. The three modules are integrated into the Storm platform with minimal overhead. Several sets of experiments verify the feasibility and effectiveness of La-Stream.
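The abstract does not state La-Stream's equations, so the following Python is only a minimal sketch of how a latency model built from computation and communication costs could drive placement. It rests on our own assumptions: a topology is a DAG of tasks; each task has a per-tuple compute cost scaled by a node speed factor; a tuple hop costs `INTRA_NODE_MS` between co-located tasks and `INTER_NODE_MS` across nodes; and end-to-end latency is the critical (longest) path through the DAG. All names and numbers (`estimate_latency`, `greedy_place`, the cost constants, the example topology) are hypothetical, and the greedy earliest-finish placement is a stand-in, not the paper's algorithm.

```python
from graphlib import TopologicalSorter

# Hypothetical per-tuple transfer costs in ms (illustrative values only).
INTRA_NODE_MS = 0.1   # tasks on the same node
INTER_NODE_MS = 1.0   # tasks on different nodes


def comm_ms(node_a, node_b):
    """Cost of one tuple hop; cheaper when both tasks share a node."""
    return INTRA_NODE_MS if node_a == node_b else INTER_NODE_MS


def predecessors(edges):
    """Turn a downstream adjacency map into the predecessor map graphlib expects."""
    preds = {task: set() for task in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            preds.setdefault(dst, set()).add(src)
    return preds


def finish_time(task, node, preds, placement, finish, compute_ms, speed):
    """Earliest finish of `task` on `node`: latest upstream arrival plus compute time."""
    start = max((finish[p] + comm_ms(placement[p], node) for p in preds[task]),
                default=0.0)
    return start + compute_ms[task] / speed[node]


def estimate_latency(edges, compute_ms, speed, placement):
    """Critical-path latency of a complete placement (task -> node)."""
    preds, finish = predecessors(edges), {}
    for task in TopologicalSorter(preds).static_order():
        finish[task] = finish_time(task, placement[task], preds, placement,
                                   finish, compute_ms, speed)
    return max(finish.values())


def greedy_place(edges, compute_ms, speed, slots):
    """Assign tasks in topological order to the free node with the earliest finish."""
    preds, free = predecessors(edges), dict(slots)
    placement, finish = {}, {}
    for task in TopologicalSorter(preds).static_order():
        best = None
        for node, left in free.items():
            if left == 0:
                continue  # node has no free slot left
            t = finish_time(task, node, preds, placement, finish, compute_ms, speed)
            if best is None or t < best[0]:
                best = (t, node)
        if best is None:
            raise ValueError("not enough slots for all tasks")
        finish[task], placement[task] = best
        free[best[1]] -= 1
    return placement


# Hypothetical word-count style chain on two nodes of different speed.
edges = {"spout": ["split"], "split": ["count"], "count": []}
compute_ms = {"spout": 0.5, "split": 1.0, "count": 2.0}  # cost on a speed-1.0 node
speed = {"n1": 1.0, "n2": 2.0}                          # relative node speeds
slots = {"n1": 2, "n2": 2}                              # free slots per node

plan = greedy_place(edges, compute_ms, speed, slots)
print(plan)  # {'spout': 'n2', 'split': 'n2', 'count': 'n1'}
print(estimate_latency(edges, compute_ms, speed, plan))
```

Under these assumed costs the greedy plan keeps the first two tasks together on the faster node and yields an estimated latency of about 3.85 ms, against 5.0 ms for a round-robin plan (`spout` and `count` on `n1`, `split` on `n2`). In a Monitor/Optimizer/Scheduler split like the one described above, a monitor could refresh `compute_ms` and `speed` from runtime measurements, an optimizer could rerun the placement, and a scheduler could apply the result; that mapping is our illustration only.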