MapReduce Scheduling for Deadline-Constrained Jobs in Heterogeneous Cloud Computing Systems

MapReduce is a software framework for processing data-intensive applications in parallel in cloud computing systems. Some MapReduce jobs have deadline requirements for their execution. Existing deadline-constrained MapReduce scheduling schemes do not consider two problems: varying node performance and dynamic task execution time. In this paper, we use bipartite graph modelling to propose a new MapReduce scheduler called BGMRS. BGMRS obtains the optimal solution of the deadline-constrained scheduling problem by transforming it into a well-known graph problem: minimum weighted bipartite matching. BGMRS has the following features. It considers a heterogeneous cloud computing environment, in which the computing resources of some nodes cannot meet the deadlines of some jobs. In addition to meeting deadline requirements, BGMRS takes data locality into account when allocating computing resources, in order to shorten the data access time of a job. If the total available computing resources of the system cannot satisfy the deadline requirements of all jobs, BGMRS minimizes the number of jobs that violate their deadlines. Finally, both simulation and testbed experiments are performed to demonstrate the effectiveness of BGMRS in deadline-constrained scheduling.
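To make the matching formulation concrete, the following is a minimal sketch, not the scheduler described in the paper. It assumes one slot per job, an edge weight equal to the estimated execution time plus a fixed penalty for non-local data, and a lexicographic objective (fewest deadline misses first, then lowest total weight). The names `schedule`, `exec_time`, `local`, and `remote_penalty` are hypothetical, and the matching is found by exhaustive search for readability; a polynomial-time method such as the Hungarian algorithm would be needed at realistic scale.

```python
from itertools import permutations


def schedule(exec_time, deadlines, local, remote_penalty):
    """Assign each job to a distinct slot (illustrative sketch only).

    exec_time[j][s]: estimated run time of job j on slot s (heterogeneous nodes).
    deadlines[j]:    deadline of job j, in the same time unit.
    local[j][s]:     True if the input data of job j is stored on slot s's node.
    remote_penalty:  assumed extra time for fetching non-local data.

    Requires at least as many slots as jobs.
    Returns (assignment, misses, total_weight), where assignment[j] is the
    slot chosen for job j.
    """
    n_jobs = len(exec_time)
    n_slots = len(exec_time[0])
    best_key = None
    best_assignment = None

    # Enumerate every one-to-one mapping of jobs to slots (a perfect matching
    # on the job side of the bipartite graph) and keep the best one.
    for assignment in permutations(range(n_slots), n_jobs):
        misses = 0
        total = 0
        for job, slot in enumerate(assignment):
            cost = exec_time[job][slot]
            if not local[job][slot]:
                cost += remote_penalty
            if cost > deadlines[job]:
                misses += 1
            total += cost
        key = (misses, total)  # fewest misses first, then minimum weight
        if best_key is None or key < best_key:
            best_key = key
            best_assignment = assignment

    return best_assignment, best_key[0], best_key[1]


if __name__ == "__main__":
    # Two jobs, three slots of differing speed; values are made up.
    exec_time = [[4, 9, 6], [8, 3, 7]]
    deadlines = [5, 6]
    local = [[True, False, False], [False, False, True]]
    print(schedule(exec_time, deadlines, local, remote_penalty=2))
```

In this toy instance both deadlines can be met only by placing job 0 on slot 0 and job 1 on slot 1. When no assignment meets every deadline, the same objective falls back to the assignment with the fewest violations, which mirrors the behaviour the abstract describes for an overloaded system.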
