Deadline-Constrained MapReduce Scheduling Based on Graph Modelling

MapReduce is a software framework for processing data-intensive applications in parallel in cloud computing systems. An increasing number of MapReduce jobs require deadline guarantees. Existing deadline-aware scheduling schemes overlook two problems in the MapReduce computing environment: slot performance heterogeneity and job time variation. In this paper, we use bipartite graph modeling to propose a new MapReduce scheduler called BGMRS. BGMRS obtains the optimal solution of the deadline-constrained scheduling problem by transforming it into a well-known graph problem: minimum weighted bipartite matching. BGMRS has the following features. First, it accounts for a heterogeneous cloud computing environment in which the computing resources of some nodes cannot meet the deadlines of some jobs. Second, as a job progresses, BGMRS can dynamically find different computing resources to run the job without violating its deadline, which improves computing resource utilization. Third, BGMRS can trade data locality off against the deadline so that more jobs meet their deadlines. Fourth, if the available computing resources of the system cannot meet all job deadlines, BGMRS minimizes the number of jobs that violate their deadlines. Finally, simulation experiments demonstrate the effectiveness of BGMRS in deadline-constrained scheduling.
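To make the reduction to minimum weighted bipartite matching concrete, the following is a minimal sketch and not the BGMRS algorithm itself. It assumes jobs form one side of the graph and slots the other, and it uses a hypothetical weight scheme that the abstract does not specify: 0 for an on-time, data-local placement, 1 for an on-time remote placement, and a large `MISS_PENALTY` when the estimated finish time exceeds the deadline. The names `build_costs`, `min_weight_matching`, and `MISS_PENALTY`, and all the numbers, are illustrative assumptions. The matching is solved with the standard Hungarian algorithm.

```python
# Hypothetical penalty: large enough that one deadline miss outweighs every
# possible locality loss in a small instance like the one below.
MISS_PENALTY = 1000


def build_costs(est, deadlines, local):
    """Edge weight for each (job, slot) pair under the assumed scheme:
    0 = on time and data-local, 1 = on time but remote,
    MISS_PENALTY = the job would miss its deadline on that slot."""
    costs = []
    for j, row in enumerate(est):
        costs.append([
            MISS_PENALTY if t > deadlines[j] else (0 if local[j][s] else 1)
            for s, t in enumerate(row)
        ])
    return costs


def min_weight_matching(cost):
    """Hungarian algorithm with potentials, O(n^2 * m).
    Requires len(cost) <= len(cost[0]). Returns the slot index for each job."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)    # potentials for jobs (1-indexed)
    v = [0] * (m + 1)    # potentials for slots (1-indexed)
    p = [0] * (m + 1)    # p[s] = job currently matched to slot s (0 = none)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for s in range(1, m + 1):
        if p[s]:
            assignment[p[s] - 1] = s - 1
    return assignment


# Toy instance (all numbers invented): est[j][s] is the estimated finish
# time of job j on slot s; local[j][s] marks slots holding job j's input data.
est = [[4, 6, 9], [3, 9, 9], [8, 5, 4]]
deadlines = [7, 5, 6]
local = [[True, False, False], [False, True, False], [False, False, True]]

costs = build_costs(est, deadlines, local)
assignment = min_weight_matching(costs)
misses = sum(est[j][s] > deadlines[j] for j, s in enumerate(assignment))
print(assignment, misses)  # [1, 0, 2] 0
```

In this instance job 0 gives up its data-local slot 0 so that job 1, which can only finish on time there, meets its deadline. With a sufficiently large penalty, minimizing total weight also minimizes the number of deadline misses before it optimizes locality, which mirrors the trade-off described in the abstract.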
