MapReduce is a software framework that parallelizes job execution in cloud computing systems by dividing a job into a number of map and reduce tasks. For various reasons (hardware malfunction, input data skew, heterogeneous cloud environments, etc.), certain map (reduce) tasks may take abnormally longer to execute than others. These straggler map (reduce) tasks significantly prolong the completion time of the entire job, which is known as the straggler problem. Speculative execution is a well-known method for addressing the straggler problem by re-running a straggler task on a faster node. However, existing speculative execution schemes do not 1) exclude idle slots that are unsuitable for straggler tasks, 2) maximize idle slot utilization so as to re-run as many straggler tasks as possible, or 3) keep the execution progress of re-run straggler tasks consistent with that of non-straggler tasks. In this paper, we build on a well-known graph problem, bipartite matching, to propose a new speculative execution scheme called BM. First, BM filters all idle slots to generate a backup slot set for each straggler task. Then, the straggler tasks and their corresponding backup slot sets are used to construct a bipartite graph. For each straggler task, we also estimate the progress difference between the re-run straggler task and the non-straggler tasks and use it as the edge weight in the bipartite graph. Next, the optimal matching on the weighted bipartite graph is found by formulating and solving a corresponding integer linear programming (ILP) model. Finally, the speculative execution of straggler tasks is scheduled according to the obtained optimal matching. Simulation experiments are also performed to demonstrate the improvement that BM brings to MapReduce speculative execution.
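The pipeline described above (filter idle slots, build a weighted bipartite graph, pick a matching) can be illustrated with a minimal Python sketch. It is not the paper's formulation: the function names, the filter rule (keep a slot only if the estimated re-run time there is shorter than the original attempt's remaining time), the objective (match as many straggler tasks as possible, then minimize the total progress gap), and all numbers are assumptions made for illustration. The exhaustive search stands in for the ILP solver and is only practical for tiny instances.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (straggler task, idle slot)


def backup_slots(remaining: Dict[str, float],
                 rerun: Dict[Edge, float]) -> Dict[str, List[str]]:
    """Assumed filter rule: keep slot s for task t only if a re-run on s
    is estimated to finish sooner than the original attempt would."""
    result: Dict[str, List[str]] = {t: [] for t in remaining}
    for (task, slot), estimate in rerun.items():
        if estimate < remaining[task]:
            result[task].append(slot)
    return result


def best_matching(candidates: Dict[str, List[str]],
                  weight: Dict[Edge, float]) -> Tuple[Dict[str, str], float]:
    """Exhaustive search (a stand-in for the ILP): first maximize the number
    of matched tasks, then minimize the summed edge weight."""
    tasks = list(candidates)
    best = (0, 0.0, {})  # (matched count, total weight, assignment)

    def search(i: int, used: set, assigned: Dict[str, str], total: float) -> None:
        nonlocal best
        if i == len(tasks):
            if (len(assigned), -total) > (best[0], -best[1]):
                best = (len(assigned), total, dict(assigned))
            return
        task = tasks[i]
        for slot in candidates[task]:
            if slot not in used:  # each slot hosts at most one re-run
                used.add(slot)
                assigned[task] = slot
                search(i + 1, used, assigned, total + weight[(task, slot)])
                used.remove(slot)
                del assigned[task]
        search(i + 1, used, assigned, total)  # option: leave this task unmatched

    search(0, set(), {}, 0.0)
    return best[2], best[1]


# Hypothetical inputs: remaining time of each straggler (seconds) and the
# estimated re-run time of each task on each idle slot (seconds).
remaining = {"t1": 40, "t2": 30}
rerun = {("t1", "s1"): 20, ("t1", "s2"): 25, ("t1", "s3"): 50,
         ("t2", "s1"): 15, ("t2", "s2"): 35, ("t2", "s3"): 45}
# Hypothetical edge weights: progress gap to non-straggler tasks, in percentage points.
gap = {("t1", "s1"): 10, ("t1", "s2"): 20, ("t2", "s1"): 5}

slots = backup_slots(remaining, rerun)
matching, total = best_matching(slots, gap)
print(slots)     # backup slot set per straggler task
print(matching)  # chosen task -> slot assignment
print(total)     # summed progress gap of the chosen assignment
```

In this toy instance, slot `s3` is filtered out for both tasks, and giving `s1` to `t1` would leave `t2` without a backup slot, so the search prefers the assignment that re-runs both tasks even though `t1` ends up on its higher-weight edge.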