Improving MapReduce Performance Using Smart Speculative Execution Strategy

MapReduce is a widely used parallel computing framework for large-scale data processing. The two major performance metrics in MapReduce are job execution time and cluster throughput. Both can be seriously degraded by straggler machines, that is, machines on which tasks take an unusually long time to finish. Speculative execution is a common approach to the straggler problem: slow-running tasks are simply backed up on alternative machines. Multiple speculative execution strategies have been proposed, but they share several pitfalls: (i) they use the average progress rate to identify slow tasks, while in reality the progress rate can be unstable and misleading; (ii) they cannot appropriately handle data skew among tasks; (iii) they do not consider whether backup tasks can finish earlier when choosing backup worker nodes. In this paper, we first present a detailed analysis of scenarios where existing strategies do not work well. We then develop a new strategy, maximum cost performance (MCP), which significantly improves the effectiveness of speculative execution. To identify stragglers accurately and promptly, MCP (i) uses both the progress rate and the process bandwidth within a phase to select slow tasks, (ii) uses an exponentially weighted moving average (EWMA) to predict process speed and compute a task's remaining time, and (iii) decides which tasks to back up with a cost-benefit model that accounts for the load of the cluster. To choose proper worker nodes for backup tasks, we take both data locality and data skew into consideration. We evaluate MCP on a cluster of 101 virtual machines running a variety of applications on 30 physical servers. Experimental results show that MCP runs jobs up to 39 percent faster and improves cluster throughput by up to 44 percent compared to Hadoop-0.21.
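The EWMA-based remaining-time prediction mentioned in the abstract can be illustrated with a minimal sketch. The Python snippet below is a hypothetical illustration only: the smoothing factor `ALPHA` and the helper names are assumptions for exposition, not the paper's actual MCP implementation (which is built into Hadoop and written in Java).

```python
# Minimal sketch of an EWMA speed estimate feeding a remaining-time
# calculation, in the spirit of the abstract's description. ALPHA and
# the function names are illustrative assumptions, not from MCP itself.

ALPHA = 0.2  # weight given to the most recent speed sample (assumed value)

def ewma(prev_estimate, new_sample, alpha=ALPHA):
    """Exponentially weighted moving average of the observed process speed."""
    return alpha * new_sample + (1.0 - alpha) * prev_estimate

def remaining_time(progress, speed_estimate):
    """Estimate a task's remaining time from its progress in [0, 1] and the
    EWMA-smoothed progress rate (fraction of the task completed per second)."""
    if speed_estimate <= 0.0:
        return float("inf")  # no usable speed sample yet
    return (1.0 - progress) / speed_estimate

# Example: smooth three speed samples, then predict the remaining time
# of a task that is 60 percent complete.
speed = 0.010  # initial sample: 1 percent progress per second
for sample in (0.012, 0.008, 0.011):
    speed = ewma(speed, sample)
print(remaining_time(0.60, speed))
```

The point of the EWMA here is that a single noisy progress-rate sample does not dominate the estimate, which is what makes the predicted remaining time stable enough to compare against the expected cost of launching a backup task.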
