A Parameter Dynamic-Tuning Scheduling Algorithm Based on History in Heterogeneous Environments

In the MapReduce model, straggler tasks prolong job execution time in heterogeneous environments. The LATE scheduler introduced the longest-remaining-time strategy, but it has drawbacks, including inaccurate estimates of remaining time and wasted system resources. To address these problems, we propose two algorithms. The history-based parameter dynamic-tuning algorithm estimates the progress of a task more accurately by dynamically tuning the weight of each phase of a map task and a reduce task according to the historical values of those weights. The evaluation-scheduling algorithm reduces the waste of system resources by evaluating the free slot before a straggler task is launched on that node. Both algorithms are implemented in Hadoop 0.20.1. The experimental results meet our expectations and show a significant reduction in wasted system resources.
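The idea of history-tuned phase weights can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the smoothing factor `alpha`, and the launch criterion in `should_launch_on_slot` are assumptions made for illustration only. The remaining-time formula follows the LATE heuristic, (1 - progress) / progress rate, and the starting weights are Hadoop's equal thirds for the copy, sort, and reduce phases of a reduce task.

```python
def progress_score(weights, phase_index, phase_fraction):
    """Weighted progress of a task in [0, 1].

    Finished phases contribute their full weight; the current phase
    contributes its weight scaled by the fraction completed so far.
    """
    if not 0.0 <= phase_fraction <= 1.0:
        raise ValueError("phase_fraction must lie in [0, 1]")
    finished = sum(weights[:phase_index])
    return finished + weights[phase_index] * phase_fraction


def weights_from_durations(durations):
    """Turn the measured phase durations of a finished task into weights."""
    total = sum(durations)
    return [d / total for d in durations]


def tune_weights(history, observed, alpha=0.5):
    """Blend historical weights with newly observed ones.

    alpha is a hypothetical smoothing factor (not taken from the paper).
    The result is renormalised so the weights sum to 1.
    """
    blended = [(1 - alpha) * h + alpha * o for h, o in zip(history, observed)]
    total = sum(blended)
    return [b / total for b in blended]


def remaining_time(progress, elapsed):
    """LATE-style estimate: (1 - progress) / (progress / elapsed)."""
    if progress <= 0:
        return float("inf")
    return (1 - progress) * elapsed / progress


def should_launch_on_slot(straggler_remaining, estimated_time_on_slot):
    """Assumed criterion: use the free slot only if it is expected to
    finish the task sooner than the straggler would on its own."""
    return estimated_time_on_slot < straggler_remaining
```

For example, starting from equal weights and observing a finished reduce task whose phases took 60, 10, and 30 seconds, `tune_weights` moves the weights toward the copy phase. A task halfway through its sort phase then scores about 0.575 instead of 0.5, so its estimated remaining time changes accordingly.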